[v3,0/7] Gets rid of "if reflog exists, append to it regardless of config settings"

Message ID	pull.1067.v3.git.git.1631021808.gitgitgadget@gmail.com (mailing list archive)
Headers	show Return-Path: <git-owner@kernel.org> Message-Id: <pull.1067.v3.git.git.1631021808.gitgitgadget@gmail.com> In-Reply-To: <pull.1067.v2.git.git.1630947142.gitgitgadget@gmail.com> References: <pull.1067.v2.git.git.1630947142.gitgitgadget@gmail.com> From: "Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> Date: Tue, 07 Sep 2021 13:36:41 +0000 Subject: [PATCH v3 0/7] Gets rid of "if reflog exists, append to it regardless of config settings" Fcc: Sent Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit MIME-Version: 1.0 To: git@vger.kernel.org Cc: Taylor Blau <me@ttaylorr.com>, =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason <avarab@gmail.com>, Han-Wen Nienhuys <hanwenn@gmail.com> Precedence: bulk
Series	Gets rid of "if reflog exists, append to it regardless of config settings" \| expand [v3,0/7] Gets rid of "if reflog exists, append to it regardless of config settings" [v3,1/7] show-branch: show reflog message [v3,2/7] refs: trim newline from reflog message [v3,3/7] test-ref-store: tweaks to for-each-reflog-ent format [v3,4/7] t1400: use test-helper ref-store to inspect reflog contents [v3,5/7] refs: drop force_create argument of create_reflog API [v3,6/7] RFC: refs: reflog entries aren't written based on reflog existence. [v3,7/7] refs: change log_ref_setup calling convention

John Passaro via GitGitGadget Sept. 7, 2021, 1:36 p.m. UTC

<As discussed in
CAFQ2z_Ps3YxycA+NJ9VKt_PEXb+m83JdNB7ujzWw1fTPKyZ=fg@mail.gmail.com

v3:

 * fix show-branch
 * add some more context to commit messages
 * change calling convention for log_ref_setup; could fold into predecessor
   if needed too.

Han-Wen Nienhuys (7):
  show-branch: show reflog message
  refs: trim newline from reflog message
  test-ref-store: tweaks to for-each-reflog-ent format
  t1400: use test-helper ref-store to inspect reflog contents
  refs: drop force_create argument of create_reflog API
  RFC: refs: reflog entries aren't written based on reflog existence.
  refs: change log_ref_setup calling convention

 builtin/checkout.c             |   2 +-
 builtin/show-branch.c          |   7 +-
 reflog-walk.c                  |   6 +-
 refs.c                         |   9 ++-
 refs.h                         |   4 +-
 refs/debug.c                   |   5 +-
 refs/files-backend.c           | 128 +++++++++++++--------------------
 refs/packed-backend.c          |   3 +-
 refs/refs-internal.h           |   2 +-
 t/helper/test-ref-store.c      |   8 +--
 t/t1400-update-ref.sh          |  21 +++---
 t/t1405-main-ref-store.sh      |   6 +-
 t/t1406-submodule-ref-store.sh |   6 +-
 t/t3202-show-branch.sh         |  15 ++++
 14 files changed, 101 insertions(+), 121 deletions(-)


base-commit: e0a2f5cbc585657e757385ad918f167f519cfb96
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1067%2Fhanwen%2Freflog-touch-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1067/hanwen/reflog-touch-v3
Pull-Request: https://github.com/git/git/pull/1067

Range-diff vs v2:

 -:  ----------- > 1:  e158882812f show-branch: show reflog message
 1:  995d450da42 ! 2:  d16d94164c1 refs: trim newline from reflog message
     @@ Commit message
      
       ## builtin/show-branch.c ##
      @@ builtin/show-branch.c: int cmd_show_branch(int ac, const char **av, const char *prefix)
     - 				show_one_commit(rev[i], 1);
     + 			char *logmsg;
     + 			char *nth_desc;
     + 			const char *msg;
     +-			char *end;
     + 			timestamp_t timestamp;
     + 			int tz;
     + 
     +@@ builtin/show-branch.c: int cmd_show_branch(int ac, const char **av, const char *prefix)
     + 				break;
       			}
     - 			else
     --				puts(reflog_msg[i]);
     -+				puts(reflog_msg[i]); /* XXX - this puts a
     -+							newline. Did we put two
     -+							newlines beforehand? */
       
     - 			if (is_head)
     - 				head_at = i;
     +-			end = strchr(logmsg, '\n');
     +-			if (end)
     +-				*end = '\0';
     +-
     + 			msg = (*logmsg == '\0') ? "(none)" : logmsg;
     + 			reflog_msg[i] = xstrfmt("(%s) %s",
     + 						show_date(timestamp, tz,
      
       ## reflog-walk.c ##
      @@ reflog-walk.c: void get_reflog_message(struct strbuf *sb,
 2:  11b296a55e9 = 3:  e273963216c test-ref-store: tweaks to for-each-reflog-ent format
 3:  9ec09cc64cd = 4:  52093fce57c t1400: use test-helper ref-store to inspect reflog contents
 4:  aa25fd9b7de ! 5:  ce0047028dd refs: drop force_create argument of create_reflog API
     @@ Commit message
          There is only one caller, builtin/checkout.c, and it hardcodes
          force_create=1.
      
     +    This argument was introduced in abd0cd3a301 (refs: new public ref function:
     +    safe_create_reflog, 2015-07-21), which promised to immediately use it in a
     +    follow-on commit, but that never happened.
     +
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
      
       ## builtin/checkout.c ##
     @@ refs/files-backend.c: error:
       	int fd;
       
      -	if (log_ref_setup(refs, refname, force_create, &fd, err))
     -+	if (log_ref_setup(refs, refname, /*force_create=*/1, &fd, err))
     ++	if (log_ref_setup(refs, refname, 1, &fd, err))
       		return -1;
       
       	if (fd >= 0)
 5:  f6a7c5ad56e ! 6:  7a030cfd3e2 RFC: refs: reflog entries aren't written based on reflog existence.
     @@ Commit message
          The reftable storage backend cannot distinguish between a non-existing
          reflog, and an empty one. Therefore it cannot mimick this functionality.
      
     -    In CAFQ2z_Ps3YxycA+NJ9VKt_PEXb+m83JdNB7ujzWw1fTPKyZ=fg@mail.gmail.com,
     -    we came to the conclusion that this feature is probably a remnant from
     -    the time that reflogs weren't enabled by default, and it does not need
     -    to be kept.
     +    With this feature, it is possible to mark only specific branches as subject to
     +    reflog updates. When introduced, it presumably served as a cheap substitute for
     +    introducing branch.$NAME.logRefUpdate configuration setting.
     +
     +    Reflogs are small and don't impact the runtime of normal operations, so this
     +    flexibility is not very useful. Since it incurs complexity for alternate ref
     +    backends, we remove it.
     +
     +    Further background to this change is in
     +    <CAFQ2z_Ps3YxycA+NJ9VKt_PEXb+m83JdNB7ujzWw1fTPKyZ=fg@mail.gmail.com>.
      
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
      
 -:  ----------- > 7:  1124dbad594 refs: change log_ref_setup calling convention

Han-Wen Nienhuys Oct. 5, 2021, 3:54 p.m. UTC | #1

On Tue, Sep 7, 2021 at 3:37 PM Han-Wen Nienhuys via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> <As discussed in
> CAFQ2z_Ps3YxycA+NJ9VKt_PEXb+m83JdNB7ujzWw1fTPKyZ=fg@mail.gmail.com
>
> v3:
>
>  * fix show-branch
>  * add some more context to commit messages
>  * change calling convention for log_ref_setup; could fold into predecessor
>    if needed too.

Hi Junio,

I had the impression that I addressed all outstanding comments (but
not sure). Are you waiting for me to do something before this can go
into 'seen' ?

There is a merge conflict against master, so I'll send a v4 shortly.

Junio C Hamano Oct. 6, 2021, 6:20 p.m. UTC | #2

Han-Wen Nienhuys <hanwen@google.com> writes:

> I had the impression that I addressed all outstanding comments (but
> not sure). Are you waiting for me to do something before this can go
> into 'seen' ?
>
> There is a merge conflict against master, so I'll send a v4 shortly.

Sorry.  I seem to have looked at and commented on the precursor RFC
of this topic, but nobody other than Ævar seems to have commented on
the second iteration and the topic was completely under my radar,
and I do not remember what it was about.

It would be good to have an update for others to see.

Thanks.

Junio C Hamano Oct. 6, 2021, 6:27 p.m. UTC | #3

Junio C Hamano <gitster@pobox.com> writes:

> Han-Wen Nienhuys <hanwen@google.com> writes:
>
>> I had the impression that I addressed all outstanding comments (but
>> not sure). Are you waiting for me to do something before this can go
>> into 'seen' ?
>>
>> There is a merge conflict against master, so I'll send a v4 shortly.
>
> Sorry.  I seem to have looked at and commented on the precursor RFC
> of this topic, but nobody other than Ævar seems to have commented on
> the second iteration and the topic was completely under my radar,
> and I do not remember what it was about.
>
> It would be good to have an update for others to see.
>
> Thanks.

Not really.  I think the comment on the RFC still stands, and I do
not recall seeing a response to the point.

    One potential harm this change will bring to us is what happens to
    people who disable core.logAllRefUpdates manually after using the
    repository for a while.  Their @{4} will point at the same commit no
    matter how many operations are done on the current branch after they
    do so.  I wouldn't mind if "git reflog disable" command is given to
    the users prominently and core.logAllRefUpdates becomes a mere
    implementation detail nobody has to care about---in such a world, we
    could set the configuration and drop the existing reflog records at
    the same time and nobody will be hurt.

    If we keep the current behaviour, what are we harming instead?
    Growth of diskspace usage is an obvious one, but disks are cheaper
    compared to human brainwave cycles cost.

As it is about the basic design of the "feature" (or misfeature), no
matter how improved the internal implementation details get, I am
skeptical how it is solved (other than "immediately when we notice
core.logAllRefUpdates is disabled, remove all the existing reflog
entries to avoid confusion, in addition to stop appending any more
reflog entries", that is).

Junio C Hamano Oct. 7, 2021, 5:38 p.m. UTC | #4

Junio C Hamano <gitster@pobox.com> writes:

> Not really.  I think the comment on the RFC still stands, and I do
> not recall seeing a response to the point.
>
>     One potential harm this change will bring to us is what happens to
>     people who disable core.logAllRefUpdates manually after using the
>     repository for a while.  Their @{4} will point at the same commit no
>     matter how many operations are done on the current branch after they
>     do so.  I wouldn't mind if "git reflog disable" command is given to
>     the users prominently and core.logAllRefUpdates becomes a mere
>     implementation detail nobody has to care about---in such a world, we
>     could set the configuration and drop the existing reflog records at
>     the same time and nobody will be hurt.
>
>     If we keep the current behaviour, what are we harming instead?
>     Growth of diskspace usage is an obvious one, but disks are cheaper
>     compared to human brainwave cycles cost.

IIRC, the only reason why reftable implementation may want to change
the behaviour we have to avoid getting blamed for breaking is
because it cannot implement "a reflog exists, and we need to record
further ref movements by appending to it, no matter what the
configuration says" when the existing reflog is empty, because its
data structure lacks support for expressing "exists but empty".

I think the behaviour change described in the title of this message
can be limited in the scope to hurt users a lot less, and can still
satisfy the goal of helping reftable not getting blamed for
breakage, perhaps by making the behaviour for an empty but existing
reflog unspecified or implementation defined per backend.

That would allow us to avoid situation where a user can say foo@{1}
but it does not refer to the point where the branch's tip pointed
just before the most recent update to it. As an empty but existing
reflog would not give foo@{$n} for any value of $n, there is much
less risk of confusing users if we did not append new entries to it,
I would hope.

Han-Wen Nienhuys Nov. 11, 2021, 11:46 a.m. UTC | #5

On Thu, Oct 7, 2021 at 7:38 PM Junio C Hamano <gitster@pobox.com> wrote:
> > Not really.  I think the comment on the RFC still stands, and I do
> > not recall seeing a response to the point.
> >
> >     One potential harm this change will bring to us is what happens to
> >     people who disable core.logAllRefUpdates manually after using the
> >     repository for a while.  Their @{4} will point at the same commit no
> >     matter how many operations are done on the current branch after they
> >     do so.  I wouldn't mind if "git reflog disable" command is given to
> >     the users prominently and core.logAllRefUpdates becomes a mere
> >     implementation detail nobody has to care about---in such a world, we
> >     could set the configuration and drop the existing reflog records at
> >     the same time and nobody will be hurt.

A git 'reflog disable' command would address your concerns, but it is
a destructive operation, so the cure might be worse than the solution.

> IIRC, the only reason why reftable implementation may want to change
> the behaviour we have to avoid getting blamed for breaking is
> because it cannot implement "a reflog exists, and we need to record
> further ref movements by appending to it, no matter what the
> configuration says" when the existing reflog is empty, because its
> data structure lacks support for expressing "exists but empty".
>
> I think the behaviour change described in the title of this message
> can be limited in the scope to hurt users a lot less, and can still
> satisfy the goal of helping reftable not getting blamed for
> breakage, perhaps by making the behaviour for an empty but existing
> reflog unspecified or implementation defined per backend.

If we accept implementation-dependent features, we could just leave
the whole feature as is. I had expected more breakage, but there is
only one test case in t1400 that needs addressing. If the test
coverage reflects the popularity of the feature, it should be fine to
leave this divergence in, and mark the test with REFFILES.

The commits prior to the RFC should be OK for committing. In
particular, there is a bugfix for the show-branch command. Should I
resend those separately?

Ævar Arnfjörð Bjarmason Nov. 11, 2021, 2:38 p.m. UTC | #6

On Thu, Nov 11 2021, Han-Wen Nienhuys wrote:

> On Thu, Oct 7, 2021 at 7:38 PM Junio C Hamano <gitster@pobox.com> wrote:
>> > Not really.  I think the comment on the RFC still stands, and I do
>> > not recall seeing a response to the point.
>> >
>> >     One potential harm this change will bring to us is what happens to
>> >     people who disable core.logAllRefUpdates manually after using the
>> >     repository for a while.  Their @{4} will point at the same commit no
>> >     matter how many operations are done on the current branch after they
>> >     do so.  I wouldn't mind if "git reflog disable" command is given to
>> >     the users prominently and core.logAllRefUpdates becomes a mere
>> >     implementation detail nobody has to care about---in such a world, we
>> >     could set the configuration and drop the existing reflog records at
>> >     the same time and nobody will be hurt.
>
> A git 'reflog disable' command would address your concerns, but it is
> a destructive operation, so the cure might be worse than the solution.
>
>> IIRC, the only reason why reftable implementation may want to change
>> the behaviour we have to avoid getting blamed for breaking is
>> because it cannot implement "a reflog exists, and we need to record
>> further ref movements by appending to it, no matter what the
>> configuration says" when the existing reflog is empty, because its
>> data structure lacks support for expressing "exists but empty".
>>
>> I think the behaviour change described in the title of this message
>> can be limited in the scope to hurt users a lot less, and can still
>> satisfy the goal of helping reftable not getting blamed for
>> breakage, perhaps by making the behaviour for an empty but existing
>> reflog unspecified or implementation defined per backend.
>
> If we accept implementation-dependent features, we could just leave
> the whole feature as is. I had expected more breakage, but there is
> only one test case in t1400 that needs addressing. If the test
> coverage reflects the popularity of the feature, it should be fine to
> leave this divergence in, and mark the test with REFFILES.
>
> The commits prior to the RFC should be OK for committing. In
> particular, there is a bugfix for the show-branch command. Should I
> resend those separately?

I've got some follow-up patches to what's sitting in "next" already that
hoist some reffiles-specific stuff into builtin/reflog.c, I haven't
tested but I expect that the behavior change is silent now in the
reftable backend, i.e. it doesn't implement progress/verbose the same
way, presumably.

Between that and 5ac15ad2509 (reflog tests: add --updateref tests,
2021-10-16) & 52106430dc8 (refs/files: remove "name exist?" check in
lock_ref_oid_basic(), 2021-10-16) I wouldn't put too much faith in those
reflog tests.

None of that should be a blocker for your series landing, just say'n. I
don't trust those tests.

IMO the only meaningful way to be confident in testing these sorts of
things with reftable is more of the chaos monkey approach of the
GIT_TEST_* modes, i.e. we now have a WIP mode to do that for reftable
that has known breakages.

We could similarly instrument the test suite to do "git reflog expire"
for each ref at the end of tests, a bunch of things would break, but we
could log the complete -V run and see if what breaks is different under
the two backends.

I've got some WIP patches to add a similar chaos mode using "git gc
--auto", and it turned up some interesting stuff. It's what I used
initially to test what's now landed in ae35e16cd43 (reflog expire: don't
lock reflogs using previously seen OID, 2021-08-23).

[v3,0/7] Gets rid of "if reflog exists, append to it regardless of config settings"

Message

Comments