diff mbox series

[v19,03/20] checkout: add '\n' to reflog message

Message ID 125695ce92218ca2ddb9868880db542acb0d2a79.1593457018.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Reftable support git-core | expand

Commit Message

Johannes Schindelin via GitGitGadget June 29, 2020, 6:56 p.m. UTC
From: Han-Wen Nienhuys <hanwen@google.com>

Reftable precisely reproduces the given message. This leads to differences,
because the files backend implicitly adds a trailing '\n' to all messages.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/checkout.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Junio C Hamano June 29, 2020, 8:07 p.m. UTC | #1
"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Han-Wen Nienhuys <hanwen@google.com>
>
> Reftable precisely reproduces the given message. This leads to differences,
> because the files backend implicitly adds a trailing '\n' to all messages.

What does this mean?  With the files backend we'll now see a
redundant two LFs in a row?  I think you are careful enough not to
introduce unnecessary compatibility breakage like that, so perhaps
"implicitly adds" is a wrong way to characterize what happens in the
files backend, and it only adds LF when the message does not end
with one, but does not add an extra one if not necessary?  

If so, then the change in the patch does not break compatibility,
but the above description does not give readers confidence that it
is indeed the case.  IOW it is unclear how this change manages to
avoid breaking existing code.

Sorry, but I am left puzzled.

>
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---
>  builtin/checkout.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index af849c644f..bb11fcc4e9 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -884,8 +884,9 @@ static void update_refs_for_switch(const struct checkout_opts *opts,
>  
>  	reflog_msg = getenv("GIT_REFLOG_ACTION");
>  	if (!reflog_msg)
> -		strbuf_addf(&msg, "checkout: moving from %s to %s",
> -			old_desc ? old_desc : "(invalid)", new_branch_info->name);
> +		strbuf_addf(&msg, "checkout: moving from %s to %s\n",
> +			    old_desc ? old_desc : "(invalid)",
> +			    new_branch_info->name);
>  	else
>  		strbuf_insertstr(&msg, 0, reflog_msg);
Han-Wen Nienhuys June 30, 2020, 8:30 a.m. UTC | #2
On Mon, Jun 29, 2020 at 10:08 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Han-Wen Nienhuys <hanwen@google.com>
> >
> > Reftable precisely reproduces the given message. This leads to differences,
> > because the files backend implicitly adds a trailing '\n' to all messages.
>
> What does this mean?  With the files backend we'll now see a
> redundant two LFs in a row?  I think you are careful enough not to
> introduce unnecessary compatibility breakage like that, so perhaps
> "implicitly adds" is a wrong way to characterize what happens in the
> files backend, and it only adds LF when the message does not end
> with one, but does not add an extra one if not necessary?
>
> If so, then the change in the patch does not break compatibility,
> but the above description does not give readers confidence that it
> is indeed the case.  IOW it is unclear how this change manages to
> avoid breaking existing code.
>
> Sorry, but I am left puzzled.

Most places that write a reflog message use a message that ends with '\n'.

Some places (the one mentioned here) do not append a '\n'. When it is
read back, the message does have a '\n', leading to discrepancies with
reftable (which faithfully reproduced the message without '\n')

I initially thought we could or should fix this across git-core, but I
think it will be a lot of work to find all occurrences, so I
implemented

  https://github.com/google/reftable/commit/785f745daf567aeabe47cb01024ea879b9e15c2f

which does extra validation on reflog messages, and normalizes the
'\n' at the end. Maybe this is also worth discussing in the reftable
spec: in reftable, you can use multi-line reflog messages, but that
seems fundamentally impossible in the traditional file storage.

I can drop this commit from the series, but I think it would be a
useful cleanup anyways.

--
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Junio C Hamano June 30, 2020, 11:58 p.m. UTC | #3
Han-Wen Nienhuys <hanwen@google.com> writes:

> Most places that write a reflog message use a message that ends with '\n'.
>
> Some places (the one mentioned here) do not append a '\n'.

Just to make sure I am following, I see

	if (update_ref("initial pull", "HEAD", merge_head, ...

in bultin/pull.c; this is supposed to be written with "\n" under the
new world order?

As reflog messages are supposed to be a single liner (due to the
nature of the original storage format), I have a feeling that it is
cumbersome to the programmers to requre "\n" at the end.  It would
not add much value to the readability of the resulting code, either.

If the places that have hardcoded messages like the above have to be
written like so:

	if (update_ref("initial pull\n", "HEAD", merge_head, ...

it makes the code worse.

Let's see if I can come up with a coherent "rule" that we can follow
to improve the current situation that is an undisciplined mixture as
you found out.

    ... goes and looks the current code ...

So, what happens is that update->msg in each ref transaction (in
refs.c::ref_transaction_add_update()) gets verbatim copy of what the
caller gives the refs API, but when it is written out as an reflog
entry, the message is cleansed by calling refs.c::copy_reflog_msg()
that squashes a run of whitespaces (not just SP but including LF and
HT) into a single SP and then trims whitespaces at the end.  

Then to the result, a LF is unconditionally added.

I would consider the massaging of the application-supplied message
done in copy_reflog_msg() as normalization, which turns possibly
invalid data (like a message that has LF anywhere in it) into an
acceptable form.  Which defines the kosher form of a reflog message
as not having any LF (or excess SP) in it, and the terminating LF is
not part of the message.  From that point of view, if we were to do
something, I would rather standardize on not having the final (and
only) LF.

But some callers that prepare a message in a string buffer may find
it easier not having to strip the trailing LF, or even depend on the
fact that they can feed a string split into multiple lines and the
result becomes a single line.  So I do not see a point in "fixing"
the existing callers that gives an extra LF at the end or other
not-so-kosher form of data---these will be normalized by calling
copy_reflog_msg() anyway.  

If they are easy to fix, like sending a constant string with LF at
the end, we would want to fix them to improve the readability of the
resulting code, though.  If the update_ref() call in builtin/pull.c
were sending "initial pull\n" as its first parameter, it would be
worth fixing by removing the LF, as the result would be easier to
read and it is not too much work.  On the other hand, if the message
that eventually becomes a reflog entry comes from outside and if the
existing code relies on the sanitizing done by copy_reflog_msg(), I
do not see a point in carefully removing the LF at the end on the
caller's side, either.

> I initially thought we could or should fix this across git-core, but I
> think it will be a lot of work to find all occurrences.

I think we are on the same page, even if the definition of which is
correct might be different between us.  As long as all the refs
backend does the same sanitizing (which is probably easy---just call
copy_reflog_msg() from them, or perhaps we may want to move the call
to where update->msg are queued for each step in a transaction,
which would relieve individual backends from having to worry about
this issue at all), they would all behave the same for the same
reflog data.

Thanks.
Han-Wen Nienhuys July 1, 2020, 4:56 p.m. UTC | #4
On Wed, Jul 1, 2020 at 1:58 AM Junio C Hamano <gitster@pobox.com> wrote:
> > I initially thought we could or should fix this across git-core, but I
> > think it will be a lot of work to find all occurrences.
>
> I think we are on the same page, even if the definition of which is
> correct might be different between us.  As long as all the refs

Right. The practical upshot of this is that I should drop this patch, though.
diff mbox series

Patch

diff --git a/builtin/checkout.c b/builtin/checkout.c
index af849c644f..bb11fcc4e9 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -884,8 +884,9 @@  static void update_refs_for_switch(const struct checkout_opts *opts,
 
 	reflog_msg = getenv("GIT_REFLOG_ACTION");
 	if (!reflog_msg)
-		strbuf_addf(&msg, "checkout: moving from %s to %s",
-			old_desc ? old_desc : "(invalid)", new_branch_info->name);
+		strbuf_addf(&msg, "checkout: moving from %s to %s\n",
+			    old_desc ? old_desc : "(invalid)",
+			    new_branch_info->name);
 	else
 		strbuf_insertstr(&msg, 0, reflog_msg);