Message ID | 125695ce92218ca2ddb9868880db542acb0d2a79.1593457018.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Reftable support git-core | expand |
"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Han-Wen Nienhuys <hanwen@google.com> > > Reftable precisely reproduces the given message. This leads to differences, > because the files backend implicitly adds a trailing '\n' to all messages. What does this mean? With the files backend we'll now see a redundant two LFs in a row? I think you are careful enough not to introduce unnecessary compatibility breakage like that, so perhaps "implicitly adds" is a wrong way to characterize what happens in the files backend, and it only adds LF when the message does not end with one, but does not add an extra one if not necessary? If so, then the change in the patch does not break compatibility, but the above description does not give readers confidence that it is indeed the case. IOW it is unclear how this change manages to avoid breaking existing code. Sorry, but I am left puzzled. > > Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> > --- > builtin/checkout.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/builtin/checkout.c b/builtin/checkout.c > index af849c644f..bb11fcc4e9 100644 > --- a/builtin/checkout.c > +++ b/builtin/checkout.c > @@ -884,8 +884,9 @@ static void update_refs_for_switch(const struct checkout_opts *opts, > > reflog_msg = getenv("GIT_REFLOG_ACTION"); > if (!reflog_msg) > - strbuf_addf(&msg, "checkout: moving from %s to %s", > - old_desc ? old_desc : "(invalid)", new_branch_info->name); > + strbuf_addf(&msg, "checkout: moving from %s to %s\n", > + old_desc ? old_desc : "(invalid)", > + new_branch_info->name); > else > strbuf_insertstr(&msg, 0, reflog_msg);
On Mon, Jun 29, 2020 at 10:08 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > From: Han-Wen Nienhuys <hanwen@google.com> > > > > Reftable precisely reproduces the given message. This leads to differences, > > because the files backend implicitly adds a trailing '\n' to all messages. > > What does this mean? With the files backend we'll now see a > redundant two LFs in a row? I think you are careful enough not to > introduce unnecessary compatibility breakage like that, so perhaps > "implicitly adds" is a wrong way to characterize what happens in the > files backend, and it only adds LF when the message does not end > with one, but does not add an extra one if not necessary? > > If so, then the change in the patch does not break compatibility, > but the above description does not give readers confidence that it > is indeed the case. IOW it is unclear how this change manages to > avoid breaking existing code. > > Sorry, but I am left puzzled. Most places that write a reflog message use a message that ends with '\n'. Some places (the one mentioned here) do not append a '\n'. When it is read back, the message does have a '\n', leading to discrepancies with reftable (which faithfully reproduced the message without '\n') I initially thought we could or should fix this across git-core, but I think it will be a lot of work to find all occurrences, so I implemented https://github.com/google/reftable/commit/785f745daf567aeabe47cb01024ea879b9e15c2f which does extra validation on reflog messages, and normalizes the '\n' at the end. Maybe this is also worth discussing in the reftable spec: in reftable, you can use multi-line reflog messages, but that seems fundamentally impossible in the traditional file storage. I can drop this commit from the series, but I think it would be a useful cleanup anyways. -- Han-Wen Nienhuys - Google Munich I work 80%. Don't expect answers from me on Fridays. -- Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg Geschäftsführer: Paul Manicle, Halimah DeLaine Prado -- Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Han-Wen Nienhuys <hanwen@google.com> writes: > Most places that write a reflog message use a message that ends with '\n'. > > Some places (the one mentioned here) do not append a '\n'. Just to make sure I am following, I see if (update_ref("initial pull", "HEAD", merge_head, ... in bultin/pull.c; this is supposed to be written with "\n" under the new world order? As reflog messages are supposed to be a single liner (due to the nature of the original storage format), I have a feeling that it is cumbersome to the programmers to requre "\n" at the end. It would not add much value to the readability of the resulting code, either. If the places that have hardcoded messages like the above have to be written like so: if (update_ref("initial pull\n", "HEAD", merge_head, ... it makes the code worse. Let's see if I can come up with a coherent "rule" that we can follow to improve the current situation that is an undisciplined mixture as you found out. ... goes and looks the current code ... So, what happens is that update->msg in each ref transaction (in refs.c::ref_transaction_add_update()) gets verbatim copy of what the caller gives the refs API, but when it is written out as an reflog entry, the message is cleansed by calling refs.c::copy_reflog_msg() that squashes a run of whitespaces (not just SP but including LF and HT) into a single SP and then trims whitespaces at the end. Then to the result, a LF is unconditionally added. I would consider the massaging of the application-supplied message done in copy_reflog_msg() as normalization, which turns possibly invalid data (like a message that has LF anywhere in it) into an acceptable form. Which defines the kosher form of a reflog message as not having any LF (or excess SP) in it, and the terminating LF is not part of the message. From that point of view, if we were to do something, I would rather standardize on not having the final (and only) LF. But some callers that prepare a message in a string buffer may find it easier not having to strip the trailing LF, or even depend on the fact that they can feed a string split into multiple lines and the result becomes a single line. So I do not see a point in "fixing" the existing callers that gives an extra LF at the end or other not-so-kosher form of data---these will be normalized by calling copy_reflog_msg() anyway. If they are easy to fix, like sending a constant string with LF at the end, we would want to fix them to improve the readability of the resulting code, though. If the update_ref() call in builtin/pull.c were sending "initial pull\n" as its first parameter, it would be worth fixing by removing the LF, as the result would be easier to read and it is not too much work. On the other hand, if the message that eventually becomes a reflog entry comes from outside and if the existing code relies on the sanitizing done by copy_reflog_msg(), I do not see a point in carefully removing the LF at the end on the caller's side, either. > I initially thought we could or should fix this across git-core, but I > think it will be a lot of work to find all occurrences. I think we are on the same page, even if the definition of which is correct might be different between us. As long as all the refs backend does the same sanitizing (which is probably easy---just call copy_reflog_msg() from them, or perhaps we may want to move the call to where update->msg are queued for each step in a transaction, which would relieve individual backends from having to worry about this issue at all), they would all behave the same for the same reflog data. Thanks.
On Wed, Jul 1, 2020 at 1:58 AM Junio C Hamano <gitster@pobox.com> wrote: > > I initially thought we could or should fix this across git-core, but I > > think it will be a lot of work to find all occurrences. > > I think we are on the same page, even if the definition of which is > correct might be different between us. As long as all the refs Right. The practical upshot of this is that I should drop this patch, though.
diff --git a/builtin/checkout.c b/builtin/checkout.c index af849c644f..bb11fcc4e9 100644 --- a/builtin/checkout.c +++ b/builtin/checkout.c @@ -884,8 +884,9 @@ static void update_refs_for_switch(const struct checkout_opts *opts, reflog_msg = getenv("GIT_REFLOG_ACTION"); if (!reflog_msg) - strbuf_addf(&msg, "checkout: moving from %s to %s", - old_desc ? old_desc : "(invalid)", new_branch_info->name); + strbuf_addf(&msg, "checkout: moving from %s to %s\n", + old_desc ? old_desc : "(invalid)", + new_branch_info->name); else strbuf_insertstr(&msg, 0, reflog_msg);