diff mbox series

[v2,5/5] refs/debug: trim trailing LF from reflog message

Message ID 0288e743eb2e96e2effd6b0b90c6f885009bf337.1637855872.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Inspect reflog data programmatically in more tests | expand

Commit Message

Han-Wen Nienhuys Nov. 25, 2021, 3:57 p.m. UTC
From: Han-Wen Nienhuys <hanwen@google.com>

On iteration, the reflog message is always terminated by a newline. Trim it to
avoid clobbering the console with is this extra newline.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs/debug.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Junio C Hamano Nov. 26, 2021, 8:16 a.m. UTC | #1
"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Han-Wen Nienhuys <hanwen@google.com>
>
> On iteration, the reflog message is always terminated by a newline. Trim it to
> avoid clobbering the console with is this extra newline.
>
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---
>  refs/debug.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/refs/debug.c b/refs/debug.c
> index 8667c640237..2631210795b 100644
> --- a/refs/debug.c
> +++ b/refs/debug.c
> @@ -284,15 +284,21 @@ static int debug_print_reflog_ent(struct object_id *old_oid,
>  	int ret;
>  	char o[GIT_MAX_HEXSZ + 1] = "null";
>  	char n[GIT_MAX_HEXSZ + 1] = "null";
> +	struct strbuf trimmed = STRBUF_INIT;
>  	if (old_oid)
>  		oid_to_hex_r(o, old_oid);
>  	if (new_oid)
>  		oid_to_hex_r(n, new_oid);
>  
> +	strbuf_addstr(&trimmed, msg);
>  	ret = dbg->fn(old_oid, new_oid, committer, timestamp, tz, msg,
>  		      dbg->cb_data);
> -	trace_printf_key(&trace_refs, "reflog_ent %s (ret %d): %s -> %s, %s %ld \"%s\"\n",
> -		dbg->refname, ret, o, n, committer, (long int)timestamp, msg);
> +	strbuf_trim_trailing_newline(&trimmed);

The API promises to have only LF, not CRLF, at the end, so
strbuf_trim_trailing_newline() is a bit overkill (and if payload
happened to end with CR, we would lose it).

> +	trace_printf_key(&trace_refs,
> +			 "reflog_ent %s (ret %d): %s -> %s, %s %ld \"%s\"\n",
> +			 dbg->refname, ret, o, n, committer,
> +			 (long int)timestamp, trimmed.buf);
> +	strbuf_release(&trimmed);
>  	return ret;
>  }

Can we use counted bytes in trace_printf()?  If we can, it would be
simpler to just scan "msg" for LF and then show only the span
between the beginning of the string and the found LF using "%.*s",
perhaps like this?

 refs/debug.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git c/refs/debug.c w/refs/debug.c
index 8667c64023..c97eeb5740 100644
--- c/refs/debug.c
+++ w/refs/debug.c
@@ -284,15 +284,20 @@ static int debug_print_reflog_ent(struct object_id *old_oid,
 	int ret;
 	char o[GIT_MAX_HEXSZ + 1] = "null";
 	char n[GIT_MAX_HEXSZ + 1] = "null";
+	int msglen;
+
 	if (old_oid)
 		oid_to_hex_r(o, old_oid);
 	if (new_oid)
 		oid_to_hex_r(n, new_oid);
 
+	msglen = strchrnul(msg, '\n') - msg;
 	ret = dbg->fn(old_oid, new_oid, committer, timestamp, tz, msg,
 		      dbg->cb_data);
-	trace_printf_key(&trace_refs, "reflog_ent %s (ret %d): %s -> %s, %s %ld \"%s\"\n",
-		dbg->refname, ret, o, n, committer, (long int)timestamp, msg);
+	trace_printf_key(&trace_refs,
+			 "reflog_ent %s (ret %d): %s -> %s, %s %ld \"%.*s\"\n",
+			 dbg->refname, ret, o, n, committer, (long int)timestamp,
+			 (int)msglen, msg);
 	return ret;
 }
Han-Wen Nienhuys Nov. 29, 2021, 6:29 p.m. UTC | #2
On Fri, Nov 26, 2021 at 9:16 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> The API promises to have only LF, not CRLF, at the end, so
> strbuf_trim_trailing_newline() is a bit overkill (and if payload
> happened to end with CR, we would lose it).

it would be best if there was a way to escape characters (ie. "\n" =>
"\\n"). Do we have a function for that?

> > +     trace_printf_key(&trace_refs,
> > +                      "reflog_ent %s (ret %d): %s -> %s, %s %ld \"%s\"\n",
> > +                      dbg->refname, ret, o, n, committer,
> > +                      (long int)timestamp, trimmed.buf);
> > +     strbuf_release(&trimmed);
> >       return ret;
> >  }
>
> Can we use counted bytes in trace_printf()?  If we can, it would be
> simpler to just scan "msg" for LF and then show only the span
> between the beginning of the string and the found LF using "%.*s",
> perhaps like this?

I beg to differ - despite this being fewer lines of code, I think
pointer arithmetic is best avoided if possible.
Junio C Hamano Nov. 29, 2021, 7:19 p.m. UTC | #3
Han-Wen Nienhuys <hanwen@google.com> writes:

> On Fri, Nov 26, 2021 at 9:16 AM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> The API promises to have only LF, not CRLF, at the end, so
>> strbuf_trim_trailing_newline() is a bit overkill (and if payload
>> happened to end with CR, we would lose it).
>
> it would be best if there was a way to escape characters (ie. "\n" =>
> "\\n"). Do we have a function for that?

Mere escaping would not work in a backward compatible way, without a
trick.  It was envisioned that we probably could encode *and* signal
the fact that the message is encoded by appending a trailing SP at
the end of the message.  See the log message of 523fa69c (reflog:
cleanse messages in the refs.c layer, 2020-07-10) for details.

Having said that, that is about introducing a whole new reflog
message format (whose use is signalled by the trailing SP), and I
would prefer it to happen

 (1) after we integrate with reftable, and

 (2) implemented as an option in the normalize_reflog_message()
     function, so that no ref backends has to worry about it.

outside this topic.

> I beg to differ - despite this being fewer lines of code, I think
> pointer arithmetic is best avoided if possible.

I think repeated allocation and deallocation is best avoided, and
that is why I recommended it.  I do not see anything to fear in
poiter arithmetic, as long as it is done clearly (e.g. in a narrow
scope) and correctly.

If trace_printf() does not allow counted bytes "%.*s", the whole
discussion is moot; I didn't go back to check.
Junio C Hamano Nov. 29, 2021, 7:35 p.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:

> Han-Wen Nienhuys <hanwen@google.com> writes:
>
>> On Fri, Nov 26, 2021 at 9:16 AM Junio C Hamano <gitster@pobox.com> wrote:
>>>
>>> The API promises to have only LF, not CRLF, at the end, so
>>> strbuf_trim_trailing_newline() is a bit overkill (and if payload
>>> happened to end with CR, we would lose it).
>>
>> it would be best if there was a way to escape characters (ie. "\n" =>
>> "\\n"). Do we have a function for that?
>
> Mere escaping would not work in a backward compatible way, without a
> trick.  It was envisioned that we probably could encode *and* signal
> the fact that the message is encoded by appending a trailing SP at
> the end of the message.  See the log message of 523fa69c (reflog:
> cleanse messages in the refs.c layer, 2020-07-10) for details.
>
> Having said that, that is about introducing a whole new reflog
> message format (whose use is signalled by the trailing SP), and I
> would prefer it to happen
>
>  (1) after we integrate with reftable, and
>
>  (2) implemented as an option in the normalize_reflog_message()
>      function, so that no ref backends has to worry about it.
>
> outside this topic.

Sorry, I seem to have forgotten to give an answer to your question,
and having warned sufficiently clearly that now is not the time to
do it, I can safely do so.  If we wanted to go the C-quote route,
then the function we would use would be quote.c::quote_c_style().

Even though 523fa69c (reflog: cleanse messages in the refs.c layer,
2020-07-10) muses on use of urlencode, it shouldn't be taken as a
sign that I prefer it over c-quoting, or vice versa for that matter.

It was merely an example encoding picked in the context of
explaining that (1) I wasn't interested in keeping a multi-line
message and spewing back as-is, and (2) but it is possible to loosen
it in the future.
Ævar Arnfjörð Bjarmason Nov. 29, 2021, 8:59 p.m. UTC | #5
On Mon, Nov 29 2021, Han-Wen Nienhuys wrote:

> On Fri, Nov 26, 2021 at 9:16 AM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> The API promises to have only LF, not CRLF, at the end, so
>> strbuf_trim_trailing_newline() is a bit overkill (and if payload
>> happened to end with CR, we would lose it).
>
> it would be best if there was a way to escape characters (ie. "\n" =>
> "\\n"). Do we have a function for that?
>
>> > +     trace_printf_key(&trace_refs,
>> > +                      "reflog_ent %s (ret %d): %s -> %s, %s %ld \"%s\"\n",
>> > +                      dbg->refname, ret, o, n, committer,
>> > +                      (long int)timestamp, trimmed.buf);
>> > +     strbuf_release(&trimmed);
>> >       return ret;
>> >  }
>>
>> Can we use counted bytes in trace_printf()?  If we can, it would be
>> simpler to just scan "msg" for LF and then show only the span
>> between the beginning of the string and the found LF using "%.*s",
>> perhaps like this?
>
> I beg to differ - despite this being fewer lines of code, I think
> pointer arithmetic is best avoided if possible.

We usually do this with pointer arithmetic, but the %.*s format doesn't
require that, just code like:

    const char *str = "foobar";
    size_t len = strlen(str);
    len -= 1; /* give me less! */
    printf("%.*s", (int)len, str);

So you can also feed it (len - 1) or whatever if you know it to end with
a character you don't want.

It's (more simply done as) pointer arithmetic if you're finding that end
marker with strstr() or whatever, but you can also bend over backwards
and get a "len" instead through other means, and in either case I think
it beats reallocating the whole thing (more for readability than any
optimization reasons).
Han-Wen Nienhuys Dec. 2, 2021, 4:24 p.m. UTC | #6
On Mon, Nov 29, 2021 at 8:20 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Han-Wen Nienhuys <hanwen@google.com> writes:
>
> > On Fri, Nov 26, 2021 at 9:16 AM Junio C Hamano <gitster@pobox.com> wrote:
> >>
> >> The API promises to have only LF, not CRLF, at the end, so
> >> strbuf_trim_trailing_newline() is a bit overkill (and if payload
> >> happened to end with CR, we would lose it).
> >
> > it would be best if there was a way to escape characters (ie. "\n" =>
> > "\\n"). Do we have a function for that?
>
> Mere escaping would not work in a backward compatible way, without a
> trick.  It was envisioned that we probably could encode *and* signal

I'm talking about the debug output, which isn't subject to
compatibility guarantees.
Junio C Hamano Dec. 2, 2021, 6:36 p.m. UTC | #7
Han-Wen Nienhuys <hanwen@google.com> writes:

> On Mon, Nov 29, 2021 at 8:20 PM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Han-Wen Nienhuys <hanwen@google.com> writes:
>>
>> > On Fri, Nov 26, 2021 at 9:16 AM Junio C Hamano <gitster@pobox.com> wrote:
>> >>
>> >> The API promises to have only LF, not CRLF, at the end, so
>> >> strbuf_trim_trailing_newline() is a bit overkill (and if payload
>> >> happened to end with CR, we would lose it).
>> >
>> > it would be best if there was a way to escape characters (ie. "\n" =>
>> > "\\n"). Do we have a function for that?
>>
>> Mere escaping would not work in a backward compatible way, without a
>> trick.  It was envisioned that we probably could encode *and* signal
>
> I'm talking about the debug output, which isn't subject to
> compatibility guarantees.

There is no need to show "\n" in the debug output in the first
place, no?  The message part stored in the reflog database is
already normalized to have each runs of whitespaces (including CR
and LF) turned and squashed into a single SP, so the LF at the end
is constant that the debug output can unconditionally strip without
losing information.

Or are we talking about logging the _input_ to be recorded as the
message for a new reflog entry?  Then I agree we can do whatever,
and if you prefer C-style quoting, the helpers in quote.c are your
friends.

Thanks.
diff mbox series

Patch

diff --git a/refs/debug.c b/refs/debug.c
index 8667c640237..2631210795b 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -284,15 +284,21 @@  static int debug_print_reflog_ent(struct object_id *old_oid,
 	int ret;
 	char o[GIT_MAX_HEXSZ + 1] = "null";
 	char n[GIT_MAX_HEXSZ + 1] = "null";
+	struct strbuf trimmed = STRBUF_INIT;
 	if (old_oid)
 		oid_to_hex_r(o, old_oid);
 	if (new_oid)
 		oid_to_hex_r(n, new_oid);
 
+	strbuf_addstr(&trimmed, msg);
 	ret = dbg->fn(old_oid, new_oid, committer, timestamp, tz, msg,
 		      dbg->cb_data);
-	trace_printf_key(&trace_refs, "reflog_ent %s (ret %d): %s -> %s, %s %ld \"%s\"\n",
-		dbg->refname, ret, o, n, committer, (long int)timestamp, msg);
+	strbuf_trim_trailing_newline(&trimmed);
+	trace_printf_key(&trace_refs,
+			 "reflog_ent %s (ret %d): %s -> %s, %s %ld \"%s\"\n",
+			 dbg->refname, ret, o, n, committer,
+			 (long int)timestamp, trimmed.buf);
+	strbuf_release(&trimmed);
 	return ret;
 }