Message ID | 0288e743eb2e96e2effd6b0b90c6f885009bf337.1637855872.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Inspect reflog data programmatically in more tests | expand |
"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Han-Wen Nienhuys <hanwen@google.com> > > On iteration, the reflog message is always terminated by a newline. Trim it to > avoid clobbering the console with is this extra newline. > > Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> > --- > refs/debug.c | 10 ++++++++-- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/refs/debug.c b/refs/debug.c > index 8667c640237..2631210795b 100644 > --- a/refs/debug.c > +++ b/refs/debug.c > @@ -284,15 +284,21 @@ static int debug_print_reflog_ent(struct object_id *old_oid, > int ret; > char o[GIT_MAX_HEXSZ + 1] = "null"; > char n[GIT_MAX_HEXSZ + 1] = "null"; > + struct strbuf trimmed = STRBUF_INIT; > if (old_oid) > oid_to_hex_r(o, old_oid); > if (new_oid) > oid_to_hex_r(n, new_oid); > > + strbuf_addstr(&trimmed, msg); > ret = dbg->fn(old_oid, new_oid, committer, timestamp, tz, msg, > dbg->cb_data); > - trace_printf_key(&trace_refs, "reflog_ent %s (ret %d): %s -> %s, %s %ld \"%s\"\n", > - dbg->refname, ret, o, n, committer, (long int)timestamp, msg); > + strbuf_trim_trailing_newline(&trimmed); The API promises to have only LF, not CRLF, at the end, so strbuf_trim_trailing_newline() is a bit overkill (and if payload happened to end with CR, we would lose it). > + trace_printf_key(&trace_refs, > + "reflog_ent %s (ret %d): %s -> %s, %s %ld \"%s\"\n", > + dbg->refname, ret, o, n, committer, > + (long int)timestamp, trimmed.buf); > + strbuf_release(&trimmed); > return ret; > } Can we use counted bytes in trace_printf()? If we can, it would be simpler to just scan "msg" for LF and then show only the span between the beginning of the string and the found LF using "%.*s", perhaps like this? refs/debug.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git c/refs/debug.c w/refs/debug.c index 8667c64023..c97eeb5740 100644 --- c/refs/debug.c +++ w/refs/debug.c @@ -284,15 +284,20 @@ static int debug_print_reflog_ent(struct object_id *old_oid, int ret; char o[GIT_MAX_HEXSZ + 1] = "null"; char n[GIT_MAX_HEXSZ + 1] = "null"; + int msglen; + if (old_oid) oid_to_hex_r(o, old_oid); if (new_oid) oid_to_hex_r(n, new_oid); + msglen = strchrnul(msg, '\n') - msg; ret = dbg->fn(old_oid, new_oid, committer, timestamp, tz, msg, dbg->cb_data); - trace_printf_key(&trace_refs, "reflog_ent %s (ret %d): %s -> %s, %s %ld \"%s\"\n", - dbg->refname, ret, o, n, committer, (long int)timestamp, msg); + trace_printf_key(&trace_refs, + "reflog_ent %s (ret %d): %s -> %s, %s %ld \"%.*s\"\n", + dbg->refname, ret, o, n, committer, (long int)timestamp, + (int)msglen, msg); return ret; }
On Fri, Nov 26, 2021 at 9:16 AM Junio C Hamano <gitster@pobox.com> wrote: > > The API promises to have only LF, not CRLF, at the end, so > strbuf_trim_trailing_newline() is a bit overkill (and if payload > happened to end with CR, we would lose it). it would be best if there was a way to escape characters (ie. "\n" => "\\n"). Do we have a function for that? > > + trace_printf_key(&trace_refs, > > + "reflog_ent %s (ret %d): %s -> %s, %s %ld \"%s\"\n", > > + dbg->refname, ret, o, n, committer, > > + (long int)timestamp, trimmed.buf); > > + strbuf_release(&trimmed); > > return ret; > > } > > Can we use counted bytes in trace_printf()? If we can, it would be > simpler to just scan "msg" for LF and then show only the span > between the beginning of the string and the found LF using "%.*s", > perhaps like this? I beg to differ - despite this being fewer lines of code, I think pointer arithmetic is best avoided if possible.
Han-Wen Nienhuys <hanwen@google.com> writes: > On Fri, Nov 26, 2021 at 9:16 AM Junio C Hamano <gitster@pobox.com> wrote: >> >> The API promises to have only LF, not CRLF, at the end, so >> strbuf_trim_trailing_newline() is a bit overkill (and if payload >> happened to end with CR, we would lose it). > > it would be best if there was a way to escape characters (ie. "\n" => > "\\n"). Do we have a function for that? Mere escaping would not work in a backward compatible way, without a trick. It was envisioned that we probably could encode *and* signal the fact that the message is encoded by appending a trailing SP at the end of the message. See the log message of 523fa69c (reflog: cleanse messages in the refs.c layer, 2020-07-10) for details. Having said that, that is about introducing a whole new reflog message format (whose use is signalled by the trailing SP), and I would prefer it to happen (1) after we integrate with reftable, and (2) implemented as an option in the normalize_reflog_message() function, so that no ref backends has to worry about it. outside this topic. > I beg to differ - despite this being fewer lines of code, I think > pointer arithmetic is best avoided if possible. I think repeated allocation and deallocation is best avoided, and that is why I recommended it. I do not see anything to fear in poiter arithmetic, as long as it is done clearly (e.g. in a narrow scope) and correctly. If trace_printf() does not allow counted bytes "%.*s", the whole discussion is moot; I didn't go back to check.
Junio C Hamano <gitster@pobox.com> writes: > Han-Wen Nienhuys <hanwen@google.com> writes: > >> On Fri, Nov 26, 2021 at 9:16 AM Junio C Hamano <gitster@pobox.com> wrote: >>> >>> The API promises to have only LF, not CRLF, at the end, so >>> strbuf_trim_trailing_newline() is a bit overkill (and if payload >>> happened to end with CR, we would lose it). >> >> it would be best if there was a way to escape characters (ie. "\n" => >> "\\n"). Do we have a function for that? > > Mere escaping would not work in a backward compatible way, without a > trick. It was envisioned that we probably could encode *and* signal > the fact that the message is encoded by appending a trailing SP at > the end of the message. See the log message of 523fa69c (reflog: > cleanse messages in the refs.c layer, 2020-07-10) for details. > > Having said that, that is about introducing a whole new reflog > message format (whose use is signalled by the trailing SP), and I > would prefer it to happen > > (1) after we integrate with reftable, and > > (2) implemented as an option in the normalize_reflog_message() > function, so that no ref backends has to worry about it. > > outside this topic. Sorry, I seem to have forgotten to give an answer to your question, and having warned sufficiently clearly that now is not the time to do it, I can safely do so. If we wanted to go the C-quote route, then the function we would use would be quote.c::quote_c_style(). Even though 523fa69c (reflog: cleanse messages in the refs.c layer, 2020-07-10) muses on use of urlencode, it shouldn't be taken as a sign that I prefer it over c-quoting, or vice versa for that matter. It was merely an example encoding picked in the context of explaining that (1) I wasn't interested in keeping a multi-line message and spewing back as-is, and (2) but it is possible to loosen it in the future.
On Mon, Nov 29 2021, Han-Wen Nienhuys wrote: > On Fri, Nov 26, 2021 at 9:16 AM Junio C Hamano <gitster@pobox.com> wrote: >> >> The API promises to have only LF, not CRLF, at the end, so >> strbuf_trim_trailing_newline() is a bit overkill (and if payload >> happened to end with CR, we would lose it). > > it would be best if there was a way to escape characters (ie. "\n" => > "\\n"). Do we have a function for that? > >> > + trace_printf_key(&trace_refs, >> > + "reflog_ent %s (ret %d): %s -> %s, %s %ld \"%s\"\n", >> > + dbg->refname, ret, o, n, committer, >> > + (long int)timestamp, trimmed.buf); >> > + strbuf_release(&trimmed); >> > return ret; >> > } >> >> Can we use counted bytes in trace_printf()? If we can, it would be >> simpler to just scan "msg" for LF and then show only the span >> between the beginning of the string and the found LF using "%.*s", >> perhaps like this? > > I beg to differ - despite this being fewer lines of code, I think > pointer arithmetic is best avoided if possible. We usually do this with pointer arithmetic, but the %.*s format doesn't require that, just code like: const char *str = "foobar"; size_t len = strlen(str); len -= 1; /* give me less! */ printf("%.*s", (int)len, str); So you can also feed it (len - 1) or whatever if you know it to end with a character you don't want. It's (more simply done as) pointer arithmetic if you're finding that end marker with strstr() or whatever, but you can also bend over backwards and get a "len" instead through other means, and in either case I think it beats reallocating the whole thing (more for readability than any optimization reasons).
On Mon, Nov 29, 2021 at 8:20 PM Junio C Hamano <gitster@pobox.com> wrote: > > Han-Wen Nienhuys <hanwen@google.com> writes: > > > On Fri, Nov 26, 2021 at 9:16 AM Junio C Hamano <gitster@pobox.com> wrote: > >> > >> The API promises to have only LF, not CRLF, at the end, so > >> strbuf_trim_trailing_newline() is a bit overkill (and if payload > >> happened to end with CR, we would lose it). > > > > it would be best if there was a way to escape characters (ie. "\n" => > > "\\n"). Do we have a function for that? > > Mere escaping would not work in a backward compatible way, without a > trick. It was envisioned that we probably could encode *and* signal I'm talking about the debug output, which isn't subject to compatibility guarantees.
Han-Wen Nienhuys <hanwen@google.com> writes: > On Mon, Nov 29, 2021 at 8:20 PM Junio C Hamano <gitster@pobox.com> wrote: >> >> Han-Wen Nienhuys <hanwen@google.com> writes: >> >> > On Fri, Nov 26, 2021 at 9:16 AM Junio C Hamano <gitster@pobox.com> wrote: >> >> >> >> The API promises to have only LF, not CRLF, at the end, so >> >> strbuf_trim_trailing_newline() is a bit overkill (and if payload >> >> happened to end with CR, we would lose it). >> > >> > it would be best if there was a way to escape characters (ie. "\n" => >> > "\\n"). Do we have a function for that? >> >> Mere escaping would not work in a backward compatible way, without a >> trick. It was envisioned that we probably could encode *and* signal > > I'm talking about the debug output, which isn't subject to > compatibility guarantees. There is no need to show "\n" in the debug output in the first place, no? The message part stored in the reflog database is already normalized to have each runs of whitespaces (including CR and LF) turned and squashed into a single SP, so the LF at the end is constant that the debug output can unconditionally strip without losing information. Or are we talking about logging the _input_ to be recorded as the message for a new reflog entry? Then I agree we can do whatever, and if you prefer C-style quoting, the helpers in quote.c are your friends. Thanks.
diff --git a/refs/debug.c b/refs/debug.c index 8667c640237..2631210795b 100644 --- a/refs/debug.c +++ b/refs/debug.c @@ -284,15 +284,21 @@ static int debug_print_reflog_ent(struct object_id *old_oid, int ret; char o[GIT_MAX_HEXSZ + 1] = "null"; char n[GIT_MAX_HEXSZ + 1] = "null"; + struct strbuf trimmed = STRBUF_INIT; if (old_oid) oid_to_hex_r(o, old_oid); if (new_oid) oid_to_hex_r(n, new_oid); + strbuf_addstr(&trimmed, msg); ret = dbg->fn(old_oid, new_oid, committer, timestamp, tz, msg, dbg->cb_data); - trace_printf_key(&trace_refs, "reflog_ent %s (ret %d): %s -> %s, %s %ld \"%s\"\n", - dbg->refname, ret, o, n, committer, (long int)timestamp, msg); + strbuf_trim_trailing_newline(&trimmed); + trace_printf_key(&trace_refs, + "reflog_ent %s (ret %d): %s -> %s, %s %ld \"%s\"\n", + dbg->refname, ret, o, n, committer, + (long int)timestamp, trimmed.buf); + strbuf_release(&trimmed); return ret; }