Re* [PATCH v19 03/20] checkout: add '\n' to reflog message
diff mbox series

Message ID xmqqr1tvc7el.fsf_-_@gitster.c.googlers.com
State New
Headers show
Series
  • Re* [PATCH v19 03/20] checkout: add '\n' to reflog message
Related show

Commit Message

Junio C Hamano July 1, 2020, 8:22 p.m. UTC
Han-Wen Nienhuys <hanwenn@gmail.com> writes:

> On Wed, Jul 1, 2020 at 1:58 AM Junio C Hamano <gitster@pobox.com> wrote:
>> > I initially thought we could or should fix this across git-core, but I
>> > think it will be a lot of work to find all occurrences.
>>
>> I think we are on the same page, even if the definition of which is
>> correct might be different between us.  As long as all the refs
>
> Right. The practical upshot of this is that I should drop this patch, though.

I guess we are on the same page on that one, too, even if our
definition of which side is up and which side is down may be
different ;-)

I think this can be dropped, as it does not even try to enforce "all
reflog message arguments must end with LF".

At the same time, we would need to make sure that reftable backend
cleanses the reflog message the callers of refs API supplies the
same way as the files backend.  In short, if a series of Git
operations that accumulate records in a reflog are performed with
either the files or the reftable backend, "git log -g" and other
forms to retrieve these messages should produce identical results
from both backends.

What I observed during this review was:

 - We expect that a reflog message consists of a single line.  The
   file format used by the files backend may add a LF after the
   message as a delimiter, and output by commands like "git log -g"
   may complete such an incomplete line by adding a LF at the end,
   but philosophically, the terminating LF is not a part of the
   message.

 - We however allow callers of refs API to supply a random sequence
   of NUL terminated bytes.  We cleanse caller-supplied message by
   squasing a run of whitespaces into a SP, and by trimming trailing
   whitespace, before storing the message.  This is how we tolerate,
   instead of erring out, a message with LF in it (be it at the end,
   in the middle, or both).

Currently, the cleansing of the reflog message is done by the files
backend, before the log is written out.  This is sufficient with the
current code, as that is the only backend that writes reflogs.  But
new backends can be added that write reflogs, and we'd want the
resulting log message we would read out of "log -g" the same no
matter what backend is used.

For that, we could force each new ref backend implementation to call
the same cleansing helper, but instead why not do the cleansing at
the backend agnostic way in refs.c layer?  That would be less error
prone.

An added benefit is that the "cleansing" function could be updated
later, independent from individual backends, to e.g. allow
multi-line log messages if we wanted to, and when that happens, it
would help a lot to ensure we covered all bases if the cleansing
function (which would be updated) is called from the generic layer.

	Side note: I am not interested in supporting multi-line
	reflog messages right at the moment (nobody is asking for
	it), but I envision that instead of the "squash a run of
	whitespaces into a SP and rtrim" cleansing, we can
	%urlencode problematic bytes in the message *AND* append a
	SP at the end, when a new version of Git that supports
	multi-line and/or verbatim reflog messages writes a reflog
	record.  The reading side can detect the presense of SP at
	the end (which should have been rtrimmed out if it were
	written by existing versions of Git) as a signal that
	decoding %urlencode recovers the original reflog message.

In any case, a patch that moves the existing "squash SPs and rtrim"
cleansing from the files backend to the generic layer may look like
the attached patch.  We can add reftable backend on top of a change
like this one and then we do not have to worry about each backend
cleansing the incoming reflog messages the same way.  Nice?

Thanks.

 refs.c               | 50 +++++++++++++++++++++++++++++++++++++++++---------
 refs/files-backend.c |  2 +-
 refs/refs-internal.h |  6 ------
 3 files changed, 42 insertions(+), 16 deletions(-)

Comments

Han-Wen Nienhuys July 6, 2020, 3:56 p.m. UTC | #1
On Wed, Jul 1, 2020 at 10:22 PM Junio C Hamano <gitster@pobox.com> wrote:
> In any case, a patch that moves the existing "squash SPs and rtrim"
> cleansing from the files backend to the generic layer may look like
> the attached patch.  We can add reftable backend on top of a change
> like this one and then we do not have to worry about each backend
> cleansing the incoming reflog messages the same way.  Nice?

Yes, very nice!  Will you merge this, or should I make this part of
the reftable series?
The reftable code already has normalization for reflog messages, so it
doesn't really make a difference; either way is fine.
Junio C Hamano July 6, 2020, 6:53 p.m. UTC | #2
Han-Wen Nienhuys <hanwen@google.com> writes:

> On Wed, Jul 1, 2020 at 10:22 PM Junio C Hamano <gitster@pobox.com> wrote:
>> In any case, a patch that moves the existing "squash SPs and rtrim"
>> cleansing from the files backend to the generic layer may look like
>> the attached patch.  We can add reftable backend on top of a change
>> like this one and then we do not have to worry about each backend
>> cleansing the incoming reflog messages the same way.  Nice?
>
> Yes, very nice!  Will you merge this, or should I make this part of
> the reftable series?
> The reftable code already has normalization for reflog messages, so it
> doesn't really make a difference; either way is fine.

It probably fits well in the "to prepare the existing code to
support any new backend" series you have split out of the reftable
series and sent separately earlier, not even "part of the reftable
series", I think.  With something like that, you may even be able
to drop the custom reflog message munging from reftable proper,
just like the whole point of the patch you are responding was to
drop the custom munging from the files backend.

Thanks.

Patch
diff mbox series

diff --git a/refs.c b/refs.c
index 224ff66c7b..75a0cda68a 100644
--- a/refs.c
+++ b/refs.c
@@ -870,7 +870,7 @@  int delete_ref(const char *msg, const char *refname,
 			       old_oid, flags);
 }
 
-void copy_reflog_msg(struct strbuf *sb, const char *msg)
+static void copy_reflog_msg(struct strbuf *sb, const char *msg)
 {
 	char c;
 	int wasspace = 1;
@@ -887,6 +887,15 @@  void copy_reflog_msg(struct strbuf *sb, const char *msg)
 	strbuf_rtrim(sb);
 }
 
+static char *normalize_reflog_message(const char *msg)
+{
+	struct strbuf sb = STRBUF_INIT;
+
+	if (msg && *msg)
+		copy_reflog_msg(&sb, msg);
+	return strbuf_detach(&sb, NULL);
+}
+
 int should_autocreate_reflog(const char *refname)
 {
 	switch (log_all_ref_updates) {
@@ -1092,7 +1101,7 @@  struct ref_update *ref_transaction_add_update(
 		oidcpy(&update->new_oid, new_oid);
 	if (flags & REF_HAVE_OLD)
 		oidcpy(&update->old_oid, old_oid);
-	update->msg = xstrdup_or_null(msg);
+	update->msg = normalize_reflog_message(msg);
 	return update;
 }
 
@@ -1951,9 +1960,14 @@  int refs_create_symref(struct ref_store *refs,
 		       const char *refs_heads_master,
 		       const char *logmsg)
 {
-	return refs->be->create_symref(refs, ref_target,
-				       refs_heads_master,
-				       logmsg);
+	char *msg;
+	int retval;
+
+	msg = normalize_reflog_message(logmsg);
+	retval = refs->be->create_symref(refs, ref_target,
+					 refs_heads_master, msg);
+	free(msg);
+	return retval;
 }
 
 int create_symref(const char *ref_target, const char *refs_heads_master,
@@ -2268,10 +2282,16 @@  int initial_ref_transaction_commit(struct ref_transaction *transaction,
 	return refs->be->initial_transaction_commit(refs, transaction, err);
 }
 
-int refs_delete_refs(struct ref_store *refs, const char *msg,
+int refs_delete_refs(struct ref_store *refs, const char *logmsg,
 		     struct string_list *refnames, unsigned int flags)
 {
-	return refs->be->delete_refs(refs, msg, refnames, flags);
+	char *msg;
+	int retval;
+
+	msg = normalize_reflog_message(logmsg);
+	retval = refs->be->delete_refs(refs, msg, refnames, flags);
+	free(msg);
+	return retval;
 }
 
 int delete_refs(const char *msg, struct string_list *refnames,
@@ -2283,7 +2303,13 @@  int delete_refs(const char *msg, struct string_list *refnames,
 int refs_rename_ref(struct ref_store *refs, const char *oldref,
 		    const char *newref, const char *logmsg)
 {
-	return refs->be->rename_ref(refs, oldref, newref, logmsg);
+	char *msg;
+	int retval;
+
+	msg = normalize_reflog_message(logmsg);
+	retval = refs->be->rename_ref(refs, oldref, newref, msg);
+	free(msg);
+	return retval;
 }
 
 int rename_ref(const char *oldref, const char *newref, const char *logmsg)
@@ -2294,7 +2320,13 @@  int rename_ref(const char *oldref, const char *newref, const char *logmsg)
 int refs_copy_existing_ref(struct ref_store *refs, const char *oldref,
 		    const char *newref, const char *logmsg)
 {
-	return refs->be->copy_ref(refs, oldref, newref, logmsg);
+	char *msg;
+	int retval;
+
+	msg = normalize_reflog_message(logmsg);
+	retval = refs->be->copy_ref(refs, oldref, newref, msg);
+	free(msg);
+	return retval;
 }
 
 int copy_existing_ref(const char *oldref, const char *newref, const char *logmsg)
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 6516c7bc8c..e0aba23eb2 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1629,7 +1629,7 @@  static int log_ref_write_fd(int fd, const struct object_id *old_oid,
 
 	strbuf_addf(&sb, "%s %s %s", oid_to_hex(old_oid), oid_to_hex(new_oid), committer);
 	if (msg && *msg)
-		copy_reflog_msg(&sb, msg);
+		strbuf_addstr(&sb, msg);
 	strbuf_addch(&sb, '\n');
 	if (write_in_full(fd, sb.buf, sb.len) < 0)
 		ret = -1;
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 4271362d26..357359a0be 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -96,12 +96,6 @@  enum peel_status {
  */
 enum peel_status peel_object(const struct object_id *name, struct object_id *oid);
 
-/*
- * Copy the reflog message msg to sb while cleaning up the whitespaces.
- * Especially, convert LF to space, because reflog file is one line per entry.
- */
-void copy_reflog_msg(struct strbuf *sb, const char *msg);
-
 /**
  * Information needed for a single ref update. Set new_oid to the new
  * value or to null_oid to delete the ref. To check the old value