mbox series

[v2,0/5] Inspect reflog data programmatically in more tests

Message ID pull.1145.v2.git.git.1637855872.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Inspect reflog data programmatically in more tests | expand

Message

John Passaro via GitGitGadget Nov. 25, 2021, 3:57 p.m. UTC
This helps for reftable support, and will help if we want to reconsider
under which conditions reflogs get created/updated.

Han-Wen Nienhuys (5):
  show-branch: show reflog message
  test-ref-store: don't add newline to reflog message
  t1405: check for_each_reflog_ent_reverse() more thoroughly
  test-ref-store: tweaks to for-each-reflog-ent format
  refs/debug: trim trailing LF from reflog message

 builtin/show-branch.c          | 12 +++++++-----
 refs/debug.c                   | 10 ++++++++--
 t/helper/test-ref-store.c      |  6 +++---
 t/t1400-update-ref.sh          | 13 ++++++++-----
 t/t1405-main-ref-store.sh      |  4 ++--
 t/t1406-submodule-ref-store.sh |  4 ++--
 t/t3202-show-branch.sh         | 15 +++++++++++++++
 7 files changed, 45 insertions(+), 19 deletions(-)


base-commit: 35151cf0720460a897cde9b8039af364743240e7
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1145%2Fhanwen%2Freflog-prelims-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1145/hanwen/reflog-prelims-v2
Pull-Request: https://github.com/git/git/pull/1145

Range-diff vs v1:

 1:  fd2595d370a = 1:  9d8394d8c76 show-branch: show reflog message
 2:  dfb63937323 ! 2:  4a86d212589 refs: trim newline from reflog message
     @@ Metadata
      Author: Han-Wen Nienhuys <hanwen@google.com>
      
       ## Commit message ##
     -    refs: trim newline from reflog message
     +    test-ref-store: don't add newline to reflog message
      
     -    Commit 523fa69c ("reflog: cleanse messages in the refs.c layer") standardizes
     -    how write entries into the reflog. This commit standardizes how we get messages
     -    out of the reflog. Before, the files backend implicitly added '\n' to the end of
     -    reflog message on reading, which creates a subtle incompatibility with alternate
     -    ref storage backends, such as reftable.
     -
     -    We address this by stripping LF from the message before we pass it to the
     -    user-provided callback.
     +    The files backend produces a newline for messages automatically, so before we
     +    would print blank lines between entries.
      
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
      
     - ## builtin/show-branch.c ##
     -@@ builtin/show-branch.c: int cmd_show_branch(int ac, const char **av, const char *prefix)
     - 			char *logmsg;
     - 			char *nth_desc;
     - 			const char *msg;
     --			char *end;
     - 			timestamp_t timestamp;
     - 			int tz;
     - 
     -@@ builtin/show-branch.c: int cmd_show_branch(int ac, const char **av, const char *prefix)
     - 				break;
     - 			}
     - 
     --			end = strchr(logmsg, '\n');
     --			if (end)
     --				*end = '\0';
     --
     - 			msg = (*logmsg == '\0') ? "(none)" : logmsg;
     - 			reflog_msg[i] = xstrfmt("(%s) %s",
     - 						show_date(timestamp, tz,
     -
     - ## reflog-walk.c ##
     -@@ reflog-walk.c: void get_reflog_message(struct strbuf *sb,
     - 
     - 	info = &commit_reflog->reflogs->items[commit_reflog->recno+1];
     - 	len = strlen(info->message);
     --	if (len > 0)
     --		len--; /* strip away trailing newline */
     - 	strbuf_add(sb, info->message, len);
     - }
     - 
     -@@ reflog-walk.c: void show_reflog_message(struct reflog_walk_info *reflog_info, int oneline,
     - 		info = &commit_reflog->reflogs->items[commit_reflog->recno+1];
     - 		get_reflog_selector(&selector, reflog_info, dmode, force_date, 0);
     - 		if (oneline) {
     --			printf("%s: %s", selector.buf, info->message);
     -+			printf("%s: %s\n", selector.buf, info->message);
     - 		}
     - 		else {
     --			printf("Reflog: %s (%s)\nReflog message: %s",
     -+			printf("Reflog: %s (%s)\nReflog message: %s\n",
     - 			       selector.buf, info->email, info->message);
     - 		}
     - 
     -
     - ## refs/files-backend.c ##
     -@@ refs/files-backend.c: static int show_one_reflog_ent(struct strbuf *sb, each_reflog_ent_fn fn, void *c
     - 	int tz;
     - 	const char *p = sb->buf;
     - 
     --	/* old SP new SP name <email> SP time TAB msg LF */
     --	if (!sb->len || sb->buf[sb->len - 1] != '\n' ||
     --	    parse_oid_hex(p, &ooid, &p) || *p++ != ' ' ||
     -+	/* old SP new SP name <email> SP time TAB msg */
     -+	if (!sb->len || parse_oid_hex(p, &ooid, &p) || *p++ != ' ' ||
     - 	    parse_oid_hex(p, &noid, &p) || *p++ != ' ' ||
     --	    !(email_end = strchr(p, '>')) ||
     --	    email_end[1] != ' ' ||
     -+	    !(email_end = strchr(p, '>')) || email_end[1] != ' ' ||
     - 	    !(timestamp = parse_timestamp(email_end + 2, &message, 10)) ||
     - 	    !message || message[0] != ' ' ||
     --	    (message[1] != '+' && message[1] != '-') ||
     --	    !isdigit(message[2]) || !isdigit(message[3]) ||
     --	    !isdigit(message[4]) || !isdigit(message[5]))
     -+	    (message[1] != '+' && message[1] != '-') || !isdigit(message[2]) ||
     -+	    !isdigit(message[3]) || !isdigit(message[4]) ||
     -+	    !isdigit(message[5]))
     - 		return 0; /* corrupt? */
     - 	email_end[1] = '\0';
     - 	tz = strtol(message + 1, NULL, 10);
     -@@ refs/files-backend.c: static int files_for_each_reflog_ent_reverse(struct ref_store *ref_store,
     - 				strbuf_splice(&sb, 0, 0, bp + 1, endp - (bp + 1));
     - 				scanp = bp;
     - 				endp = bp + 1;
     -+				strbuf_trim_trailing_newline(&sb);
     - 				ret = show_one_reflog_ent(&sb, fn, cb_data);
     - 				strbuf_reset(&sb);
     - 				if (ret)
     -@@ refs/files-backend.c: static int files_for_each_reflog_ent_reverse(struct ref_store *ref_store,
     - 				 * Process it, and we can end the loop.
     - 				 */
     - 				strbuf_splice(&sb, 0, 0, buf, endp - buf);
     -+				strbuf_trim_trailing_newline(&sb);
     - 				ret = show_one_reflog_ent(&sb, fn, cb_data);
     - 				strbuf_reset(&sb);
     - 				break;
     -@@ refs/files-backend.c: static int files_for_each_reflog_ent(struct ref_store *ref_store,
     - 	if (!logfp)
     - 		return -1;
     - 
     --	while (!ret && !strbuf_getwholeline(&sb, logfp, '\n'))
     -+	while (!ret && !strbuf_getline(&sb, logfp))
     - 		ret = show_one_reflog_ent(&sb, fn, cb_data);
     - 	fclose(logfp);
     - 	strbuf_release(&sb);
     -@@ refs/files-backend.c: static int expire_reflog_ent(struct object_id *ooid, struct object_id *noid,
     - 	if ((*cb->should_prune_fn)(ooid, noid, email, timestamp, tz,
     - 				   message, policy_cb)) {
     - 		if (!cb->newlog)
     --			printf("would prune %s", message);
     -+			printf("would prune %s\n", message);
     - 		else if (cb->flags & EXPIRE_REFLOGS_VERBOSE)
     --			printf("prune %s", message);
     -+			printf("prune %s\n", message);
     - 	} else {
     - 		if (cb->newlog) {
     --			fprintf(cb->newlog, "%s %s %s %"PRItime" %+05d\t%s",
     --				oid_to_hex(ooid), oid_to_hex(noid),
     --				email, timestamp, tz, message);
     -+			fprintf(cb->newlog, "%s %s %s %" PRItime " %+05d\t%s\n",
     -+				oid_to_hex(ooid), oid_to_hex(noid), email,
     -+				timestamp, tz, message);
     - 			oidcpy(&cb->last_kept_oid, noid);
     - 		}
     - 		if (cb->flags & EXPIRE_REFLOGS_VERBOSE)
     --			printf("keep %s", message);
     -+			printf("keep %s\n", message);
     - 	}
     + ## t/helper/test-ref-store.c ##
     +@@ t/helper/test-ref-store.c: static int each_reflog(struct object_id *old_oid, struct object_id *new_oid,
     + 		       const char *committer, timestamp_t timestamp,
     + 		       int tz, const char *msg, void *cb_data)
     + {
     +-	printf("%s %s %s %"PRItime" %d %s\n",
     +-	       oid_to_hex(old_oid), oid_to_hex(new_oid),
     +-	       committer, timestamp, tz, msg);
     ++	printf("%s %s %s %" PRItime " %d %s", oid_to_hex(old_oid),
     ++	       oid_to_hex(new_oid), committer, timestamp, tz, msg);
       	return 0;
       }
     + 
      
       ## t/t1405-main-ref-store.sh ##
      @@ t/t1405-main-ref-store.sh: test_expect_success 'for_each_reflog()' '
 3:  8a1b094d547 ! 3:  0319503045b test-ref-store: tweaks to for-each-reflog-ent format
     @@ Metadata
      Author: Han-Wen Nienhuys <hanwen@google.com>
      
       ## Commit message ##
     -    test-ref-store: tweaks to for-each-reflog-ent format
     +    t1405: check for_each_reflog_ent_reverse() more thoroughly
      
     -    Follow the reflog format more closely, so it can be used for comparing
     -    reflogs in tests without using inspecting files under .git/logs/
     +    If we are checking for a certain ordering, we should check that there are two
     +    entries. Do this by mirroring the preceding test.
      
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
      
     - ## t/helper/test-ref-store.c ##
     -@@ t/helper/test-ref-store.c: static int each_reflog(struct object_id *old_oid, struct object_id *new_oid,
     - 		       const char *committer, timestamp_t timestamp,
     - 		       int tz, const char *msg, void *cb_data)
     - {
     --	printf("%s %s %s %"PRItime" %d %s\n",
     --	       oid_to_hex(old_oid), oid_to_hex(new_oid),
     --	       committer, timestamp, tz, msg);
     -+	printf("%s %s %s %" PRItime " %+05d\t%s\n", oid_to_hex(old_oid),
     -+	       oid_to_hex(new_oid), committer, timestamp, tz, msg);
     - 	return 0;
     - }
     - 
     -
       ## t/t1405-main-ref-store.sh ##
      @@ t/t1405-main-ref-store.sh: test_expect_success 'for_each_reflog_ent()' '
       
 4:  4ba97a4e70a ! 4:  62f5cb8a824 t1400: use test-helper ref-store to inspect reflog contents
     @@ Metadata
      Author: Han-Wen Nienhuys <hanwen@google.com>
      
       ## Commit message ##
     -    t1400: use test-helper ref-store to inspect reflog contents
     +    test-ref-store: tweaks to for-each-reflog-ent format
      
     -    This avoids inspecting the file system, which only works with the files ref
     -    backend.
     +    We have some tests that read from files in .git/logs/ hierarchy
     +    when checking if correct reflog entries are created, but that is
     +    too specific to the files backend.  Other backends like reftable
     +    may not store its reflog entries in such a "one line per entry"
     +    format.
     +
     +    Update for-each-reflog-ent test helper to produce output that
     +    is identical to lines in a reflog file files backend uses.
     +    That way, (1) the current tests can be updated to use the test
     +    helper to read the reflog entries instead of (parts of) reflog
     +    files, and perform the same inspection for correctness, and (2)
     +    when the ref backend is swapped to another backend, the updated
     +    test can be used as-is to check the correctness.
     +
     +    Adapt t1400 to use the for-each-reflog-ent test helper.
      
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
      
     - ## t/t1400-update-ref.sh ##
     -@@ t/t1400-update-ref.sh: export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
     - . ./test-lib.sh
     + ## t/helper/test-ref-store.c ##
     +@@ t/helper/test-ref-store.c: static int each_reflog(struct object_id *old_oid, struct object_id *new_oid,
     + 		       const char *committer, timestamp_t timestamp,
     + 		       int tz, const char *msg, void *cb_data)
     + {
     +-	printf("%s %s %s %" PRItime " %d %s", oid_to_hex(old_oid),
     +-	       oid_to_hex(new_oid), committer, timestamp, tz, msg);
     ++	printf("%s %s %s %" PRItime " %+05d%s%s", oid_to_hex(old_oid),
     ++	       oid_to_hex(new_oid), committer, timestamp, tz,
     ++	       *msg == '\n' ? "" : "\t", msg);
     + 	return 0;
     + }
       
     - Z=$ZERO_OID
     -+TAB='	'
     - 
     - m=refs/heads/main
     - n_dir=refs/heads/gu
     -@@ t/t1400-update-ref.sh: test_expect_success 'symref empty directory removal' '
     - cat >expect <<EOF
     - $Z $A $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150200 +0000	Initial Creation
     - $A $B $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150260 +0000	Switch
     --$B $A $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150860 +0000
     -+$B $A $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150860 +0000$TAB
     +
     + ## t/t1400-update-ref.sh ##
     +@@ t/t1400-update-ref.sh: $A $B $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150260 +0000	Switch
     + $B $A $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150860 +0000
       EOF
       test_expect_success "verifying $m's log (logged by touch)" '
      -	test_when_finished "rm -rf .git/$m .git/logs expect" &&
     @@ t/t1400-update-ref.sh: test_expect_success 'symref empty directory removal' '
       '
       
       test_expect_success "create $m (logged by config)" '
     -@@ t/t1400-update-ref.sh: test_expect_success "set $m (logged by config)" '
     - cat >expect <<EOF
     - $Z $A $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150320 +0000	Initial Creation
     - $A $B $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150380 +0000	Switch
     --$B $A $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150980 +0000
     -+$B $A $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150980 +0000$TAB
     +@@ t/t1400-update-ref.sh: $A $B $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150380 +0000	Switch
     + $B $A $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150980 +0000
       EOF
       test_expect_success "verifying $m's log (logged by config)" '
      -	test_when_finished "rm -f .git/$m .git/logs/$m expect" &&
 -:  ----------- > 5:  0288e743eb2 refs/debug: trim trailing LF from reflog message

Comments

Ævar Arnfjörð Bjarmason Nov. 29, 2021, 9:50 a.m. UTC | #1
On Thu, Nov 25 2021, Han-Wen Nienhuys via GitGitGadget wrote:

> This helps for reftable support, and will help if we want to reconsider
> under which conditions reflogs get created/updated.

Having looked at this in a bit more detail than last time
(https://lore.kernel.org/git/211123.864k83w3y4.gmgdl@evledraar.gmail.com/)
I applaud the goals, but to be blunt the specific approach just seems a
bit backwards to me.

As noted in that message I have patches to tweak the "verbose" mode to
be backend-independent, which as we see from your series is one thing in
the files backend that consumes the "message" and assumes things about
newlines.

Similarly from peeking at the reflog code I see it stores name/email
separately in a struct, but the refs.[ch] API wants to pass "name
<email>" as one string.

The below working patch shows that we have three consumers of that
specific format, so we can just xstrfmt() that instead of xstrdup()-ing
it in one case, and tweak the printf in the other two, and then we pass
it as two parameters.

So it's truly backend-independent now, i.e. a "real DB" like reftable,
SQL or whatever would store those in two separate fields.

So on the "seems backwards" above: If you apply this (and it passes all
tests) and look at what "message" is used for you're guaranteed to find
all the callers, and only a handful more of them are using that for
anything.

So I'd really expect a series like this to just change the
strbuf_getwholeline() in files_for_each_reflog_ent() to be a
strbuf_getline(). Then flip everything downstream that expects a '\n' to
either not, or to add if needed.

E.g. the test helper that's now adding a newline would be correct, and
we'd just need to tweak the file backend specific stuff that's now
skipping the addition of newlines to add them instead.

Anyway, changing the callbacks is quite verbose, and I just did this
ident->name/email change expecting that it would be more difficult.

But likewise the big benefit is that we *are* forced to tweak all
callers, so we're more likely to catch any subtle regressions, and it's
thus easier to review once we get past the verbosity.

I also wonder if for name/email/message a bit more tweaking of this to
make them all const char */size_t pairs wouldn't result in a much better
end-state. I.e. now we read into a strbuf in a loop in files-backend.c
and xstrdup() it.

Perhaps reftable is capable of just handing the underlying code pointers
into the mmap()'d file, so we could even skip all (re)allocations? Or if
not, that certainly seems like a sensible thing to anticipate in a
backend-independent interface. We could do that in the file backend if
we were a bit smarter and used mmap() instead of the
fopen()/read-in-a-loop pattern.

Just my 0.02, if you're interested in running with the below assume my
SOB.

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 27b9e78094d..2ccb20ce057 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -487,8 +487,9 @@ static void fsck_handle_reflog_oid(const char *refname, struct object_id *oid,
 }
 
 static int fsck_handle_reflog_ent(struct object_id *ooid, struct object_id *noid,
-		const char *email, timestamp_t timestamp, int tz,
-		const char *message, void *cb_data)
+				  const char *committer_name, const char *committer_email,
+				  timestamp_t timestamp, int tz,
+				  const char *message, void *cb_data)
 {
 	const char *refname = cb_data;
 
diff --git a/builtin/reflog.c b/builtin/reflog.c
index 175c83e7cc2..0cf7d484613 100644
--- a/builtin/reflog.c
+++ b/builtin/reflog.c
@@ -295,7 +295,8 @@ static int unreachable(struct expire_reflog_policy_cb *cb, struct commit *commit
  * Return true iff the specified reflog entry should be expired.
  */
 static int should_expire_reflog_ent(struct object_id *ooid, struct object_id *noid,
-				    const char *email, timestamp_t timestamp, int tz,
+				    const char *committer_name, const char *committer_email,
+				    timestamp_t timestamp, int tz,
 				    const char *message, void *cb_data)
 {
 	struct expire_reflog_policy_cb *cb = cb_data;
@@ -659,8 +660,9 @@ static int cmd_reflog_expire(int argc, const char **argv, const char *prefix)
 }
 
 static int count_reflog_ent(struct object_id *ooid, struct object_id *noid,
-		const char *email, timestamp_t timestamp, int tz,
-		const char *message, void *cb_data)
+			    const char *committer_name, const char *committer_email,
+			    timestamp_t timestamp, int tz,
+			    const char *message, void *cb_data)
 {
 	struct expire_reflog_policy_cb *cb = cb_data;
 	if (!cb->cmd.expire_total || timestamp < cb->cmd.expire_total)
diff --git a/builtin/stash.c b/builtin/stash.c
index a0ccc8654df..b6963050b6d 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -620,7 +620,8 @@ static int apply_stash(int argc, const char **argv, const char *prefix)
 }
 
 static int reject_reflog_ent(struct object_id *ooid, struct object_id *noid,
-			     const char *email, timestamp_t timestamp, int tz,
+			     const char *committer_name, const char *committer_email,
+			     timestamp_t timestamp, int tz,
 			     const char *message, void *cb_data)
 {
 	return 1;
diff --git a/commit.c b/commit.c
index 551de4903c1..ea97f639cc4 100644
--- a/commit.c
+++ b/commit.c
@@ -937,8 +937,9 @@ static void add_one_commit(struct object_id *oid, struct rev_collect *revs)
 }
 
 static int collect_one_reflog_ent(struct object_id *ooid, struct object_id *noid,
-				  const char *ident, timestamp_t timestamp,
-				  int tz, const char *message, void *cbdata)
+				  const char *author, const char *email,
+				  timestamp_t timestamp, int tz,
+				  const char *message, void *cbdata)
 {
 	struct rev_collect *revs = cbdata;
 
diff --git a/object-name.c b/object-name.c
index fdff4601b2c..56b131fb4a7 100644
--- a/object-name.c
+++ b/object-name.c
@@ -1287,7 +1287,8 @@ struct grab_nth_branch_switch_cbdata {
 };
 
 static int grab_nth_branch_switch(struct object_id *ooid, struct object_id *noid,
-				  const char *email, timestamp_t timestamp, int tz,
+				  const char *author, const char *email,
+				  timestamp_t timestamp, int tz,
 				  const char *message, void *cb_data)
 {
 	struct grab_nth_branch_switch_cbdata *cb = cb_data;
diff --git a/reflog-walk.c b/reflog-walk.c
index 8ac4b284b6b..12865f4cb84 100644
--- a/reflog-walk.c
+++ b/reflog-walk.c
@@ -20,8 +20,9 @@ struct complete_reflogs {
 };
 
 static int read_one_reflog(struct object_id *ooid, struct object_id *noid,
-		const char *email, timestamp_t timestamp, int tz,
-		const char *message, void *cb_data)
+			   const char *author, const char *email,
+			   timestamp_t timestamp, int tz,
+			   const char *message, void *cb_data)
 {
 	struct complete_reflogs *array = cb_data;
 	struct reflog_info *item;
@@ -30,7 +31,7 @@ static int read_one_reflog(struct object_id *ooid, struct object_id *noid,
 	item = array->items + array->nr;
 	oidcpy(&item->ooid, ooid);
 	oidcpy(&item->noid, noid);
-	item->email = xstrdup(email);
+	item->email = xstrfmt("%s <%s>", author, email);
 	item->timestamp = timestamp;
 	item->tz = tz;
 	item->message = xstrdup(message);
diff --git a/refs.c b/refs.c
index d7cc0a23a3b..8d95fd2a6bb 100644
--- a/refs.c
+++ b/refs.c
@@ -894,8 +894,9 @@ static void set_read_ref_cutoffs(struct read_ref_at_cb *cb,
 }
 
 static int read_ref_at_ent(struct object_id *ooid, struct object_id *noid,
-		const char *email, timestamp_t timestamp, int tz,
-		const char *message, void *cb_data)
+			   const char *committer_name, const char *committer_email,
+			   timestamp_t timestamp, int tz,
+			   const char *message, void *cb_data)
 {
 	struct read_ref_at_cb *cb = cb_data;
 	int reached_count;
@@ -936,8 +937,9 @@ static int read_ref_at_ent(struct object_id *ooid, struct object_id *noid,
 }
 
 static int read_ref_at_ent_newest(struct object_id *ooid, struct object_id *noid,
-				  const char *email, timestamp_t timestamp,
-				  int tz, const char *message, void *cb_data)
+				  const char *author, const char *committer_email,
+				  timestamp_t timestamp, int tz,
+				  const char *message, void *cb_data)
 {
 	struct read_ref_at_cb *cb = cb_data;
 
@@ -948,8 +950,9 @@ static int read_ref_at_ent_newest(struct object_id *ooid, struct object_id *noid
 }
 
 static int read_ref_at_ent_oldest(struct object_id *ooid, struct object_id *noid,
-				  const char *email, timestamp_t timestamp,
-				  int tz, const char *message, void *cb_data)
+				  const char *committer_name, const char *committer_email,
+				  timestamp_t timestamp, int tz,
+				  const char *message, void *cb_data)
 {
 	struct read_ref_at_cb *cb = cb_data;
 
diff --git a/refs.h b/refs.h
index d5099d4984e..a34e98a4123 100644
--- a/refs.h
+++ b/refs.h
@@ -463,8 +463,9 @@ int delete_reflog(const char *refname);
  */
 typedef int each_reflog_ent_fn(
 		struct object_id *old_oid, struct object_id *new_oid,
-		const char *committer, timestamp_t timestamp,
-		int tz, const char *msg, void *cb_data);
+		const char *committer_name, const char *committer_email,
+		timestamp_t timestamp, int tz,
+		const char *msg, void *cb_data);
 
 /* Iterate over reflog entries in the log for `refname`. */
 
@@ -807,7 +808,7 @@ typedef void reflog_expiry_prepare_fn(const char *refname,
 				      void *cb_data);
 typedef int reflog_expiry_should_prune_fn(struct object_id *ooid,
 					  struct object_id *noid,
-					  const char *email,
+					  const char *author, const char *email,
 					  timestamp_t timestamp, int tz,
 					  const char *message, void *cb_data);
 typedef void reflog_expiry_cleanup_fn(void *cb_data);
diff --git a/refs/debug.c b/refs/debug.c
index 8667c640237..dc4b417abeb 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -277,8 +277,9 @@ struct debug_reflog {
 
 static int debug_print_reflog_ent(struct object_id *old_oid,
 				  struct object_id *new_oid,
-				  const char *committer, timestamp_t timestamp,
-				  int tz, const char *msg, void *cb_data)
+				  const char *committer, const char *committer_email,
+				  timestamp_t timestamp, int tz,
+				  const char *msg, void *cb_data)
 {
 	struct debug_reflog *dbg = (struct debug_reflog *)cb_data;
 	int ret;
@@ -289,10 +290,11 @@ static int debug_print_reflog_ent(struct object_id *old_oid,
 	if (new_oid)
 		oid_to_hex_r(n, new_oid);
 
-	ret = dbg->fn(old_oid, new_oid, committer, timestamp, tz, msg,
-		      dbg->cb_data);
-	trace_printf_key(&trace_refs, "reflog_ent %s (ret %d): %s -> %s, %s %ld \"%s\"\n",
-		dbg->refname, ret, o, n, committer, (long int)timestamp, msg);
+	ret = dbg->fn(old_oid, new_oid, committer, committer_email,
+		      timestamp, tz, msg, dbg->cb_data);
+	trace_printf_key(&trace_refs, "reflog_ent %s (ret %d): %s -> %s, %s <%s> %ld \"%s\"\n",
+		dbg->refname, ret, o, n, committer, committer_email,
+			 (long int)timestamp, msg);
 	return ret;
 }
 
@@ -374,12 +376,13 @@ static void debug_reflog_expiry_prepare(const char *refname,
 
 static int debug_reflog_expiry_should_prune_fn(struct object_id *ooid,
 					       struct object_id *noid,
-					       const char *email,
+					       const char *committer_name, const char *committer_email,
 					       timestamp_t timestamp, int tz,
 					       const char *message, void *cb_data) {
 	struct debug_reflog_expiry_should_prune *prune = cb_data;
 
-	int result = prune->should_prune(ooid, noid, email, timestamp, tz, message, prune->cb_data);
+	int result = prune->should_prune(ooid, noid, committer_name, committer_email,
+					 timestamp, tz, message, prune->cb_data);
 	trace_printf_key(&trace_refs, "reflog_expire_should_prune: %s %ld: %d\n", message, (long int) timestamp, result);
 	return result;
 }
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 151b0056fe5..d15f3342c31 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1931,7 +1931,10 @@ static int files_delete_reflog(struct ref_store *ref_store,
 static int show_one_reflog_ent(struct strbuf *sb, each_reflog_ent_fn fn, void *cb_data)
 {
 	struct object_id ooid, noid;
-	char *email_end, *message;
+	const char *name;
+	char *email;
+	char *email_end;
+	char *message;
 	timestamp_t timestamp;
 	int tz;
 	const char *p = sb->buf;
@@ -1940,6 +1943,8 @@ static int show_one_reflog_ent(struct strbuf *sb, each_reflog_ent_fn fn, void *c
 	if (!sb->len || sb->buf[sb->len - 1] != '\n' ||
 	    parse_oid_hex(p, &ooid, &p) || *p++ != ' ' ||
 	    parse_oid_hex(p, &noid, &p) || *p++ != ' ' ||
+	    !(name = p) ||
+	    !(email = strchr(p, '<')) ||
 	    !(email_end = strchr(p, '>')) ||
 	    email_end[1] != ' ' ||
 	    !(timestamp = parse_timestamp(email_end + 2, &message, 10)) ||
@@ -1948,13 +1953,20 @@ static int show_one_reflog_ent(struct strbuf *sb, each_reflog_ent_fn fn, void *c
 	    !isdigit(message[2]) || !isdigit(message[3]) ||
 	    !isdigit(message[4]) || !isdigit(message[5]))
 		return 0; /* corrupt? */
-	email_end[1] = '\0';
+	/* \0 the SP before "<", marks end of "name"... */
+	*(email - 1) = '\0';
+	/* ... and advance past the opening... "<" */
+	email++;
+	/* ...and stop at the ">" */
+	email_end[0] = '\0';
+
 	tz = strtol(message + 1, NULL, 10);
 	if (message[6] != '\t')
 		message += 6;
 	else
 		message += 7;
-	return fn(&ooid, &noid, p, timestamp, tz, message, cb_data);
+	return fn(&ooid, &noid, name, email, timestamp, tz, message,
+		  cb_data);
 }
 
 static char *find_beginning_of_line(char *bob, char *scan)
@@ -3047,7 +3059,8 @@ struct expire_reflog_cb {
 };
 
 static int expire_reflog_ent(struct object_id *ooid, struct object_id *noid,
-			     const char *email, timestamp_t timestamp, int tz,
+			     const char *author, const char *email,
+			     timestamp_t timestamp, int tz,
 			     const char *message, void *cb_data)
 {
 	struct expire_reflog_cb *cb = cb_data;
@@ -3056,7 +3069,8 @@ static int expire_reflog_ent(struct object_id *ooid, struct object_id *noid,
 	if (cb->flags & EXPIRE_REFLOGS_REWRITE)
 		ooid = &cb->last_kept_oid;
 
-	if ((*cb->should_prune_fn)(ooid, noid, email, timestamp, tz,
+	if ((*cb->should_prune_fn)(ooid, noid, author, email,
+				   timestamp, tz,
 				   message, policy_cb)) {
 		if (!cb->newlog)
 			printf("would prune %s", message);
@@ -3064,9 +3078,9 @@ static int expire_reflog_ent(struct object_id *ooid, struct object_id *noid,
 			printf("prune %s", message);
 	} else {
 		if (cb->newlog) {
-			fprintf(cb->newlog, "%s %s %s %"PRItime" %+05d\t%s",
+			fprintf(cb->newlog, "%s %s %s <%s> %"PRItime" %+05d\t%s",
 				oid_to_hex(ooid), oid_to_hex(noid),
-				email, timestamp, tz, message);
+				author, email, timestamp, tz, message);
 			oidcpy(&cb->last_kept_oid, noid);
 		}
 		if (cb->flags & EXPIRE_REFLOGS_VERBOSE)
diff --git a/remote.c b/remote.c
index f958543d707..3cd0cdce879 100644
--- a/remote.c
+++ b/remote.c
@@ -2428,8 +2428,9 @@ struct check_and_collect_until_cb_data {
 
 /* Get the timestamp of the latest entry. */
 static int peek_reflog(struct object_id *o_oid, struct object_id *n_oid,
-		       const char *ident, timestamp_t timestamp,
-		       int tz, const char *message, void *cb_data)
+		       const char *committer_name, const char *committer_email,
+		       timestamp_t timestamp, int tz,
+		       const char *message, void *cb_data)
 {
 	timestamp_t *ts = cb_data;
 	*ts = timestamp;
@@ -2438,8 +2439,9 @@ static int peek_reflog(struct object_id *o_oid, struct object_id *n_oid,
 
 static int check_and_collect_until(struct object_id *o_oid,
 				   struct object_id *n_oid,
-				   const char *ident, timestamp_t timestamp,
-				   int tz, const char *message, void *cb_data)
+				   const char *committer_name, const char *committer_email,
+				   timestamp_t timestamp, int tz,
+				   const char *message, void *cb_data)
 {
 	struct commit *commit;
 	struct check_and_collect_until_cb_data *cb = cb_data;
diff --git a/revision.c b/revision.c
index 1981a0859f0..2979f456cb6 100644
--- a/revision.c
+++ b/revision.c
@@ -1597,8 +1597,9 @@ static void handle_one_reflog_commit(struct object_id *oid, void *cb_data)
 }
 
 static int handle_one_reflog_ent(struct object_id *ooid, struct object_id *noid,
-		const char *email, timestamp_t timestamp, int tz,
-		const char *message, void *cb_data)
+				 const char *author, const char *email,
+				 timestamp_t timestamp, int tz,
+				 const char *message, void *cb_data)
 {
 	handle_one_reflog_commit(ooid, cb_data);
 	handle_one_reflog_commit(noid, cb_data);
diff --git a/t/helper/test-ref-store.c b/t/helper/test-ref-store.c
index b314b81a45b..727ac008ad8 100644
--- a/t/helper/test-ref-store.c
+++ b/t/helper/test-ref-store.c
@@ -148,12 +148,13 @@ static int cmd_for_each_reflog(struct ref_store *refs, const char **argv)
 }
 
 static int each_reflog(struct object_id *old_oid, struct object_id *new_oid,
-		       const char *committer, timestamp_t timestamp,
-		       int tz, const char *msg, void *cb_data)
+		       const char *committer, const char *committer_email,
+		       timestamp_t timestamp, int tz,
+		       const char *msg, void *cb_data)
 {
-	printf("%s %s %s %"PRItime" %d %s\n",
+	printf("%s %s %s <%s> %"PRItime" %d %s\n",
 	       oid_to_hex(old_oid), oid_to_hex(new_oid),
-	       committer, timestamp, tz, msg);
+	       committer, committer_email, timestamp, tz, msg);
 	return 0;
 }
 
diff --git a/wt-status.c b/wt-status.c
index e4f29b2b4c9..a8526a4a100 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -940,7 +940,8 @@ static void wt_longstatus_print_changed(struct wt_status *s)
 }
 
 static int stash_count_refs(struct object_id *ooid, struct object_id *noid,
-			    const char *email, timestamp_t timestamp, int tz,
+			    const char *committer_name, const char *committer_email,
+			    timestamp_t timestamp, int tz,
 			    const char *message, void *cb_data)
 {
 	int *c = cb_data;
@@ -1592,7 +1593,8 @@ struct grab_1st_switch_cbdata {
 };
 
 static int grab_1st_switch(struct object_id *ooid, struct object_id *noid,
-			   const char *email, timestamp_t timestamp, int tz,
+			   const char *committer_name , const char *committer_email,
+			   timestamp_t timestamp, int tz,
 			   const char *message, void *cb_data)
 {
 	struct grab_1st_switch_cbdata *cb = cb_data;
Han-Wen Nienhuys Nov. 29, 2021, 6:24 p.m. UTC | #2
On Mon, Nov 29, 2021 at 11:14 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> > This helps for reftable support, and will help if we want to reconsider
> > under which conditions reflogs get created/updated.
>
> Having looked at this in a bit more detail than last time
> (https://lore.kernel.org/git/211123.864k83w3y4.gmgdl@evledraar.gmail.com/)
> I applaud the goals, but to be blunt the specific approach just seems a
> bit backwards to me.
>
> As noted in that message I have patches to tweak the "verbose" mode to
> be backend-independent, which as we see from your series is one thing in
> the files backend that consumes the "message" and assumes things about
> newlines.

In v2, I went with Jun's suggestion, and left the newlines alone, just
trimming them in refs/debug.c .  I think that makes most of your mail
irrelevant?

> Perhaps reftable is capable of just handing the underlying code pointers
> into the mmap()'d file, so we could even skip all (re)allocations? Or if
> not, that certainly seems like a sensible thing to anticipate in a
> backend-independent interface. We could do that in the file backend if
> we were a bit smarter and used mmap() instead of the
> fopen()/read-in-a-loop pattern.

It sounds like premature optimization. Reading reflogs is not usually
a performance sensitive operation. Also, if you hand out mmap'd
pointers, how would the reftable storage code know when it is safe to
close and munmap the file?

>
> Just my 0.02, if you're interested in running with the below assume my
> SOB.

What is SOB in this context?
Junio C Hamano Nov. 29, 2021, 10:30 p.m. UTC | #3
Han-Wen Nienhuys <hanwen@google.com> writes:

Just this part...

>> Just my 0.02, if you're interested in running with the below assume my
>> SOB.
>
> What is SOB in this context?

"Signed-off-by:"
Ævar Arnfjörð Bjarmason Nov. 29, 2021, 11:28 p.m. UTC | #4
On Mon, Nov 29 2021, Han-Wen Nienhuys wrote:

> On Mon, Nov 29, 2021 at 11:14 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>> > This helps for reftable support, and will help if we want to reconsider
>> > under which conditions reflogs get created/updated.
>>
>> Having looked at this in a bit more detail than last time
>> (https://lore.kernel.org/git/211123.864k83w3y4.gmgdl@evledraar.gmail.com/)
>> I applaud the goals, but to be blunt the specific approach just seems a
>> bit backwards to me.
>>
>> As noted in that message I have patches to tweak the "verbose" mode to
>> be backend-independent, which as we see from your series is one thing in
>> the files backend that consumes the "message" and assumes things about
>> newlines.
>
> In v2, I went with Jun's suggestion, and left the newlines alone, just
> trimming them in refs/debug.c .  I think that makes most of your mail
> irrelevant?

To whatever immediate problem you're trying to solve? Probably. I'm
mainly pointing out that we can make some of these APIs nicer & a bit
more abstract from the details of the file backend.

Or, as is the case with "ident" v.s. "committer_name/committer_email" in
these reflog callbacks no existing caller actually cares about that
format, so if/when we get to that reftable integration (which seems to
have the two as seperate fields) splitting those up in refs.[ch]
probably makes sense.

>> Perhaps reftable is capable of just handing the underlying code pointers
>> into the mmap()'d file, so we could even skip all (re)allocations? Or if
>> not, that certainly seems like a sensible thing to anticipate in a
>> backend-independent interface. We could do that in the file backend if
>> we were a bit smarter and used mmap() instead of the
>> fopen()/read-in-a-loop pattern.
>
> It sounds like premature optimization. Reading reflogs is not usually
> a performance sensitive operation. Also, if you hand out mmap'd
> pointers, how would the reftable storage code know when it is safe to
> close and munmap the file?

The same way we do the fopen/fclose lifetime now, i.e. you could rely on
them for the iteration, which seems to be a common pattern in code that
needs this.

I was aiming more for the lack of premature optimization there,
i.e. instead of byte-twiddling things by injecting \0s we could just
have char */size_t offsets (as noted elsewhere), which would also nicely
allow binary storage formats, if those formats happen to have an
embedded string. Is that true of reftable?
Han-Wen Nienhuys Dec. 2, 2021, 4:11 p.m. UTC | #5
On Tue, Nov 30, 2021 at 12:41 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> I was aiming more for the lack of premature optimization there,
> i.e. instead of byte-twiddling things by injecting \0s we could just
> have char */size_t offsets (as noted elsewhere), which would also nicely
> allow binary storage formats, if those formats happen to have an
> embedded string. Is that true of reftable?

reftable zlib compresses the reflog, so you can't mmap strings from the storage.