mbox series

[v2,0/7] reflog: introduce subcommand to list reflogs

Message ID cover.1708418805.git.ps@pks.im (mailing list archive)
Headers show
Series reflog: introduce subcommand to list reflogs | expand

Message

Patrick Steinhardt Feb. 20, 2024, 9:06 a.m. UTC
Hi,

this is the second version of my patch series that introduces a new `git
reflog list` subcommand to list available reflogs in a repository.

Changes compared to v1:

  - Patch 2: Clarified the commit message to hopefully explain better
    why a higher level implementation of reflog sorting would have
    increased latency.

  - Patch 2: Introduced a helper function that unifies the logic to
    yield the next directory entry.

  - Patch 3: Mark the merged reflog iterator as sorted, which I missed
    in my previous round.

  - Patch 4: This patch is new and simplifies the code to require all
    ref iterators to be sorted.

Junio, I noticed that you already merged v1 of this patch series to
`next`. I was a bit surprised to see it merged down this fast, so I
assume that this is only done due to the pending Git v2.44 release and
that you plan to reroll `next` anyway. I thus didn't send follow-up
patches but resent the whole patch series as v2. If I misinterpreted
your intent I'm happy to send the changes as follow-up patches instead.

The patch series continues to depend on ps/reftable-backend at
8a0bebdeae (refs/reftable: fix leak when copying reflog fails,
2024-02-08).

Thanks!

Patrick

Patrick Steinhardt (7):
  dir-iterator: pass name to `prepare_next_entry_data()` directly
  dir-iterator: support iteration in sorted order
  refs/files: sort reflogs returned by the reflog iterator
  refs: always treat iterators as ordered
  refs: drop unused params from the reflog iterator callback
  refs: stop resolving ref corresponding to reflogs
  builtin/reflog: introduce subcommand to list reflogs

 Documentation/git-reflog.txt   |   3 +
 builtin/fsck.c                 |   4 +-
 builtin/reflog.c               |  37 +++++++++++-
 dir-iterator.c                 | 105 ++++++++++++++++++++++++++++-----
 dir-iterator.h                 |   3 +
 refs.c                         |  27 ++++++---
 refs.h                         |  11 +++-
 refs/debug.c                   |   3 +-
 refs/files-backend.c           |  27 ++-------
 refs/iterator.c                |  26 +++-----
 refs/packed-backend.c          |   2 +-
 refs/ref-cache.c               |   2 +-
 refs/refs-internal.h           |  18 +-----
 refs/reftable-backend.c        |  20 ++-----
 revision.c                     |   4 +-
 t/helper/test-ref-store.c      |  18 ++++--
 t/t0600-reffiles-backend.sh    |  24 ++++----
 t/t1405-main-ref-store.sh      |   8 +--
 t/t1406-submodule-ref-store.sh |   8 +--
 t/t1410-reflog.sh              |  69 ++++++++++++++++++++++
 20 files changed, 286 insertions(+), 133 deletions(-)

Range-diff against v1:
1:  12de25dfe2 = 1:  12de25dfe2 dir-iterator: pass name to `prepare_next_entry_data()` directly
2:  8a588175db ! 2:  788afce189 dir-iterator: support iteration in sorted order
    @@ Commit message
         iterator to enumerate reflogs, returning reflog names and exposing them
         to the user would inherit the indeterministic ordering. Naturally, it
         would make for a terrible user interface to show a list with no
    -    discernible order. While this could be handled at a higher level by the
    -    new subcommand itself by collecting and ordering the reflogs, this would
    -    be inefficient and introduce latency when there are many reflogs.
    +    discernible order.
    +
    +    While this could be handled at a higher level by the new subcommand
    +    itself by collecting and ordering the reflogs, this would be inefficient
    +    because we would first have to collect all reflogs before we can sort
    +    them, which would introduce additional latency when there are many
    +    reflogs.
     
         Instead, introduce a new option into the directory iterator that asks
         for its entries to be yielded in lexicographical order. If set, the
    -    iterator will read all directory entries greedily end sort them before
    +    iterator will read all directory entries greedily and sort them before
         we start to iterate over them.
     
         While this will of course also incur overhead as we cannot yield the
    @@ dir-iterator.c
      
      struct dir_iterator_level {
      	DIR *dir;
    + 
    ++	/*
    ++	 * The directory entries of the current level. This list will only be
    ++	 * populated when the iterator is ordered. In that case, `dir` will be
    ++	 * set to `NULL`.
    ++	 */
     +	struct string_list entries;
     +	size_t entries_idx;
    - 
    ++
      	/*
      	 * The length of the directory part of path at this level
    + 	 * (including a trailing '/'):
    +@@ dir-iterator.c: struct dir_iterator_int {
    + 	unsigned int flags;
    + };
    + 
    ++static int next_directory_entry(DIR *dir, const char *path,
    ++				struct dirent **out)
    ++{
    ++	struct dirent *de;
    ++
    ++repeat:
    ++	errno = 0;
    ++	de = readdir(dir);
    ++	if (!de) {
    ++		if (errno) {
    ++			warning_errno("error reading directory '%s'",
    ++				      path);
    ++			return -1;
    ++		}
    ++
    ++		return 1;
    ++	}
    ++
    ++	if (is_dot_or_dotdot(de->d_name))
    ++		goto repeat;
    ++
    ++	*out = de;
    ++	return 0;
    ++}
    ++
    + /*
    +  * Push a level in the iter stack and initialize it with information from
    +  * the directory pointed by iter->base->path. It is assumed that this
     @@ dir-iterator.c: static int push_level(struct dir_iterator_int *iter)
      		return -1;
      	}
    @@ dir-iterator.c: static int push_level(struct dir_iterator_int *iter)
     +	 * directly.
     +	 */
     +	if (iter->flags & DIR_ITERATOR_SORTED) {
    -+		while (1) {
    -+			struct dirent *de;
    ++		struct dirent *de;
     +
    -+			errno = 0;
    -+			de = readdir(level->dir);
    -+			if (!de) {
    -+				if (errno && errno != ENOENT) {
    -+					warning_errno("error reading directory '%s'",
    -+						      iter->base.path.buf);
    ++		while (1) {
    ++			int ret = next_directory_entry(level->dir, iter->base.path.buf, &de);
    ++			if (ret < 0) {
    ++				if (errno != ENOENT &&
    ++				    iter->flags & DIR_ITERATOR_PEDANTIC)
     +					return -1;
    -+				}
    -+
    ++				continue;
    ++			} else if (ret > 0) {
     +				break;
     +			}
     +
    -+			if (is_dot_or_dotdot(de->d_name))
    -+				continue;
    -+
     +			string_list_append(&level->entries, de->d_name);
     +		}
     +		string_list_sort(&level->entries);
    @@ dir-iterator.c: static int pop_level(struct dir_iterator_int *iter)
      	return --iter->levels_nr;
      }
     @@ dir-iterator.c: int dir_iterator_advance(struct dir_iterator *dir_iterator)
    - 
    - 	/* Loop until we find an entry that we can give back to the caller. */
    - 	while (1) {
    --		struct dirent *de;
    + 		struct dirent *de;
      		struct dir_iterator_level *level =
      			&iter->levels[iter->levels_nr - 1];
    -+		struct dirent *de;
     +		const char *name;
      
      		strbuf_setlen(&iter->base.path, level->prefix_len);
     -		errno = 0;
     -		de = readdir(level->dir);
    --
    + 
     -		if (!de) {
     -			if (errno) {
     -				warning_errno("error reading directory '%s'",
     -					      iter->base.path.buf);
    --				if (iter->flags & DIR_ITERATOR_PEDANTIC)
    --					goto error_out;
    ++		if (level->dir) {
    ++			int ret = next_directory_entry(level->dir, iter->base.path.buf, &de);
    ++			if (ret < 0) {
    + 				if (iter->flags & DIR_ITERATOR_PEDANTIC)
    + 					goto error_out;
     -			} else if (pop_level(iter) == 0) {
     -				return dir_iterator_abort(dir_iterator);
    -+
    -+		if (level->dir) {
    -+			errno = 0;
    -+			de = readdir(level->dir);
    -+			if (!de) {
    -+				if (errno) {
    -+					warning_errno("error reading directory '%s'",
    -+						      iter->base.path.buf);
    -+					if (iter->flags & DIR_ITERATOR_PEDANTIC)
    -+						goto error_out;
    -+				} else if (pop_level(iter) == 0) {
    ++				continue;
    ++			} else if (ret > 0) {
    ++				if (pop_level(iter) == 0)
     +					return dir_iterator_abort(dir_iterator);
    -+				}
     +				continue;
      			}
     -			continue;
    @@ dir-iterator.c: int dir_iterator_advance(struct dir_iterator *dir_iterator)
      
     -		if (is_dot_or_dotdot(de->d_name))
     -			continue;
    -+			if (is_dot_or_dotdot(de->d_name))
    -+				continue;
    - 
    --		if (prepare_next_entry_data(iter, de->d_name)) {
     +			name = de->d_name;
     +		} else {
     +			if (level->entries_idx >= level->entries.nr) {
    @@ dir-iterator.c: int dir_iterator_advance(struct dir_iterator *dir_iterator)
     +					return dir_iterator_abort(dir_iterator);
     +				continue;
     +			}
    -+
    + 
    +-		if (prepare_next_entry_data(iter, de->d_name)) {
     +			name = level->entries.items[level->entries_idx++].string;
     +		}
     +
3:  e4e4fac05c ! 3:  32b24a3d4b refs/files: sort reflogs returned by the reflog iterator
    @@ refs/files-backend.c: static struct ref_iterator *reflog_iterator_begin(struct r
      	iter->dir_iterator = diter;
      	iter->ref_store = ref_store;
      	strbuf_release(&sb);
    +@@ refs/files-backend.c: static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st
    + 		return reflog_iterator_begin(ref_store, refs->gitcommondir);
    + 	} else {
    + 		return merge_ref_iterator_begin(
    +-			0, reflog_iterator_begin(ref_store, refs->base.gitdir),
    ++			1, reflog_iterator_begin(ref_store, refs->base.gitdir),
    + 			reflog_iterator_begin(ref_store, refs->gitcommondir),
    + 			reflog_iterator_select, refs);
    + 	}
     
      ## t/t0600-reffiles-backend.sh ##
     @@ t/t0600-reffiles-backend.sh: test_expect_success 'for_each_reflog()' '
-:  ---------- > 4:  4254f23fd4 refs: always treat iterators as ordered
4:  be512ef268 ! 5:  240334df6c refs: drop unused params from the reflog iterator callback
    @@ refs/reftable-backend.c: static int reftable_reflog_iterator_advance(struct ref_
      	}
     @@ refs/reftable-backend.c: static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftabl
      	iter = xcalloc(1, sizeof(*iter));
    - 	base_ref_iterator_init(&iter->base, &reftable_reflog_iterator_vtable, 1);
    + 	base_ref_iterator_init(&iter->base, &reftable_reflog_iterator_vtable);
      	iter->refs = refs;
     -	iter->base.oid = &iter->oid;
      
5:  a7459b9483 = 6:  7928661318 refs: stop resolving ref corresponding to reflogs
6:  cddb2de939 = 7:  d7b9cff4c3 builtin/reflog: introduce subcommand to list reflogs

Comments

Junio C Hamano Feb. 20, 2024, 5:22 p.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

>       struct dir_iterator_level {
>       	DIR *dir;
>     + 
>     ++	/*
>     ++	 * The directory entries of the current level. This list will only be
>     ++	 * populated when the iterator is ordered. In that case, `dir` will be
>     ++	 * set to `NULL`.
>     ++	 */
>      +	struct string_list entries;
>      +	size_t entries_idx;

Reads well.  Nice.

>     ++static int next_directory_entry(DIR *dir, const char *path,
>     ++				struct dirent **out)
>     ++{
>     ++	struct dirent *de;
>     ++
>     ++repeat:
>     ++	errno = 0;
>     ++	de = readdir(dir);
>     ++	if (!de) {
>     ++		if (errno) {
>     ++			warning_errno("error reading directory '%s'",
>     ++				      path);
>     ++			return -1;
>     ++		}
>     ++
>     ++		return 1;
>     ++	}
>     ++
>     ++	if (is_dot_or_dotdot(de->d_name))
>     ++		goto repeat;
>     ++
>     ++	*out = de;
>     ++	return 0;
>     ++}

Very nice to encapsulate the common readdir() loop into this helper.

> 3:  e4e4fac05c ! 3:  32b24a3d4b refs/files: sort reflogs returned by the reflog iterator
>     @@ refs/files-backend.c: static struct ref_iterator *reflog_iterator_begin(struct r
>       	iter->dir_iterator = diter;
>       	iter->ref_store = ref_store;
>       	strbuf_release(&sb);
>     +@@ refs/files-backend.c: static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st
>     + 		return reflog_iterator_begin(ref_store, refs->gitcommondir);
>     + 	} else {
>     + 		return merge_ref_iterator_begin(
>     +-			0, reflog_iterator_begin(ref_store, refs->base.gitdir),
>     ++			1, reflog_iterator_begin(ref_store, refs->base.gitdir),
>     + 			reflog_iterator_begin(ref_store, refs->gitcommondir),
>     + 			reflog_iterator_select, refs);
>     + 	}

This hunk is new.  Is there a downside to force merged iterators to
always be sorted?  The ones that are combined are all sorted so it
is natural to force sorting like this code does?  It might deserve
explaining, and would certainly help future readers who runs "blame"
on this code to figure out what made us think always sorting is a
good direction forward.

> -:  ---------- > 4:  4254f23fd4 refs: always treat iterators as ordered

This one is new, and deserves a separate review.

> 4:  be512ef268 ! 5:  240334df6c refs: drop unused params from the reflog iterator callback
>     @@ refs/reftable-backend.c: static int reftable_reflog_iterator_advance(struct ref_
>       	}
>      @@ refs/reftable-backend.c: static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftabl
>       	iter = xcalloc(1, sizeof(*iter));
>     - 	base_ref_iterator_init(&iter->base, &reftable_reflog_iterator_vtable, 1);
>     + 	base_ref_iterator_init(&iter->base, &reftable_reflog_iterator_vtable);
>       	iter->refs = refs;
>      -	iter->base.oid = &iter->oid;
>       
> 5:  a7459b9483 = 6:  7928661318 refs: stop resolving ref corresponding to reflogs
> 6:  cddb2de939 = 7:  d7b9cff4c3 builtin/reflog: introduce subcommand to list reflogs

Looking good from a cursory read.

Thanks for a quick reroll.
Patrick Steinhardt Feb. 21, 2024, 11:48 a.m. UTC | #2
On Tue, Feb 20, 2024 at 09:22:36AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
[snip]
> > 3:  e4e4fac05c ! 3:  32b24a3d4b refs/files: sort reflogs returned by the reflog iterator
> >     @@ refs/files-backend.c: static struct ref_iterator *reflog_iterator_begin(struct r
> >       	iter->dir_iterator = diter;
> >       	iter->ref_store = ref_store;
> >       	strbuf_release(&sb);
> >     +@@ refs/files-backend.c: static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st
> >     + 		return reflog_iterator_begin(ref_store, refs->gitcommondir);
> >     + 	} else {
> >     + 		return merge_ref_iterator_begin(
> >     +-			0, reflog_iterator_begin(ref_store, refs->base.gitdir),
> >     ++			1, reflog_iterator_begin(ref_store, refs->base.gitdir),
> >     + 			reflog_iterator_begin(ref_store, refs->gitcommondir),
> >     + 			reflog_iterator_select, refs);
> >     + 	}
> 
> This hunk is new.  Is there a downside to force merged iterators to
> always be sorted?  The ones that are combined are all sorted so it
> is natural to force sorting like this code does?  It might deserve
> explaining, and would certainly help future readers who runs "blame"
> on this code to figure out what made us think always sorting is a
> good direction forward.

Not really -- it merely gets passed down to the base ref iterator to
indicate that the entries are returned in lexicographic order. But I've
been jumping the gun here: the `reflog_iterator_select()` function does
not ensure lexicographic ordering between the two merged iterators right
now. I was assuming so because I implemented it in the reftable backend
like that. Should've double checked.

It's an easy fix though, which I'll add as another patch on top. Thanks
for making me think twice.

Patrick