mbox series

[v5,00/16] refs: implement jump lists for packed backend

Message ID cover.1689023520.git.me@ttaylorr.com (mailing list archive)
Headers show
Series refs: implement jump lists for packed backend | expand

Message

Taylor Blau July 10, 2023, 9:12 p.m. UTC
Here is another reroll of my series to implement jump (née skip) lists
for the packed refs backend, based on top of the current 'master'.

This responds to a review of the entire series that Peff gave a couple
of weeks ago[1], which should satisfy all of the comments that he
raised, most of which were relatively minor.

The notable changes from last time are:

  - clarifying some commit messages, notably to explicitly state that
    certain clean-ups do not indicate existing bugs, but are defensive
    against potential future ones
  - polishing up some of the API changes, to avoid having
    `match_name_as_path()` take a ref_filter pointer.
  - more timing data and documentation in the first substantive patch
  - clean-up in the upload-pack.c to make the changes clearer

As usual, a range-diff is included below for convenience.

Thanks in advance for your review!

[1]: https://lore.kernel.org/git/20230703062925.GK3502534@coredump.intra.peff.net/

Jeff King (5):
  refs.c: rename `ref_filter`
  ref-filter.h: provide `REF_FILTER_INIT`
  ref-filter: clear reachable list pointers after freeing
  ref-filter: add `ref_filter_clear()`
  ref-filter.c: parameterize match functions over patterns

Taylor Blau (11):
  builtin/for-each-ref.c: add `--exclude` option
  refs: plumb `exclude_patterns` argument throughout
  refs/packed-backend.c: refactor `find_reference_location()`
  refs/packed-backend.c: implement jump lists to avoid excluded
    pattern(s)
  refs/packed-backend.c: add trace2 counters for jump list
  revision.h: store hidden refs in a `strvec`
  refs.h: let `for_each_namespaced_ref()` take excluded patterns
  refs.h: implement `hidden_refs_to_excludes()`
  builtin/receive-pack.c: avoid enumerating hidden references
  upload-pack.c: avoid enumerating hidden refs where possible
  ls-refs.c: avoid enumerating hidden refs where possible

 Documentation/git-for-each-ref.txt |   6 +
 builtin/branch.c                   |   4 +-
 builtin/for-each-ref.c             |   7 +-
 builtin/receive-pack.c             |   8 +-
 builtin/tag.c                      |   4 +-
 http-backend.c                     |   2 +-
 ls-refs.c                          |   7 +-
 ref-filter.c                       |  66 +++++++---
 ref-filter.h                       |  12 ++
 refs.c                             |  85 ++++++++----
 refs.h                             |  29 +++-
 refs/debug.c                       |   5 +-
 refs/files-backend.c               |   5 +-
 refs/packed-backend.c              | 204 +++++++++++++++++++++++++----
 refs/refs-internal.h               |   7 +-
 revision.c                         |   4 +-
 revision.h                         |   5 +-
 t/helper/test-reach.c              |   2 +-
 t/helper/test-ref-store.c          |  10 ++
 t/t0041-usage.sh                   |   1 +
 t/t1419-exclude-refs.sh            | 122 +++++++++++++++++
 t/t3402-rebase-merge.sh            |   1 +
 t/t6300-for-each-ref.sh            |  35 +++++
 trace2.h                           |   2 +
 trace2/tr2_ctr.c                   |   5 +
 upload-pack.c                      |  47 +++++--
 26 files changed, 579 insertions(+), 106 deletions(-)
 create mode 100755 t/t1419-exclude-refs.sh

Range-diff against v4:
 1:  c12def5a30a !  1:  5b5ccc40d6b refs.c: rename `ref_filter`
    @@ Commit message
         refs.c: rename `ref_filter`
     
         The refs machinery has its own implementation of a `ref_filter` (used by
    -    `for-each-ref`), which is distinct from the `ref-filler.h` API (also
    +    `for-each-ref`), which is distinct from the `ref-filter.h` API (also
         used by `for-each-ref`, among other things).
     
         Rename the one within refs.c to more clearly indicate its purpose.
 2:  7ce82b6a5a4 !  2:  0c4e995f1d3 ref-filter.h: provide `REF_FILTER_INIT`
    @@ Commit message
         Provide a sane initialization value for `struct ref_filter`, which in a
         subsequent patch will be used to initialize a new field.
     
    -    In the meantime, fix a case in test-reach.c where its `ref_filter` is
    -    not even zero-initialized.
    +    In the meantime, ensure that the `ref_filter` struct used in the
    +    test-helper's `cmd__reach()` is zero-initialized. The lack of
    +    initialization is OK, since `commit_contains()` only looks at the single
    +    `with_commit_tag_algo` field that *is* initialized directly above.
    +
    +    So this does not fix a bug, but rather prevents one from biting us in
    +    the future.
     
         Signed-off-by: Jeff King <peff@peff.net>
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
 3:  7e6bf7766d0 !  3:  cea1e88d358 ref-filter: clear reachable list pointers after freeing
    @@ Metadata
      ## Commit message ##
         ref-filter: clear reachable list pointers after freeing
     
    -    In reach_filter(), we pop all commits from the reachable lists, leaving
    -    them empty. But because we're operating on a list pointer that was
    -    passed by value, the original filter.reachable_from pointer is left
    -    dangling.
    +    In `reach_filter()`, we pop all commits from the reachable lists,
    +    leaving them empty. But because we're operating on a list pointer that
    +    was passed by value, the original `filter.reachable_from` and
    +    `filter.unreachable_from` pointers are left dangling.
    +
    +    As is the case with the previous commit, nobody touches either of these
    +    fields after calling `reach_filter()`, so leaving them dangling is OK.
    +    But this future proofs against dangerous situations.
     
         Signed-off-by: Jeff King <peff@peff.net>
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
 4:  777e71004d6 =  4:  00a4532d54d ref-filter: add `ref_filter_clear()`
 5:  39e9b0f50d0 !  5:  3ab03ac20df ref-filter.c: parameterize match functions over patterns
    @@ ref-filter.c: static int get_ref_atom_value(struct ref_array_item *ref, int atom
       */
     -static int match_pattern(const struct ref_filter *filter, const char *refname)
     +static int match_pattern(const char **patterns, const char *refname,
    -+			 const int ignore_case)
    ++			 int ignore_case)
      {
     -	const char **patterns = filter->name_patterns;
      	unsigned flags = 0;
    @@ ref-filter.c: static int match_pattern(const struct ref_filter *filter, const ch
       * wildcard (e.g. the same ref matches "refs/heads/m*", too).
       */
     -static int match_name_as_path(const struct ref_filter *filter, const char *refname)
    -+static int match_name_as_path(const struct ref_filter *filter,
    -+			      const char **pattern,
    -+			      const char *refname)
    ++static int match_name_as_path(const char **pattern, const char *refname,
    ++			      int ignore_case)
      {
     -	const char **pattern = filter->name_patterns;
      	int namelen = strlen(refname);
      	unsigned flags = WM_PATHNAME;
      
    +-	if (filter->ignore_case)
    ++	if (ignore_case)
    + 		flags |= WM_CASEFOLD;
    + 
    + 	for (; *pattern; pattern++) {
     @@ ref-filter.c: static int filter_pattern_match(struct ref_filter *filter, const char *refname)
      	if (!*filter->name_patterns)
      		return 1; /* No pattern always matches */
      	if (filter->match_as_path)
     -		return match_name_as_path(filter, refname);
     -	return match_pattern(filter, refname);
    -+		return match_name_as_path(filter, filter->name_patterns, refname);
    ++		return match_name_as_path(filter->name_patterns, refname,
    ++					  filter->ignore_case);
     +	return match_pattern(filter->name_patterns, refname,
     +			     filter->ignore_case);
      }
 6:  c4fd47fd750 !  6:  aa881ca06fa builtin/for-each-ref.c: add `--exclude` option
    @@ ref-filter.c: static int filter_pattern_match(struct ref_filter *filter, const c
     +	if (!filter->exclude.nr)
     +		return 0;
     +	if (filter->match_as_path)
    -+		return match_name_as_path(filter, filter->exclude.v, refname);
    ++		return match_name_as_path(filter->exclude.v, refname,
    ++					  filter->ignore_case);
     +	return match_pattern(filter->exclude.v, refname, filter->ignore_case);
     +}
     +
 7:  e6b50c50219 =  7:  81e223de0c8 refs: plumb `exclude_patterns` argument throughout
 8:  a0990b2916c =  8:  25c099a528c refs/packed-backend.c: refactor `find_reference_location()`
 9:  386ed468fa7 !  9:  af0ce43cc90 refs/packed-backend.c: implement jump lists to avoid excluded pattern(s)
    @@ Commit message
     
             $ hyperfine \
               'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"' \
    +          'git.prev for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' \
               'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"'
             Benchmark 1: git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"
    -          Time (mean ± σ):     802.7 ms ±   2.1 ms    [User: 691.6 ms, System: 147.0 ms]
    -          Range (min … max):   800.0 ms … 807.7 ms    10 runs
    +          Time (mean ± σ):     798.1 ms ±   3.3 ms    [User: 687.6 ms, System: 146.4 ms]
    +          Range (min … max):   794.5 ms … 805.5 ms    10 runs
     
    -        Benchmark 2: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"
    -          Time (mean ± σ):       4.7 ms ±   0.3 ms    [User: 0.7 ms, System: 4.0 ms]
    -          Range (min … max):     4.3 ms …   6.7 ms    422 runs
    +        Benchmark 2: git.prev for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"
    +          Time (mean ± σ):      98.9 ms ±   1.4 ms    [User: 93.1 ms, System: 5.7 ms]
    +          Range (min … max):    97.0 ms … 104.0 ms    29 runs
    +
    +        Benchmark 3: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"
    +          Time (mean ± σ):       4.5 ms ±   0.2 ms    [User: 0.7 ms, System: 3.8 ms]
    +          Range (min … max):     4.1 ms …   5.8 ms    524 runs
     
             Summary
               'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' ran
    -          172.03 ± 9.60 times faster than 'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"'
    +           21.87 ± 1.05 times faster than 'git.prev for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"'
    +          176.52 ± 8.19 times faster than 'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"'
    +
    +    (Comparing stock git and this patch isn't quite fair, since an earlier
    +    commit in this series adds a naive implementation of the `--exclude`
    +    option. `git.prev` is built from the previous commit and includes this
    +    naive implementation).
     
         Using the jump list is fairly straightforward (see the changes to
         `refs/packed-backend.c::next_record()`), but constructing the list is
    @@ ref-filter.c: static int for_each_fullref_in_pattern(struct ref_filter *filter,
      
      /*
     
    + ## refs.h ##
    +@@ refs.h: int for_each_ref(each_ref_fn fn, void *cb_data);
    +  */
    + int for_each_ref_in(const char *prefix, each_ref_fn fn, void *cb_data);
    + 
    ++/*
    ++ * references matching any pattern in "exclude_patterns" are omitted from the
    ++ * result set on a best-effort basis.
    ++ */
    + int refs_for_each_fullref_in(struct ref_store *refs, const char *prefix,
    + 			     const char **exclude_patterns,
    + 			     each_ref_fn fn, void *cb_data);
    +
      ## refs/packed-backend.c ##
     @@ refs/packed-backend.c: static int cmp_packed_ref_records(const void *v1, const void *v2)
       * Compare a snapshot record at `rec` to the specified NUL-terminated
    @@ refs/packed-backend.c: struct packed_ref_iterator {
     +		iter->jump_cur++;
     +		if (iter->pos < curr->end) {
     +			iter->pos = curr->end;
    ++			/* jumps are coalesced, so only one jump is necessary */
     +			break;
     +		}
     +	}
    @@ refs/packed-backend.c: static struct ref_iterator_vtable packed_ref_iterator_vta
     +
     +	for (pattern = excluded_patterns; *pattern; pattern++) {
     +		struct jump_list_entry *e;
    ++		const char *start, *end;
     +
     +		/*
     +		 * We can't feed any excludes with globs in them to the
    @@ refs/packed-backend.c: static struct ref_iterator_vtable packed_ref_iterator_vta
     +		if (has_glob_special(*pattern))
     +			continue;
     +
    ++		start = find_reference_location(snapshot, *pattern, 0);
    ++		end = find_reference_location_end(snapshot, *pattern, 0);
    ++
    ++		if (start == end)
    ++			continue; /* nothing to jump over */
    ++
     +		ALLOC_GROW(iter->jump, iter->jump_nr + 1, iter->jump_alloc);
     +
     +		e = &iter->jump[iter->jump_nr++];
    -+		e->start = find_reference_location(snapshot, *pattern, 0);
    -+		e->end = find_reference_location_end(snapshot, *pattern, 0);
    ++		e->start = start;
    ++		e->end = end;
     +	}
     +
     +	if (!iter->jump_nr) {
    @@ refs/packed-backend.c: static struct ref_iterator_vtable packed_ref_iterator_vta
     +
     +	for (i = 1, j = 1; i < iter->jump_nr; i++) {
     +		struct jump_list_entry *ours = &iter->jump[i];
    -+
    -+		if (ours->start == ours->end) {
    -+			/* ignore empty regions (no matching entries) */
    -+			continue;
    -+		} else if (ours->start <= last_disjoint->end) {
    ++		if (ours->start <= last_disjoint->end) {
     +			/* overlapping regions extend the previous one */
     +			last_disjoint->end = last_disjoint->end > ours->end
     +				? last_disjoint->end : ours->end;
    @@ refs/packed-backend.c: static struct ref_iterator_vtable packed_ref_iterator_vta
     +			/* otherwise, insert a new region */
     +			iter->jump[j++] = *ours;
     +			last_disjoint = ours;
    -+
     +		}
     +	}
     +
10:  49c8f5173aa ! 10:  e150941c1d1 refs/packed-backend.c: add trace2 counters for jump list
    @@ Commit message
     
      ## refs/packed-backend.c ##
     @@
    - #include "../chdir-notify.h"
    + #include "../statinfo.h"
      #include "../wrapper.h"
      #include "../write-or-die.h"
     +#include "../trace2.h"
    @@ refs/packed-backend.c: static int next_record(struct packed_ref_iterator *iter)
      		if (iter->pos < curr->end) {
      			iter->pos = curr->end;
     +			trace2_counter_add(TRACE2_COUNTER_ID_PACKED_REFS_JUMPS, 1);
    + 			/* jumps are coalesced, so only one jump is necessary */
      			break;
      		}
    - 	}
     
      ## t/t1419-exclude-refs.sh ##
     @@ t/t1419-exclude-refs.sh: TEST_PASSES_SANITIZE_LEAK=true
11:  dd856a3982b = 11:  a5093d52008 revision.h: store hidden refs in a `strvec`
12:  845904853ee <  -:  ----------- refs/packed-backend.c: ignore complicated hidden refs rules
13:  8d4d7cc22ee ! 12:  7e72c23c8a4 refs.h: let `for_each_namespaced_ref()` take excluded patterns
    @@ refs.h: int for_each_glob_ref_in(each_ref_fn fn, const char *pattern,
      
      int head_ref_namespaced(each_ref_fn fn, void *cb_data);
     -int for_each_namespaced_ref(each_ref_fn fn, void *cb_data);
    ++/*
    ++ * references matching any pattern in "exclude_patterns" are omitted from the
    ++ * result set on a best-effort basis.
    ++ */
     +int for_each_namespaced_ref(const char **exclude_patterns,
     +			    each_ref_fn fn, void *cb_data);
      
 -:  ----------- > 13:  f99d89d53b9 refs.h: implement `hidden_refs_to_excludes()`
14:  49c665f9f8f ! 14:  96aada36f9b builtin/receive-pack.c: avoid enumerating hidden references
    @@ builtin/receive-pack.c: static void write_head_info(void)
      
     -	for_each_ref(show_ref_cb, &seen);
     +	refs_for_each_fullref_in(get_main_ref_store(the_repository), "",
    -+				 hidden_refs.v, show_ref_cb, &seen);
    ++				 hidden_refs_to_excludes(&hidden_refs),
    ++				 show_ref_cb, &seen);
      	for_each_alternate_ref(show_one_alternate_ref, &seen);
      	oidset_clear(&seen);
      	if (!sent_capabilities)
15:  19bf4a52d69 ! 15:  8544a647798 upload-pack.c: avoid enumerating hidden refs where possible
    @@ upload-pack.c: static int get_common_commits(struct upload_pack_data *data,
      
     +static int allow_hidden_refs(enum allow_uor allow_uor)
     +{
    -+	return allow_uor & (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1);
    ++	if ((allow_uor & ALLOW_ANY_SHA1) == ALLOW_ANY_SHA1)
    ++		return 1;
    ++	return !(allow_uor & (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1));
     +}
     +
     +static void for_each_namespaced_ref_1(each_ref_fn fn,
     +				      struct upload_pack_data *data)
     +{
    ++	const char **excludes = NULL;
     +	/*
     +	 * If `data->allow_uor` allows fetching hidden refs, we need to
     +	 * mark all references (including hidden ones), to check in
    @@ upload-pack.c: static int get_common_commits(struct upload_pack_data *data,
     +	 * hidden references.
     +	 */
     +	if (allow_hidden_refs(data->allow_uor))
    -+		for_each_namespaced_ref(NULL, fn, data);
    -+	else
    -+		for_each_namespaced_ref(data->hidden_refs.v, fn, data);
    ++		excludes = hidden_refs_to_excludes(&data->hidden_refs);
    ++
    ++	for_each_namespaced_ref(excludes, fn, data);
     +}
    ++
     +
      static int is_our_ref(struct object *o, enum allow_uor allow_uor)
      {
     -	int allow_hidden_ref = (allow_uor &
     -				(ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1));
     -	return o->flags & ((allow_hidden_ref ? HIDDEN_REF : 0) | OUR_REF);
    -+	return o->flags & ((allow_hidden_refs(allow_uor) ? HIDDEN_REF : 0) | OUR_REF);
    ++	return o->flags & ((allow_hidden_refs(allow_uor) ? 0 : HIDDEN_REF) | OUR_REF);
      }
      
      /*
16:  ea6cbaf292f ! 16:  dff068c469f ls-refs.c: avoid enumerating hidden refs where possible
    @@ ls-refs.c: int ls_refs(struct repository *r, struct packet_reader *request)
      	refs_for_each_fullref_in_prefixes(get_main_ref_store(r),
      					  get_git_namespace(), data.prefixes.v,
     -					  NULL, send_ref, &data);
    -+					  data.hidden_refs.v, send_ref, &data);
    ++					  hidden_refs_to_excludes(&data.hidden_refs),
    ++					  send_ref, &data);
      	packet_fflush(stdout);
      	strvec_clear(&data.prefixes);
      	strbuf_release(&data.buf);

Comments

Junio C Hamano July 10, 2023, 10:35 p.m. UTC | #1
Taylor Blau <me@ttaylorr.com> writes:

> Here is another reroll of my series to implement jump (née skip) lists
> for the packed refs backend, based on top of the current 'master'.

I've skimmed the whole set again and nothing jumped at me as an
unexpected change.  Let's wait for a few days to see if we have
others comment on the patches and then merge it down to 'next'
unless there is something spotted in there.

Thanks.
Patrick Steinhardt July 11, 2023, 9:37 a.m. UTC | #2
On Mon, Jul 10, 2023 at 03:35:52PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
> 
> > Here is another reroll of my series to implement jump (née skip) lists
> > for the packed refs backend, based on top of the current 'master'.
> 
> I've skimmed the whole set again and nothing jumped at me as an
> unexpected change.  Let's wait for a few days to see if we have
> others comment on the patches and then merge it down to 'next'
> unless there is something spotted in there.
> 
> Thanks.

The patch series looks good to me, too, so please feel free to add my
Reviewed-by. Thanks for these exciting changes!

Patrick
Junio C Hamano July 11, 2023, 3:56 p.m. UTC | #3
Patrick Steinhardt <ps@pks.im> writes:

> On Mon, Jul 10, 2023 at 03:35:52PM -0700, Junio C Hamano wrote:
>> Taylor Blau <me@ttaylorr.com> writes:
>> 
>> > Here is another reroll of my series to implement jump (née skip) lists
>> > for the packed refs backend, based on top of the current 'master'.
>> 
>> I've skimmed the whole set again and nothing jumped at me as an
>> unexpected change.  Let's wait for a few days to see if we have
>> others comment on the patches and then merge it down to 'next'
>> unless there is something spotted in there.
>> 
>> Thanks.
>
> The patch series looks good to me, too, so please feel free to add my
> Reviewed-by. Thanks for these exciting changes!

Thanks.
Taylor Blau July 11, 2023, 5:19 p.m. UTC | #4
On Tue, Jul 11, 2023 at 08:56:57AM -0700, Junio C Hamano wrote:
> >> I've skimmed the whole set again and nothing jumped at me as an
> >> unexpected change.  Let's wait for a few days to see if we have
> >> others comment on the patches and then merge it down to 'next'
> >> unless there is something spotted in there.
> >>
> >> Thanks.
> >
> > The patch series looks good to me, too, so please feel free to add my
> > Reviewed-by. Thanks for these exciting changes!
>
> Thanks.

Thanks, both :-).

Thanks,
Taylor