diff mbox series

[09/15] refs/packed-backend.c: implement skip lists to avoid excluded pattern(s)

Message ID a39d1107c1f3e9fbd859a23aed72e63bbd394fa2.1683581621.git.me@ttaylorr.com (mailing list archive)
State New, archived
Headers show
Series refs: implement skip lists for packed backend | expand

Commit Message

Taylor Blau May 8, 2023, 10 p.m. UTC
When iterating through the `packed-refs` file in order to answer a query
like:

    $ git for-each-ref --exclude=refs/__hidden__

it would be useful to avoid walking over all of the entries in
`refs/__hidden__/*` when possible, since we know that the ref-filter
code is going to throw them away anyways.

In certain circumstances, doing so is possible. The algorithm for doing
so is as follows:

  - For each excluded pattern, find the first record that matches it,
    and the first pattern that *doesn't* match it (i.e. the location
    you'd next want to consider when excluding that pattern).

  - Sort the patterns by their starting location within the
    `packed-refs` file.

  - Construct a skip list of regions by combining adjacent and
    overlapping regions from the previous step.

  - When iterating through the `packed-refs` file, if `iter->pos` is
    ever contained in one of the regions from the previous steps,
    advance `iter->pos` past the end of that region, and continue
    enumeration.

Note that this optimization is only possible when none of the excluded
pattern(s) have special meta-characters in them. To see why this is the
case, consider the exclusion pattern "refs/foo[a]". In general, in order
to find the location of the first record that matches this pattern, we
could only consider up to the first meta-character, "refs/foo". But this
doesn't work, since the excluded region we'd come up with would include
"refs/foobar", even though it is not excluded.

There are a few other gotchas worth considering. First, note that the
skip list is sorted, so once we skip past a region, we can avoid
considering it (or any regions preceding it) again. The member
`skip_pos` is used to track the first next-possible region to jump
through.

Second, note that the exclusion list is best-effort, since we do not
handle loose references, and because of the meta-character issue above.

In repositories with a large number of hidden references, the speed-up
can be significant. Tests here are done with a copy of linux.git with a
reference "refs/pull/N" pointing at every commit, as in:

    $ git rev-list HEAD | awk '{ print "create refs/pull/" NR " " $0 }' |
        git update-ref --stdin
    $ git pack-refs --all

, it is significantly faster to have `for-each-ref` skip over the
excluded references, as opposed to filtering them out after the fact:

    $ hyperfine \
      'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"' \
      'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"'
    Benchmark 1: git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"
      Time (mean ± σ):     802.7 ms ±   2.1 ms    [User: 691.6 ms, System: 147.0 ms]
      Range (min … max):   800.0 ms … 807.7 ms    10 runs

    Benchmark 2: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"
      Time (mean ± σ):       4.7 ms ±   0.3 ms    [User: 0.7 ms, System: 4.0 ms]
      Range (min … max):     4.3 ms …   6.7 ms    422 runs

    Summary
      'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' ran
      172.03 ± 9.60 times faster than 'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"'

Using the skip list is fairly straightforward (see the changes to
`refs/packed-backend.c::next_record()`), but constructing the list is
not. To ensure that the construction is correct, add a new suite of
tests in t1419 covering various corner cases (overlapping regions,
partially overlapping regions, adjacent regions, etc.).

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ref-filter.c              |   5 +-
 refs/packed-backend.c     | 150 +++++++++++++++++++++++++++++++++++++-
 t/helper/test-ref-store.c |  10 +++
 t/t1419-exclude-refs.sh   | 101 +++++++++++++++++++++++++
 4 files changed, 263 insertions(+), 3 deletions(-)
 create mode 100755 t/t1419-exclude-refs.sh

Comments

Chris Torek May 9, 2023, 12:10 a.m. UTC | #1
I haven't read through any of this closely (don't have time for it now) but
want to mention one thing here:

On Mon, May 8, 2023 at 3:06 PM Taylor Blau <me@ttaylorr.com> wrote:
>   - Construct a skip list of regions by combining adjacent and
>     overlapping regions from the previous step.

You might want to add a note to the code that there is no
relationship here to the skip list data structure (see
https://en.wikipedia.org/wiki/Skip_list).

Chris
Patrick Steinhardt May 9, 2023, 3:15 p.m. UTC | #2
On Mon, May 08, 2023 at 06:00:08PM -0400, Taylor Blau wrote:
> When iterating through the `packed-refs` file in order to answer a query
> like:
> 
>     $ git for-each-ref --exclude=refs/__hidden__
> 
> it would be useful to avoid walking over all of the entries in
> `refs/__hidden__/*` when possible, since we know that the ref-filter
> code is going to throw them away anyways.
> 
> In certain circumstances, doing so is possible. The algorithm for doing
> so is as follows:
> 
>   - For each excluded pattern, find the first record that matches it,
>     and the first pattern that *doesn't* match it (i.e. the location
>     you'd next want to consider when excluding that pattern).
> 
>   - Sort the patterns by their starting location within the
>     `packed-refs` file.
> 
>   - Construct a skip list of regions by combining adjacent and
>     overlapping regions from the previous step.
> 
>   - When iterating through the `packed-refs` file, if `iter->pos` is
>     ever contained in one of the regions from the previous steps,
>     advance `iter->pos` past the end of that region, and continue
>     enumeration.
> 
> Note that this optimization is only possible when none of the excluded
> pattern(s) have special meta-characters in them. To see why this is the
> case, consider the exclusion pattern "refs/foo[a]". In general, in order
> to find the location of the first record that matches this pattern, we
> could only consider up to the first meta-character, "refs/foo". But this
> doesn't work, since the excluded region we'd come up with would include
> "refs/foobar", even though it is not excluded.

Is this generally true though? A naive implementation would iterate
through all references and find the first reference that matches the
exclusion regular exepression. From thereon we continue to iterate until
we find the first entry that doesn't match. This may cause us to end up
with a suboptimal skip list, but the skip list would still be valid.

As I said, this implementation would be naive as we're now forced to
iterate through all references starting at the beginning. I assume that
your implementation will instead use a binary search to locate the first
entry that matches the exclusion pattern and the last pattern. But the
way this paragraph is formulated makes it sound like this is a general
fact, even though it is a fact that derives from the implementation.

I of course don't propose to change the algorithm here, but instead to
clarify where this restriction actually comes from and why the tradeoff
makes sense.

[snip]
> @@ -929,6 +972,108 @@ static struct ref_iterator_vtable packed_ref_iterator_vtable = {
>  	.abort = packed_ref_iterator_abort
>  };
>  
> +static int skip_list_entry_cmp(const void *va, const void *vb)
> +{
> +	const struct skip_list_entry *a = va;
> +	const struct skip_list_entry *b = vb;
> +
> +	if (a->start < b->start)
> +		return -1;
> +	if (a->start > b->start)
> +		return 1;
> +	return 0;
> +}
> +
> +static int has_glob_special(const char *str)
> +{
> +	const char *p;
> +	for (p = str; *p; p++) {
> +		if (is_glob_special(*p))
> +			return 1;
> +	}
> +	return 0;
> +}
> +
> +static const char *ptr_max(const char *x, const char *y)
> +{
> +	if (x > y)
> +		return x;
> +	return y;
> +}
> +
> +static void populate_excluded_skip_list(struct packed_ref_iterator *iter,
> +					struct snapshot *snapshot,
> +					const char **excluded_patterns)
> +{
> +	size_t i, j;
> +	const char **pattern;
> +
> +	if (!excluded_patterns)
> +		return;
> +
> +	for (pattern = excluded_patterns; *pattern; pattern++) {
> +		struct skip_list_entry *e;
> +
> +		/*
> +		 * We can't feed any excludes with globs in them to the
> +		 * refs machinery.  It only understands prefix matching.
> +		 * We likewise can't even feed the string leading up to
> +		 * the first meta-character, as something like "foo[a]"
> +		 * should not exclude "foobar" (but the prefix "foo"
> +		 * would match that and mark it for exclusion).
> +		 */
> +		if (has_glob_special(*pattern))
> +			continue;
> +
> +		ALLOC_GROW(iter->skip, iter->skip_nr + 1, iter->skip_alloc);
> +
> +		e = &iter->skip[iter->skip_nr++];
> +		e->start = find_reference_location(snapshot, *pattern, 0);
> +		e->end = find_reference_location_end(snapshot, *pattern, 0);

One thing that makes this hard to reason about is that we don't
explicitly handle the case where the pattern doesn't match at all. So
you require a bunch of knowledge about what exactly the functions
`find_reference_location()` and `find_reference_location_end()` do in
that case in order to determine whether we will end up doing the right
thing.

Explicitly handling this would give us some benefits:

- It makes the code more obvious.

- We can avoid an extra skip list entry for every non-matching
  pattern.

- We wouldn't have to perform binary search over the snapshot twice.

Might be I'm missing something though.

> +	}
> +
> +	if (!iter->skip_nr) {
> +		/*
> +		 * Every entry in exclude_patterns has a meta-character,
> +		 * nothing to do here.
> +		 */
> +		return;
> +	}
> +
> +	QSORT(iter->skip, iter->skip_nr, skip_list_entry_cmp);
> +
> +	/*
> +	 * As an optimization, merge adjacent entries in the skip list
> +	 * to jump forwards as far as possible when entering a skipped
> +	 * region.
> +	 *
> +	 * For example, if we have two skipped regions:
> +	 *
> +	 *	[[A, B], [B, C]]
> +	 *
> +	 * we want to combine that into a single entry jumping from A to
> +	 * C.
> +	 */
> +	for (i = 1, j = 1; i < iter->skip_nr; i++) {
> +		struct skip_list_entry *ours = &iter->skip[i];
> +		struct skip_list_entry *prev = &iter->skip[i - 1];
> +
> +		if (ours->start == ours->end) {
> +			/* ignore empty regions (no matching entries) */
> +			continue;
> +		} else if (prev->end >= ours->start) {
> +			/* overlapping regions extend the previous one */
> +			prev->end = ptr_max(prev->end, ours->end);
> +		} else {
> +			/* otherwise, insert a new region */
> +			iter->skip[j++] = *ours;
> +		}
> +	}

Mh. Does this do the right thing in case we have multiple consecutive
overlapping skip list entries? We always end up adjusting the immediate
predecessor as we use `i - 1` to find `prev`. Shouldn't we instead start
with `j = 0` and assign `prev = &iter->skip[j]`?

[snip]
> diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh
> new file mode 100755
> index 0000000000..da5265a5a8
> --- /dev/null
> +++ b/t/t1419-exclude-refs.sh
> @@ -0,0 +1,101 @@
> +#!/bin/sh
> +
> +test_description='test exclude_patterns functionality in main ref store'
> +
> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
> +
> +TEST_PASSES_SANITIZE_LEAK=true
> +. ./test-lib.sh
> +
> +for_each_ref__exclude () {
> +	test-tool ref-store main for-each-ref--exclude "$@" >actual.raw
> +	cut -d ' ' -f 2 actual.raw
> +}
> +
> +for_each_ref () {
> +	git for-each-ref --format='%(refname)' "$@"
> +}
> +
> +test_expect_success 'setup' '
> +	test_commit --no-tag base &&
> +	base="$(git rev-parse HEAD)" &&
> +
> +	for name in foo bar baz quux
> +	do
> +		for i in 1 2 3
> +		do
> +			echo "create refs/heads/$name/$i $base" || return 1
> +		done || return 1
> +	done >in &&
> +	echo "delete refs/heads/main" >>in &&
> +
> +	git update-ref --stdin <in &&
> +	git pack-refs --all
> +'
> +
> +test_expect_success 'for_each_ref__exclude(refs/heads/foo/)' '
> +	# region in middle
> +	for_each_ref__exclude refs/heads refs/heads/foo >actual &&
> +	for_each_ref refs/heads/bar refs/heads/baz refs/heads/quux >expect &&
> +
> +	test_cmp expect actual
> +'

Nit: it might be a bit more readable if we put the comment into the test
description instead of having an opaque description that mentions ref
names that don't have much of a meaning without reading the test itself.

Patrick
Taylor Blau May 9, 2023, 8:39 p.m. UTC | #3
On Mon, May 08, 2023 at 05:10:52PM -0700, Chris Torek wrote:
> On Mon, May 8, 2023 at 3:06 PM Taylor Blau <me@ttaylorr.com> wrote:
> >   - Construct a skip list of regions by combining adjacent and
> >     overlapping regions from the previous step.
>
> You might want to add a note to the code that there is no
> relationship here to the skip list data structure (see
> https://en.wikipedia.org/wiki/Skip_list).

Good suggestion, thanks. I picked the name skip list for this concept
since we're skipping over excluded regions of the packed-refs file, but
you're right it has no relation to the skip list data structure.

Will note in the patch message.

Thanks,
Taylor
Taylor Blau May 9, 2023, 8:55 p.m. UTC | #4
On Tue, May 09, 2023 at 05:15:43PM +0200, Patrick Steinhardt wrote:
> On Mon, May 08, 2023 at 06:00:08PM -0400, Taylor Blau wrote:
>
> > Note that this optimization is only possible when none of the excluded
> > pattern(s) have special meta-characters in them. To see why this is the
> > case, consider the exclusion pattern "refs/foo[a]". In general, in order
> > to find the location of the first record that matches this pattern, we
> > could only consider up to the first meta-character, "refs/foo". But this
> > doesn't work, since the excluded region we'd come up with would include
> > "refs/foobar", even though it is not excluded.
>
> Is this generally true though? A naive implementation would iterate
> through all references and find the first reference that matches the
> exclusion regular exepression. From thereon we continue to iterate until
> we find the first entry that doesn't match. This may cause us to end up
> with a suboptimal skip list, but the skip list would still be valid.
>
> As I said, this implementation would be naive as we're now forced to
> iterate through all references starting at the beginning. I assume that
> your implementation will instead use a binary search to locate the first
> entry that matches the exclusion pattern and the last pattern. But the
> way this paragraph is formulated makes it sound like this is a general
> fact, even though it is a fact that derives from the implementation.
>
> I of course don't propose to change the algorithm here, but instead to
> clarify where this restriction actually comes from and why the tradeoff
> makes sense.

In the example you include, it's possible. But consider something like:

    $ git for-each-ref --exclude='refs/foo[ac]'

The region that matches that expression ("refs/fooa", "refs/fooc" and
everything underneath them) does not have to appear as a continuous
single region in the packed-refs file. If you have, say, "refs/foobar",
that will appear between the two regions you want to exclude.

So I think you *might* be able to do it in general, but at the very
least it would involve splitting each character class and finding the
start and end of any region(s) that it matches.

Even so, you'd have to try and match each entry as you determine the
width of the excluded region, at which point you're at par with
enumerating them anyway and having the caller discard any entries it
doesn't want.

> > +	for (pattern = excluded_patterns; *pattern; pattern++) {
> > +		struct skip_list_entry *e;
> > +
> > +		/*
> > +		 * We can't feed any excludes with globs in them to the
> > +		 * refs machinery.  It only understands prefix matching.
> > +		 * We likewise can't even feed the string leading up to
> > +		 * the first meta-character, as something like "foo[a]"
> > +		 * should not exclude "foobar" (but the prefix "foo"
> > +		 * would match that and mark it for exclusion).
> > +		 */
> > +		if (has_glob_special(*pattern))
> > +			continue;
> > +
> > +		ALLOC_GROW(iter->skip, iter->skip_nr + 1, iter->skip_alloc);
> > +
> > +		e = &iter->skip[iter->skip_nr++];
> > +		e->start = find_reference_location(snapshot, *pattern, 0);
> > +		e->end = find_reference_location_end(snapshot, *pattern, 0);
>
> One thing that makes this hard to reason about is that we don't
> explicitly handle the case where the pattern doesn't match at all. So
> you require a bunch of knowledge about what exactly the functions
> `find_reference_location()` and `find_reference_location_end()` do in
> that case in order to determine whether we will end up doing the right
> thing.
>
> Explicitly handling this would give us some benefits:
>
> - It makes the code more obvious.
>
> - We can avoid an extra skip list entry for every non-matching
>   pattern.
>
> - We wouldn't have to perform binary search over the snapshot twice.
>
> Might be I'm missing something though.

We handle it implicitly via the mustexist parameter that both
find_reference_location() and find_reference_location_end() take, which
returns the position that you would find a matching entry if one does
not already exist.

You're right that it would save you a second binary search, but that is
likely to be a vanishingly small cost compared to the actual traversal,
disk I/O, etc.

I could imagine modifying the signature of both of those functions to be
something like:

    int find_reference_location(struct snapshot *, const char *pattern,
                                int mustexist, char *out)

Which would propagate the result through `out`, and the return value
would be whether or not it found a matching entry. But that only really
makes sense when `mustexist` is set to 0, and it adds some verbosity
throughout.

I think that it's totally possible to avoid the second search, but I'm
not sure that the cost of additional complexity and verbosity is worth
the benefit of avoiding one binary search (which will likely be
resident, anyway).

Note though that if e->start == e->end here we will discard the empty
skip region below.

> > +	for (i = 1, j = 1; i < iter->skip_nr; i++) {
> > +		struct skip_list_entry *ours = &iter->skip[i];
> > +		struct skip_list_entry *prev = &iter->skip[i - 1];
> > +
> > +		if (ours->start == ours->end) {
> > +			/* ignore empty regions (no matching entries) */
> > +			continue;
> > +		} else if (prev->end >= ours->start) {
> > +			/* overlapping regions extend the previous one */
> > +			prev->end = ptr_max(prev->end, ours->end);
> > +		} else {
> > +			/* otherwise, insert a new region */
> > +			iter->skip[j++] = *ours;
> > +		}
> > +	}
>
> Mh. Does this do the right thing in case we have multiple consecutive
> overlapping skip list entries? We always end up adjusting the immediate
> predecessor as we use `i - 1` to find `prev`. Shouldn't we instead start
> with `j = 0` and assign `prev = &iter->skip[j]`?

Good catch. I think applying this on top should do the trick:

--- 8< ---
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 137a4233f6..3b1337267a 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1054,9 +1054,9 @@ static void populate_excluded_skip_list(struct packed_ref_iterator *iter,
 	 * we want to combine that into a single entry jumping from A to
 	 * C.
 	 */
-	for (i = 1, j = 1; i < iter->skip_nr; i++) {
+	for (i = 0, j = 0; i < iter->skip_nr; i++) {
 		struct skip_list_entry *ours = &iter->skip[i];
-		struct skip_list_entry *prev = &iter->skip[i - 1];
+		struct skip_list_entry *prev = &iter->skip[j];

 		if (ours->start == ours->end) {
 			/* ignore empty regions (no matching entries) */
@@ -1066,7 +1066,7 @@ static void populate_excluded_skip_list(struct packed_ref_iterator *iter,
 			prev->end = ptr_max(prev->end, ours->end);
 		} else {
 			/* otherwise, insert a new region */
-			iter->skip[j++] = *ours;
+			iter->skip[++j] = *ours;
 		}
 	}
--- >8 ---

> > +test_expect_success 'for_each_ref__exclude(refs/heads/foo/)' '
> > +	# region in middle
> > +	for_each_ref__exclude refs/heads refs/heads/foo >actual &&
> > +	for_each_ref refs/heads/bar refs/heads/baz refs/heads/quux >expect &&
> > +
> > +	test_cmp expect actual
> > +'
>
> Nit: it might be a bit more readable if we put the comment into the test
> description instead of having an opaque description that mentions ref
> names that don't have much of a meaning without reading the test itself.

Fair enough ;-).

Thanks,
Taylor
Taylor Blau May 9, 2023, 9:15 p.m. UTC | #5
On Tue, May 09, 2023 at 04:55:55PM -0400, Taylor Blau wrote:
> Good catch. I think applying this on top should do the trick:
>
> --- 8< ---
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index 137a4233f6..3b1337267a 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -1054,9 +1054,9 @@ static void populate_excluded_skip_list(struct packed_ref_iterator *iter,
>  	 * we want to combine that into a single entry jumping from A to
>  	 * C.
>  	 */
> -	for (i = 1, j = 1; i < iter->skip_nr; i++) {
> +	for (i = 0, j = 0; i < iter->skip_nr; i++) {
>  		struct skip_list_entry *ours = &iter->skip[i];
> -		struct skip_list_entry *prev = &iter->skip[i - 1];
> +		struct skip_list_entry *prev = &iter->skip[j];
>
>  		if (ours->start == ours->end) {
>  			/* ignore empty regions (no matching entries) */
> @@ -1066,7 +1066,7 @@ static void populate_excluded_skip_list(struct packed_ref_iterator *iter,
>  			prev->end = ptr_max(prev->end, ours->end);
>  		} else {
>  			/* otherwise, insert a new region */
> -			iter->skip[j++] = *ours;
> +			iter->skip[++j] = *ours;
>  		}
>  	}
> --- >8 ---

Oops, this is wrong. It should be something like this instead:

--- 8< ---
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 137a4233f6..574f32d67f 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1007,6 +1007,7 @@ static void populate_excluded_skip_list(struct packed_ref_iterator *iter,
 {
 	size_t i, j;
 	const char **pattern;
+	struct skip_list_entry *last_disjoint;

 	if (!excluded_patterns)
 		return;
@@ -1054,19 +1055,22 @@ static void populate_excluded_skip_list(struct packed_ref_iterator *iter,
 	 * we want to combine that into a single entry jumping from A to
 	 * C.
 	 */
+	last_disjoint = iter->skip;
+
 	for (i = 1, j = 1; i < iter->skip_nr; i++) {
 		struct skip_list_entry *ours = &iter->skip[i];
-		struct skip_list_entry *prev = &iter->skip[i - 1];

 		if (ours->start == ours->end) {
 			/* ignore empty regions (no matching entries) */
 			continue;
-		} else if (prev->end >= ours->start) {
+		} else if (ours->start <= last_disjoint->end) {
 			/* overlapping regions extend the previous one */
-			prev->end = ptr_max(prev->end, ours->end);
+			last_disjoint->end = ptr_max(last_disjoint->end, ours->end);
 		} else {
 			/* otherwise, insert a new region */
 			iter->skip[j++] = *ours;
+			last_disjoint = ours;
+
 		}
 	}
--- >8 ---

Thanks,
Taylor
Junio C Hamano May 9, 2023, 11:40 p.m. UTC | #6
Taylor Blau <me@ttaylorr.com> writes:

> When iterating through the `packed-refs` file in order to answer a query
> like:
>
>     $ git for-each-ref --exclude=refs/__hidden__
>
> it would be useful to avoid walking over all of the entries in
> `refs/__hidden__/*` when possible, since we know that the ref-filter
> code is going to throw them away anyways.
>
> In certain circumstances, doing so is possible. The algorithm for doing
> so is as follows:
>
>   - For each excluded pattern, find the first record that matches it,
>     and the first pattern that *doesn't* match it (i.e. the location
>     you'd next want to consider when excluding that pattern).

Do we find "record" and then "pattern", or is the latter a misspelt
"record"?  I will assume it is the latter while reading the rest.
Is the latter "the record that does not match the pattern, whose
record number is the smallest but yet larger than the first record
that matches the pattern?"  That is we assume that the refs in the
packed refs file are sorted and can be partitioned into three by
each pattern: (1) refs before the first matching record---they do
not match the pattern. (2) refs at or after the first matching
record that match.  (3) refs after (2) that do not match.

I am not sure if that would work.  What if the pattern were refs/tags/[ad-z]*
and there are packed tags refs/tags/{a,b,c,d,e}?  The pattern would partition
the refs into [a](matches), [b,c](does not match), [d,e](matches).

Perhaps I am grossly misunderstanding what the above explanation
says.

>   - Sort the patterns by their starting location within the
>     `packed-refs` file.

So the idea is that the patterns are sorted by the first record they
match, and after sorting these patterns, refs that are between the
beginning of the list of refs and the first record associated with
the first pattern will not match _any_ pattern in that list?

>   - Construct a skip list of regions by combining adjacent and
>     overlapping regions from the previous step.

"skip list of regions" -> "list of regions to skip", I guess.

>   - When iterating through the `packed-refs` file, if `iter->pos` is
>     ever contained in one of the regions from the previous steps,
>     advance `iter->pos` past the end of that region, and continue
>     enumeration.
>
> Note that this optimization is only possible when none of the excluded
> pattern(s) have special meta-characters in them.

Ah, so this is called "patterns" but it deals with literal patterns
without globs?  Then I'll stop worrying about the refs/tags/[ad-z]
counter-example.

> To see why this is the
> case, consider the exclusion pattern "refs/foo[a]". In general, in order
> to find the location of the first record that matches this pattern, we
> could only consider up to the first meta-character, "refs/foo". But this
> doesn't work, since the excluded region we'd come up with would include
> "refs/foobar", even though it is not excluded.

OK.

> Using the skip list is fairly straightforward (see the changes to
> `refs/packed-backend.c::next_record()`), but constructing the list is
> not. To ensure that the construction is correct, add a new suite of
> tests in t1419 covering various corner cases (overlapping regions,
> partially overlapping regions, adjacent regions, etc.).

Sounds good.  Does this actually use the skip list data structure,
or do they happen to share the same two words in their names, but
otherwise have nothing common with each other?  If the latter, we
may want to revise the explanation, data type names, and variable
names, to avoid confusion, as Chris pointed out earlier.

> +static void populate_excluded_skip_list(struct packed_ref_iterator *iter,
> +					struct snapshot *snapshot,
> +					const char **excluded_patterns)
> +{
> +	size_t i, j;
> +	const char **pattern;
> +
> +	if (!excluded_patterns)
> +		return;
> +
> +	for (pattern = excluded_patterns; *pattern; pattern++) {
> +		struct skip_list_entry *e;
> +
> +		/*
> +		 * We can't feed any excludes with globs in them to the
> +		 * refs machinery.  It only understands prefix matching.
> +		 * We likewise can't even feed the string leading up to
> +		 * the first meta-character, as something like "foo[a]"
> +		 * should not exclude "foobar" (but the prefix "foo"
> +		 * would match that and mark it for exclusion).
> +		 */
> +		if (has_glob_special(*pattern))
> +			continue;

Hmph.  I would have expected that a set of patterns with any one
pattern with glob would invalidate the whole skip optimization, but
it is nice if we can salvage such a set and still optimize, if only
for literal patterns.  Interesting.

Ah, that is because we are dealing with ranges that cannot possibly
match.  With the mention of "first record that matches" etc. in the
earlier descriptoin, I somehow misled myself that we are dealing
with ranges that have interesting records.  So, a pattern with glob
does not contribute any range to be skipped, but that is OK.

> +		ALLOC_GROW(iter->skip, iter->skip_nr + 1, iter->skip_alloc);
> +
> +		e = &iter->skip[iter->skip_nr++];
> +		e->start = find_reference_location(snapshot, *pattern, 0);
> +		e->end = find_reference_location_end(snapshot, *pattern, 0);

So, iter->skip[] array has one range per pattern at most, but some
patterns may not contribute any range to the list.

> +	}
> +
> +	if (!iter->skip_nr) {
> +		/*
> +		 * Every entry in exclude_patterns has a meta-character,
> +		 * nothing to do here.
> +		 */
> +		return;
> +	}
> +
> +	QSORT(iter->skip, iter->skip_nr, skip_list_entry_cmp);
> +
> +	/*
> +	 * As an optimization, merge adjacent entries in the skip list
> +	 * to jump forwards as far as possible when entering a skipped
> +	 * region.
> +	 *
> +	 * For example, if we have two skipped regions:
> +	 *
> +	 *	[[A, B], [B, C]]

I am confused.  The first pattern may never match records in [A..B]
range, and the second pattern may never match records in [B..C]
range, but what does it mean to combine these two ranges?

> +	 * we want to combine that into a single entry jumping from A to
> +	 * C.
> +	 */
> +	for (i = 1, j = 1; i < iter->skip_nr; i++) {
> +		struct skip_list_entry *ours = &iter->skip[i];
> +		struct skip_list_entry *prev = &iter->skip[i - 1];


> +		if (ours->start == ours->end) {
> +			/* ignore empty regions (no matching entries) */
> +			continue;
> +		} else if (prev->end >= ours->start) {
> +			/* overlapping regions extend the previous one */
> +			prev->end = ptr_max(prev->end, ours->end);
> +		} else {
> +			/* otherwise, insert a new region */
> +			iter->skip[j++] = *ours;
> +		}

None of the {braces} seem needed, but OK.  

> +	}
> +
> +	iter->skip_nr = j;
> +	iter->skip_pos = 0;
> +}
Taylor Blau May 10, 2023, 2:30 a.m. UTC | #7
On Tue, May 09, 2023 at 04:40:50PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > When iterating through the `packed-refs` file in order to answer a query
> > like:
> >
> >     $ git for-each-ref --exclude=refs/__hidden__
> >
> > it would be useful to avoid walking over all of the entries in
> > `refs/__hidden__/*` when possible, since we know that the ref-filter
> > code is going to throw them away anyways.
> >
> > In certain circumstances, doing so is possible. The algorithm for doing
> > so is as follows:
> >
> >   - For each excluded pattern, find the first record that matches it,
> >     and the first pattern that *doesn't* match it (i.e. the location
> >     you'd next want to consider when excluding that pattern).
>
> Do we find "record" and then "pattern", or is the latter a misspelt
> "record"?  I will assume it is the latter while reading the rest.
> Is the latter "the record that does not match the pattern, whose
> record number is the smallest but yet larger than the first record
> that matches the pattern?"  That is we assume that the refs in the
> packed refs file are sorted and can be partitioned into three by
> each pattern: (1) refs before the first matching record---they do
> not match the pattern. (2) refs at or after the first matching
> record that match.  (3) refs after (2) that do not match.
>
> I am not sure if that would work.  What if the pattern were refs/tags/[ad-z]*
> and there are packed tags refs/tags/{a,b,c,d,e}?  The pattern would partition
> the refs into [a](matches), [b,c](does not match), [d,e](matches).
>
> Perhaps I am grossly misunderstanding what the above explanation
> says.

Sorry, you're right to assume the latter. This should read "..., find
the record that matches it, and the first record that *doesn't* match."

In the example you give, we would ignore that whole pattern and
enumerate everything (including entries that match 'refs/tags/[ad-z]*'),
and the caller is expected to discard them.

> >   - Sort the patterns by their starting location within the
> >     `packed-refs` file.
>
> So the idea is that the patterns are sorted by the first record they
> match, and after sorting these patterns, refs that are between the
> beginning of the list of refs and the first record associated with
> the first pattern will not match _any_ pattern in that list?

My patch could use some clarification here, since it is much easier to
treat the patterns as unsorted, and then sort the beginning of the range
that they match.

> >   - Construct a skip list of regions by combining adjacent and
> >     overlapping regions from the previous step.
>
> "skip list of regions" -> "list of regions to skip", I guess.

Thanks, will update.

> > Using the skip list is fairly straightforward (see the changes to
> > `refs/packed-backend.c::next_record()`), but constructing the list is
> > not. To ensure that the construction is correct, add a new suite of
> > tests in t1419 covering various corner cases (overlapping regions,
> > partially overlapping regions, adjacent regions, etc.).
>
> Sounds good.  Does this actually use the skip list data structure,
> or do they happen to share the same two words in their names, but
> otherwise have nothing common with each other?  If the latter, we
> may want to revise the explanation, data type names, and variable
> names, to avoid confusion, as Chris pointed out earlier.

They have nothing to do with each other ;-). I made a note in this patch
in the revised version I'll send in the next day or two to note the
distinction. But I'm fine with renaming the whole concept to "jump list"
or something like that if you prefer.

> > +	for (pattern = excluded_patterns; *pattern; pattern++) {
> > +		struct skip_list_entry *e;
> > +
> > +		/*
> > +		 * We can't feed any excludes with globs in them to the
> > +		 * refs machinery.  It only understands prefix matching.
> > +		 * We likewise can't even feed the string leading up to
> > +		 * the first meta-character, as something like "foo[a]"
> > +		 * should not exclude "foobar" (but the prefix "foo"
> > +		 * would match that and mark it for exclusion).
> > +		 */
> > +		if (has_glob_special(*pattern))
> > +			continue;
>
> Hmph.  I would have expected that a set of patterns with any one
> pattern with glob would invalidate the whole skip optimization, but
> it is nice if we can salvage such a set and still optimize, if only
> for literal patterns.  Interesting.
>
> Ah, that is because we are dealing with ranges that cannot possibly
> match.  With the mention of "first record that matches" etc. in the
> earlier descriptoin, I somehow misled myself that we are dealing
> with ranges that have interesting records.  So, a pattern with glob
> does not contribute any range to be skipped, but that is OK.

Exactly right. This is all "best effort" anyway, since there are some
patterns that we cannot construct a skip list entry for in the general
case. So it's fine if we enumerate some references that match one or
more of the excluded patterns, because the caller is expected to drop
those results themselves.

> > +	/*
> > +	 * As an optimization, merge adjacent entries in the skip list
> > +	 * to jump forwards as far as possible when entering a skipped
> > +	 * region.
> > +	 *
> > +	 * For example, if we have two skipped regions:
> > +	 *
> > +	 *	[[A, B], [B, C]]
>
> I am confused.  The first pattern may never match records in [A..B]
> range, and the second pattern may never match records in [B..C]
> range, but what does it mean to combine these two ranges?

The patterns would match all records in their respective regions, but
since they are excluded we want to jump over those references instead of
iterating (and then discarding them later).

So if we have a jump from A->B, and another from B->C, it would be fine
to perform two jumps to get from A to C. But we can detect this case by
combining adjacent/overlapping regions.

> > +		if (ours->start == ours->end) {
> > +			/* ignore empty regions (no matching entries) */
> > +			continue;
> > +		} else if (prev->end >= ours->start) {
> > +			/* overlapping regions extend the previous one */
> > +			prev->end = ptr_max(prev->end, ours->end);
> > +		} else {
> > +			/* otherwise, insert a new region */
> > +			iter->skip[j++] = *ours;
> > +		}
>
> None of the {braces} seem needed, but OK.

Yeah. I added the braces here intentionally to match the
CodingGuidelines, which state that this case is an exception.

Thanks,
Taylor
Patrick Steinhardt May 10, 2023, 7:25 a.m. UTC | #8
On Tue, May 09, 2023 at 04:55:55PM -0400, Taylor Blau wrote:
> On Tue, May 09, 2023 at 05:15:43PM +0200, Patrick Steinhardt wrote:
> > On Mon, May 08, 2023 at 06:00:08PM -0400, Taylor Blau wrote:
> >
> > > Note that this optimization is only possible when none of the excluded
> > > pattern(s) have special meta-characters in them. To see why this is the
> > > case, consider the exclusion pattern "refs/foo[a]". In general, in order
> > > to find the location of the first record that matches this pattern, we
> > > could only consider up to the first meta-character, "refs/foo". But this
> > > doesn't work, since the excluded region we'd come up with would include
> > > "refs/foobar", even though it is not excluded.
> >
> > Is this generally true though? A naive implementation would iterate
> > through all references and find the first reference that matches the
> > exclusion regular exepression. From thereon we continue to iterate until
> > we find the first entry that doesn't match. This may cause us to end up
> > with a suboptimal skip list, but the skip list would still be valid.
> >
> > As I said, this implementation would be naive as we're now forced to
> > iterate through all references starting at the beginning. I assume that
> > your implementation will instead use a binary search to locate the first
> > entry that matches the exclusion pattern and the last pattern. But the
> > way this paragraph is formulated makes it sound like this is a general
> > fact, even though it is a fact that derives from the implementation.
> >
> > I of course don't propose to change the algorithm here, but instead to
> > clarify where this restriction actually comes from and why the tradeoff
> > makes sense.
> 
> In the example you include, it's possible. But consider something like:
> 
>     $ git for-each-ref --exclude='refs/foo[ac]'
> 
> The region that matches that expression ("refs/fooa", "refs/fooc" and
> everything underneath them) does not have to appear as a continuous
> single region in the packed-refs file. If you have, say, "refs/foobar",
> that will appear between the two regions you want to exclude.
> 
> So I think you *might* be able to do it in general, but at the very
> least it would involve splitting each character class and finding the
> start and end of any region(s) that it matches.
> 
> Even so, you'd have to try and match each entry as you determine the
> width of the excluded region, at which point you're at par with
> enumerating them anyway and having the caller discard any entries it
> doesn't want.

Alternatively you could also do this on a best-effort basis and only
find the first matching region. But anyway, as said: I'm fine with the
limitations but think that we should document better where they come
from. The current commit message sounds like the limitation is of
general nature even though it is in fact a conciously-chosen tradeoff
that allows us to make the implementation more efficient for most cases.

Patrick
diff mbox series

Patch

diff --git a/ref-filter.c b/ref-filter.c
index c8ced1104b..56ebd332fa 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2208,12 +2208,13 @@  static int for_each_fullref_in_pattern(struct ref_filter *filter,
 	if (!filter->name_patterns[0]) {
 		/* no patterns; we have to look at everything */
 		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
-						 "", NULL, cb, cb_data);
+						 "", filter->exclude.v, cb, cb_data);
 	}
 
 	return refs_for_each_fullref_in_prefixes(get_main_ref_store(the_repository),
 						 NULL, filter->name_patterns,
-						 NULL, cb, cb_data);
+						 filter->exclude.v,
+						 cb, cb_data);
 }
 
 /*
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 98f96bf3ee..137a4233f6 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -595,6 +595,21 @@  static const char *find_reference_location(struct snapshot *snapshot,
 	return find_reference_location_1(snapshot, refname, mustexist, 1);
 }
 
+/*
+ * Find the place in `snapshot->buf` after the end of the record for
+ * `refname`. In other words, find the location of first thing *after*
+ * `refname`.
+ *
+ * Other semantics are identical to the ones in
+ * `find_reference_location()`.
+ */
+static const char *find_reference_location_end(struct snapshot *snapshot,
+					       const char *refname,
+					       int mustexist)
+{
+	return find_reference_location_1(snapshot, refname, mustexist, 0);
+}
+
 /*
  * Create a newly-allocated `snapshot` of the `packed-refs` file in
  * its current state and return it. The return value will already have
@@ -786,6 +801,13 @@  struct packed_ref_iterator {
 	/* The end of the part of the buffer that will be iterated over: */
 	const char *eof;
 
+	struct skip_list_entry {
+		const char *start;
+		const char *end;
+	} *skip;
+	size_t skip_nr, skip_alloc;
+	size_t skip_pos;
+
 	/* Scratch space for current values: */
 	struct object_id oid, peeled;
 	struct strbuf refname_buf;
@@ -803,14 +825,34 @@  struct packed_ref_iterator {
  */
 static int next_record(struct packed_ref_iterator *iter)
 {
-	const char *p = iter->pos, *eol;
+	const char *p, *eol;
 
 	strbuf_reset(&iter->refname_buf);
 
+	/*
+	 * If iter->pos is contained within a skipped region, jump past
+	 * it.
+	 *
+	 * Note that each skipped region is considered at most once,
+	 * since they are ordered based on their starting position.
+	 */
+	while (iter->skip_pos < iter->skip_nr) {
+		struct skip_list_entry *curr = &iter->skip[iter->skip_pos];
+		if (iter->pos < curr->start)
+			break; /* not to the next jump yet */
+
+		iter->skip_pos++;
+		if (iter->pos < curr->end) {
+			iter->pos = curr->end;
+			break;
+		}
+	}
+
 	if (iter->pos == iter->eof)
 		return ITER_DONE;
 
 	iter->base.flags = REF_ISPACKED;
+	p = iter->pos;
 
 	if (iter->eof - p < the_hash_algo->hexsz + 2 ||
 	    parse_oid_hex(p, &iter->oid, &p) ||
@@ -918,6 +960,7 @@  static int packed_ref_iterator_abort(struct ref_iterator *ref_iterator)
 	int ok = ITER_DONE;
 
 	strbuf_release(&iter->refname_buf);
+	free(iter->skip);
 	release_snapshot(iter->snapshot);
 	base_ref_iterator_free(ref_iterator);
 	return ok;
@@ -929,6 +972,108 @@  static struct ref_iterator_vtable packed_ref_iterator_vtable = {
 	.abort = packed_ref_iterator_abort
 };
 
+static int skip_list_entry_cmp(const void *va, const void *vb)
+{
+	const struct skip_list_entry *a = va;
+	const struct skip_list_entry *b = vb;
+
+	if (a->start < b->start)
+		return -1;
+	if (a->start > b->start)
+		return 1;
+	return 0;
+}
+
+static int has_glob_special(const char *str)
+{
+	const char *p;
+	for (p = str; *p; p++) {
+		if (is_glob_special(*p))
+			return 1;
+	}
+	return 0;
+}
+
+static const char *ptr_max(const char *x, const char *y)
+{
+	if (x > y)
+		return x;
+	return y;
+}
+
+static void populate_excluded_skip_list(struct packed_ref_iterator *iter,
+					struct snapshot *snapshot,
+					const char **excluded_patterns)
+{
+	size_t i, j;
+	const char **pattern;
+
+	if (!excluded_patterns)
+		return;
+
+	for (pattern = excluded_patterns; *pattern; pattern++) {
+		struct skip_list_entry *e;
+
+		/*
+		 * We can't feed any excludes with globs in them to the
+		 * refs machinery.  It only understands prefix matching.
+		 * We likewise can't even feed the string leading up to
+		 * the first meta-character, as something like "foo[a]"
+		 * should not exclude "foobar" (but the prefix "foo"
+		 * would match that and mark it for exclusion).
+		 */
+		if (has_glob_special(*pattern))
+			continue;
+
+		ALLOC_GROW(iter->skip, iter->skip_nr + 1, iter->skip_alloc);
+
+		e = &iter->skip[iter->skip_nr++];
+		e->start = find_reference_location(snapshot, *pattern, 0);
+		e->end = find_reference_location_end(snapshot, *pattern, 0);
+	}
+
+	if (!iter->skip_nr) {
+		/*
+		 * Every entry in exclude_patterns has a meta-character,
+		 * nothing to do here.
+		 */
+		return;
+	}
+
+	QSORT(iter->skip, iter->skip_nr, skip_list_entry_cmp);
+
+	/*
+	 * As an optimization, merge adjacent entries in the skip list
+	 * to jump forwards as far as possible when entering a skipped
+	 * region.
+	 *
+	 * For example, if we have two skipped regions:
+	 *
+	 *	[[A, B], [B, C]]
+	 *
+	 * we want to combine that into a single entry jumping from A to
+	 * C.
+	 */
+	for (i = 1, j = 1; i < iter->skip_nr; i++) {
+		struct skip_list_entry *ours = &iter->skip[i];
+		struct skip_list_entry *prev = &iter->skip[i - 1];
+
+		if (ours->start == ours->end) {
+			/* ignore empty regions (no matching entries) */
+			continue;
+		} else if (prev->end >= ours->start) {
+			/* overlapping regions extend the previous one */
+			prev->end = ptr_max(prev->end, ours->end);
+		} else {
+			/* otherwise, insert a new region */
+			iter->skip[j++] = *ours;
+		}
+	}
+
+	iter->skip_nr = j;
+	iter->skip_pos = 0;
+}
+
 static struct ref_iterator *packed_ref_iterator_begin(
 		struct ref_store *ref_store,
 		const char *prefix, const char **exclude_patterns,
@@ -964,6 +1109,9 @@  static struct ref_iterator *packed_ref_iterator_begin(
 	ref_iterator = &iter->base;
 	base_ref_iterator_init(ref_iterator, &packed_ref_iterator_vtable, 1);
 
+	if (exclude_patterns)
+		populate_excluded_skip_list(iter, snapshot, exclude_patterns);
+
 	iter->snapshot = snapshot;
 	acquire_snapshot(snapshot);
 
diff --git a/t/helper/test-ref-store.c b/t/helper/test-ref-store.c
index 6d8f844e9c..2bff003f7c 100644
--- a/t/helper/test-ref-store.c
+++ b/t/helper/test-ref-store.c
@@ -175,6 +175,15 @@  static int cmd_for_each_ref(struct ref_store *refs, const char **argv)
 	return refs_for_each_ref_in(refs, prefix, each_ref, NULL);
 }
 
+static int cmd_for_each_ref__exclude(struct ref_store *refs, const char **argv)
+{
+	const char *prefix = notnull(*argv++, "prefix");
+	const char **exclude_patterns = argv;
+
+	return refs_for_each_fullref_in(refs, prefix, exclude_patterns, each_ref,
+					NULL);
+}
+
 static int cmd_resolve_ref(struct ref_store *refs, const char **argv)
 {
 	struct object_id oid = *null_oid();
@@ -307,6 +316,7 @@  static struct command commands[] = {
 	{ "delete-refs", cmd_delete_refs },
 	{ "rename-ref", cmd_rename_ref },
 	{ "for-each-ref", cmd_for_each_ref },
+	{ "for-each-ref--exclude", cmd_for_each_ref__exclude },
 	{ "resolve-ref", cmd_resolve_ref },
 	{ "verify-ref", cmd_verify_ref },
 	{ "for-each-reflog", cmd_for_each_reflog },
diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh
new file mode 100755
index 0000000000..da5265a5a8
--- /dev/null
+++ b/t/t1419-exclude-refs.sh
@@ -0,0 +1,101 @@ 
+#!/bin/sh
+
+test_description='test exclude_patterns functionality in main ref store'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+for_each_ref__exclude () {
+	test-tool ref-store main for-each-ref--exclude "$@" >actual.raw
+	cut -d ' ' -f 2 actual.raw
+}
+
+for_each_ref () {
+	git for-each-ref --format='%(refname)' "$@"
+}
+
+test_expect_success 'setup' '
+	test_commit --no-tag base &&
+	base="$(git rev-parse HEAD)" &&
+
+	for name in foo bar baz quux
+	do
+		for i in 1 2 3
+		do
+			echo "create refs/heads/$name/$i $base" || return 1
+		done || return 1
+	done >in &&
+	echo "delete refs/heads/main" >>in &&
+
+	git update-ref --stdin <in &&
+	git pack-refs --all
+'
+
+test_expect_success 'for_each_ref__exclude(refs/heads/foo/)' '
+	# region in middle
+	for_each_ref__exclude refs/heads refs/heads/foo >actual &&
+	for_each_ref refs/heads/bar refs/heads/baz refs/heads/quux >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'for_each_ref__exclude(refs/heads/bar/)' '
+	# region at beginning
+	for_each_ref__exclude refs/heads refs/heads/bar >actual &&
+	for_each_ref refs/heads/baz refs/heads/foo refs/heads/quux >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'for_each_ref__exclude(refs/heads/quux/)' '
+	# region at end
+	for_each_ref__exclude refs/heads refs/heads/quux >actual &&
+	for_each_ref refs/heads/foo refs/heads/bar refs/heads/baz >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'for_each_ref__exclude(refs/heads/bar/, refs/heads/quux/)' '
+	# disjoint regions
+	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/quux >actual &&
+	for_each_ref refs/heads/baz refs/heads/foo >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'for_each_ref__exclude(refs/heads/bar/, refs/heads/baz/)' '
+	# adjacent, non-overlapping regions
+	for_each_ref__exclude refs/heads refs/heads/bar refs/heads/baz >actual &&
+	for_each_ref refs/heads/foo refs/heads/quux >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'for_each_ref__exclude(refs/heads/ba refs/heads/baz/)' '
+	# overlapping region
+	for_each_ref__exclude refs/heads refs/heads/ba refs/heads/baz >actual &&
+	for_each_ref refs/heads/foo refs/heads/quux >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'for_each_ref__exclude(refs/heads/does/not/exist)' '
+	# empty region
+	for_each_ref__exclude refs/heads refs/heads/does/not/exist >actual &&
+	for_each_ref >expect &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'for_each_ref__exclude(refs/heads/ba*)' '
+	# discards meta-characters
+	for_each_ref__exclude refs/heads "refs/heads/ba*" >actual &&
+	for_each_ref >expect &&
+
+	test_cmp expect actual
+'
+
+test_done