diff mbox series

[v4,07/14] bloom: split 'get_bloom_filter()' in two

Message ID ba89a0cb837abc5fadbaa9514169636d85ee50cf.1599172908.git.me@ttaylorr.com (mailing list archive)
State New, archived
Headers show
Series more miscellaneous Bloom filter improvements | expand

Commit Message

Taylor Blau Sept. 3, 2020, 10:46 p.m. UTC
'get_bloom_filter' takes a flag to control whether it will compute a
Bloom filter if the requested one is missing. In the next patch, we'll
add yet another parameter to this method, which would force all but one
caller to specify an extra 'NULL' parameter at the end.

Instead of doing this, split 'get_bloom_filter' into two functions:
'get_bloom_filter' and 'get_or_compute_bloom_filter'. The former only
looks up a Bloom filter (and does not compute one if it's missing,
thus dropping the 'compute_if_not_present' flag). The latter does
compute missing Bloom filters, with an additional parameter to store
whether or not it needed to do so.

This simplifies many call-sites, since the majority of existing callers
to 'get_bloom_filter' do not want missing Bloom filters to be computed
(so they can drop the parameter entirely and use the simpler version of
the function).

While we're at it, instrument the new 'get_or_compute_bloom_filter()'
with two counters in the 'write_commit_graph_context' struct which store
the number of filters that we computed, and the number of those which
were too large to store.

It would be nice to drop the 'compute_if_not_present' flag entirely,
since all remaining callers of 'get_or_compute_bloom_filter' pass it as
'1', but this will change in a future patch and hence cannot be removed.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 blame.c               |  2 +-
 bloom.c               | 13 ++++++++++---
 bloom.h               | 10 +++++++---
 commit-graph.c        | 38 +++++++++++++++++++++++++++++++++++---
 line-log.c            |  2 +-
 revision.c            |  2 +-
 t/helper/test-bloom.c |  3 ++-
 7 files changed, 57 insertions(+), 13 deletions(-)

Comments

Jakub Narębski Sept. 5, 2020, 5:22 p.m. UTC | #1
Taylor Blau <me@ttaylorr.com> writes:

[...]
> While we're at it, instrument the new 'get_or_compute_bloom_filter()'
> with two counters in the 'write_commit_graph_context' struct which store
> the number of filters that we computed, and the number of those which
> were too large to store.

[...]
> @@ -1414,12 +1433,25 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
>  		QSORT(sorted_commits, ctx->commits.nr, commit_gen_cmp);
>  
>  	for (i = 0; i < ctx->commits.nr; i++) {
> +		int computed = 0;
>  		struct commit *c = sorted_commits[i];
> -		struct bloom_filter *filter = get_bloom_filter(ctx->r, c, 1);
> +		struct bloom_filter *filter = get_or_compute_bloom_filter(
> +			ctx->r,
> +			c,
> +			1,
> +			&computed);
> +		if (computed) {
> +			ctx->count_bloom_filter_computed++;
> +			if (filter && !filter->len)
> +				ctx->count_bloom_filter_found_large++;

How do you distinguish between no changed paths stored because there
were no changes (which should not count as *_found_large), and no
changed paths stored because there were too many changes?  If I remember
it correctly in current implemetation both are represented as
zero-length filter (no changed paths could have been represented as all
zeros filter, too many changed paths could have been represented as all
ones filter).

No changes to store in filter can happen not only with `--allow-empty`
(e.g. via interactive rebase), but also with merge where all changes
came from the second parent -- we are storing only changes to first
parent, if I remember it correctly.

This is a minor issue, though.

> +		}
>  		ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len;
>  		display_progress(progress, i + 1);
>  	}
>  
> +	if (trace2_is_enabled())
> +		trace2_bloom_filter_write_statistics(ctx);
> +
>  	free(sorted_commits);
>  	stop_progress(&progress);
>  }

Best,
Taylor Blau Sept. 5, 2020, 5:38 p.m. UTC | #2
On Sat, Sep 05, 2020 at 07:22:08PM +0200, Jakub Narębski wrote:
> Taylor Blau <me@ttaylorr.com> writes:
> How do you distinguish between no changed paths stored because there
> were no changes (which should not count as *_found_large), and no
> changed paths stored because there were too many changes?  If I remember
> it correctly in current implemetation both are represented as
> zero-length filter (no changed paths could have been represented as all
> zeros filter, too many changed paths could have been represented as all
> ones filter).

Right, once we get handed back a filter from
'get_or_compute_bloom_filter()', we can't distinguish between (a) a
commit with too many changes to store in a single Bloom filter, and (b)
a commit with no changes at all.

It's unfortunate that callers can't pick between the two, but this
implementation is actually an improvement on the status-quo! Why?
Because right now we'll see an "empty" Bloom filter and recompute it
because it's "missing", only to discover that it has no changes.

With this patch, we'll say "this filter looks too large", and stop
computing it, because we have already gone through the effort to compute
it once (and marked it in the BFXL chunk).

Now, you could certainly argue that 'BFXL' could be called 'BFWC'
("Bloom filter was computed"), and then the "was this filter too large"
check means that the commit was (a) marked in a BFWC chunk, and (b) has
length zero. I'm not necessarily opposed, but I think that this is
probably not worth it, since we're trying to disambiguate something that
is inherently ambiguous. (That is, even with this new check, a
length-zero would still be thought to be "too large", since it was
computed, and has length 0).

> No changes to store in filter can happen not only with `--allow-empty`
> (e.g. via interactive rebase), but also with merge where all changes
> came from the second parent -- we are storing only changes to first
> parent, if I remember it correctly.

Agreed. And yes, Bloom filters store changes only to their commit's
first parent.

> This is a minor issue, though.

Thanks for raising it. I don't think that this is a show-stopper for
this series.

Thanks,
Taylor
Jakub Narębski Sept. 5, 2020, 5:50 p.m. UTC | #3
Hello,

On Sat, 5 Sep 2020 at 19:38, Taylor Blau <me@ttaylorr.com> wrote:
> On Sat, Sep 05, 2020 at 07:22:08PM +0200, Jakub Narębski wrote:
> > Taylor Blau <me@ttaylorr.com> writes:
> >
> > How do you distinguish between no changed paths stored because there
> > were no changes (which should not count as *_found_large), and no
> > changed paths stored because there were too many changes?  If I remember
> > it correctly in current implementation both are represented as
> > zero-length filter (no changed paths could have been represented as all
> > zeros filter, too many changed paths could have been represented as all
> > ones filter).
>
> Right, once we get handed back a filter from
> 'get_or_compute_bloom_filter()', we can't distinguish between (a) a
> commit with too many changes to store in a single Bloom filter, and (b)
> a commit with no changes at all.

We could change how we store either no-changes Bloom filter (as all
zeros minimal size filter), or too-many-changes Bloom filter (as all
ones, i.e. max unsigned value, minimal size filter). This change would
not require to change any user of Bloom filter.

> It's unfortunate that callers can't pick between the two, but this
> implementation is actually an improvement on the status-quo! Why?
> Because right now we'll see an "empty" Bloom filter and recompute it
> because it's "missing", only to discover that it has no changes.
>
> With this patch, we'll say "this filter looks too large", and stop
> computing it, because we have already gone through the effort to compute
> it once (and marked it in the BFXL chunk).

Can we use this when computing trace2 values?

[...]
> > This is a minor issue, though.
>
> Thanks for raising it. I don't think that this is a show-stopper for
> this series.

I agree.

Best,
Taylor Blau Sept. 5, 2020, 6:01 p.m. UTC | #4
On Sat, Sep 05, 2020 at 07:50:01PM +0200, Jakub Narębski wrote:
> Hello,
>
> On Sat, 5 Sep 2020 at 19:38, Taylor Blau <me@ttaylorr.com> wrote:
> > Right, once we get handed back a filter from
> > 'get_or_compute_bloom_filter()', we can't distinguish between (a) a
> > commit with too many changes to store in a single Bloom filter, and (b)
> > a commit with no changes at all.
>
> We could change how we store either no-changes Bloom filter (as all
> zeros minimal size filter), or too-many-changes Bloom filter (as all
> ones, i.e. max unsigned value, minimal size filter). This change would
> not require to change any user of Bloom filter.

I don't think that's true. Say that we changed the empty Bloom filter to
be encoded as only having the most-significant bit set. First, we'd have
to write a Bloom filter where we didn't have to before. But the real
issue is that commit-graph files generated with new clients would
suddenly be unreadable by old clients.
>
> > It's unfortunate that callers can't pick between the two, but this
> > implementation is actually an improvement on the status-quo! Why?
> > Because right now we'll see an "empty" Bloom filter and recompute it
> > because it's "missing", only to discover that it has no changes.
> >
> > With this patch, we'll say "this filter looks too large", and stop
> > computing it, because we have already gone through the effort to compute
> > it once (and marked it in the BFXL chunk).
>
> Can we use this when computing trace2 values?

We could, but I don't think it's absolutely necessary. The test coverage
in t4216 gives us enough confidence already.

> [...]
> > > This is a minor issue, though.
> >
> > Thanks for raising it. I don't think that this is a show-stopper for
> > this series.
>
> I agree.

Thanks for your input!

> Best,
> --
> Jakub Narębski

Thanks,
Taylor
Jakub Narębski Sept. 5, 2020, 6:18 p.m. UTC | #5
Hello,

On Sat, 5 Sep 2020 at 20:01, Taylor Blau <me@ttaylorr.com> wrote:
> On Sat, Sep 05, 2020 at 07:50:01PM +0200, Jakub Narębski wrote:
> > On Sat, 5 Sep 2020 at 19:38, Taylor Blau <me@ttaylorr.com> wrote:
> > >
> > > Right, once we get handed back a filter from
> > > 'get_or_compute_bloom_filter()', we can't distinguish between (a) a
> > > commit with too many changes to store in a single Bloom filter, and (b)
> > > a commit with no changes at all.
> >
> > We could change how we store either no-changes Bloom filter (as all
> > zeros minimal size filter), or too-many-changes Bloom filter (as all
> > ones, i.e. max unsigned value, minimal size filter). This change would
> > not require to change any user of Bloom filter.
>
> I don't think that's true. Say that we changed the empty Bloom filter to
> be encoded as only having the most-significant bit set. First, we'd have
> to write a Bloom filter where we didn't have to before.

That's true.

>                                                                                But the real
> issue is that commit-graph files generated with new clients would
> suddenly be unreadable by old clients.

Actually it is, at least in the form that I have proposed. The Bloom filter
which has all bits set to zero would for every possible path reply that
the path is not in set. Old clients would therefore work without changes.
Therefore this is good representation of no-changes Bloom filter.

The Bloom filter which has all bits set to one would for every possible
path reply that the path is maybe in set. This is a good alternative
representation of too-many-changes Bloom filter. Again, old clients
would work without changes.

[...]

Best,
Taylor Blau Sept. 5, 2020, 6:38 p.m. UTC | #6
On Sat, Sep 05, 2020 at 08:18:40PM +0200, Jakub Narębski wrote:
> Hello,
>
> On Sat, 5 Sep 2020 at 20:01, Taylor Blau <me@ttaylorr.com> wrote:
> > But the real issue is that commit-graph files generated with new
> > clients would suddenly be unreadable by old clients.
>
> Actually it is, at least in the form that I have proposed. The Bloom filter
> which has all bits set to zero would for every possible path reply that
> the path is not in set. Old clients would therefore work without changes.
> Therefore this is good representation of no-changes Bloom filter.
>
> The Bloom filter which has all bits set to one would for every possible
> path reply that the path is maybe in set. This is a good alternative
> representation of too-many-changes Bloom filter. Again, old clients
> would work without changes.

That's a very interesting thought. To be honest, I'm not crazy about
generating our own Bloom filters that have special meaning (i.e., that
even though they are interpreted the same way as any other filter, they
are specially-crafted to carry a certain meaning). Your proposal also
means that commit-graph files are going to get bigger, which is
something that you may or may not care about.

On the other hand, it does get rid of the BFXL chunk, which certainly
isn't the most elegant thing.

I don't know. I think my biggest objection is the size: we use the BIDX
chunk today to avoid having to write the length-zero Bloom filters; your
scheme would force us to write every filter. On the other hand, we could
continue to avoid writing length-zero filters, so long as the
commit-graph indicates that it knows this optimization.

Stolee: what do you think?

> Best,
> --
> Jakub Narębski

Thanks,
Taylor
Taylor Blau Sept. 5, 2020, 6:55 p.m. UTC | #7
On Sat, Sep 05, 2020 at 02:38:54PM -0400, Taylor Blau wrote:
> I don't know. I think my biggest objection is the size: we use the BIDX
> chunk today to avoid having to write the length-zero Bloom filters; your
> scheme would force us to write every filter. On the other hand, we could
> continue to avoid writing length-zero filters, so long as the
> commit-graph indicates that it knows this optimization.

Thinking about it a little bit more, I'm pretty sure that this isn't as
easy as it sounds. Say that we:

  - continued to encode length-zero Bloom filters as equal adjacent
    entries in the BIDX, but reserve the length-zero filter for commits
    with no changed-paths, _or_ commits whose Bloom filters have not yet
    been computed

  - write "too large" Bloom filters (i.e., commits with >= 512 changed
    paths in a diff to their first parent) as a non-empty Bloom filter
    with all bits set high.

I think we're still no better off today than before, because of the
overloading in the length-zero Bloom filter. Because we would treat
empty filters the same as ones that haven't been computed, we would
recompute empty filters, and that would count against our
'--max-new-filters' budget.

I don't see a non-convoluted way to split the overloaded length-zero
case into something that is distinguishable without a format extension.
By the way, I think that your idea is good, and that it would be
workable without the existing structure of the BIDX chunk (which itself
made sense at the time that it was written).

So, I really want your idea to work. But, I think that ultimately the
BFXL chunk is a more straightforward path forward.


Thanks,
Taylor
SZEDER Gábor Sept. 5, 2020, 7:04 p.m. UTC | #8
On Sat, Sep 05, 2020 at 02:55:34PM -0400, Taylor Blau wrote:
> On Sat, Sep 05, 2020 at 02:38:54PM -0400, Taylor Blau wrote:
> > I don't know. I think my biggest objection is the size: we use the BIDX
> > chunk today to avoid having to write the length-zero Bloom filters; your
> > scheme would force us to write every filter. On the other hand, we could
> > continue to avoid writing length-zero filters, so long as the
> > commit-graph indicates that it knows this optimization.
> 
> Thinking about it a little bit more, I'm pretty sure that this isn't as
> easy as it sounds. Say that we:
> 
>   - continued to encode length-zero Bloom filters as equal adjacent
>     entries in the BIDX, but reserve the length-zero filter for commits
>     with no changed-paths, _or_ commits whose Bloom filters have not yet
>     been computed

No, use zero-length filters for commits whose Bloom filters have not
yet been computed, and use a one-byte all zero bits Bloom filter for
commits with no modified paths.

And this is exactly what I proposed earlier.

>   - write "too large" Bloom filters (i.e., commits with >= 512 changed
>     paths in a diff to their first parent) as a non-empty Bloom filter
>     with all bits set high.
> 
> I think we're still no better off today than before, because of the
> overloading in the length-zero Bloom filter. Because we would treat
> empty filters the same as ones that haven't been computed, we would
> recompute empty filters, and that would count against our
> '--max-new-filters' budget.
> 
> I don't see a non-convoluted way to split the overloaded length-zero
> case into something that is distinguishable without a format extension.

See above, no format extension needed.

> By the way, I think that your idea is good, and that it would be
> workable without the existing structure of the BIDX chunk (which itself
> made sense at the time that it was written).
> 
> So, I really want your idea to work. But, I think that ultimately the
> BFXL chunk is a more straightforward path forward.
> 
> 
> Thanks,
> Taylor
Taylor Blau Sept. 5, 2020, 7:49 p.m. UTC | #9
On Sat, Sep 05, 2020 at 09:04:50PM +0200, SZEDER Gábor wrote:
> On Sat, Sep 05, 2020 at 02:55:34PM -0400, Taylor Blau wrote:
> > On Sat, Sep 05, 2020 at 02:38:54PM -0400, Taylor Blau wrote:
> > > I don't know. I think my biggest objection is the size: we use the BIDX
> > > chunk today to avoid having to write the length-zero Bloom filters; your
> > > scheme would force us to write every filter. On the other hand, we could
> > > continue to avoid writing length-zero filters, so long as the
> > > commit-graph indicates that it knows this optimization.
> >
> > Thinking about it a little bit more, I'm pretty sure that this isn't as
> > easy as it sounds. Say that we:
> >
> >   - continued to encode length-zero Bloom filters as equal adjacent
> >     entries in the BIDX, but reserve the length-zero filter for commits
> >     with no changed-paths, _or_ commits whose Bloom filters have not yet
> >     been computed
>
> No, use zero-length filters for commits whose Bloom filters have not
> yet been computed, and use a one-byte all zero bits Bloom filter for
> commits with no modified paths.
>
> And this is exactly what I proposed earlier.

Fair enough, I bet that would work.

Junio, let's eject this series while I try to see if SZEDER's idea is
workable.

> > I don't see a non-convoluted way to split the overloaded length-zero
> > case into something that is distinguishable without a format extension.
>
> See above, no format extension needed.

Where?

Thanks,
Taylor
Junio C Hamano Sept. 6, 2020, 9:52 p.m. UTC | #10
Taylor Blau <me@ttaylorr.com> writes:

>> And this is exactly what I proposed earlier.
>
> Fair enough, I bet that would work.
>
> Junio, let's eject this series while I try to see if SZEDER's idea is
> workable.

Thanks, I'll mark the topic as such for now.
diff mbox series

Patch

diff --git a/blame.c b/blame.c
index 903e23af23..e5ba35dbd1 100644
--- a/blame.c
+++ b/blame.c
@@ -1276,7 +1276,7 @@  static int maybe_changed_path(struct repository *r,
 	if (commit_graph_generation(origin->commit) == GENERATION_NUMBER_INFINITY)
 		return 1;
 
-	filter = get_bloom_filter(r, origin->commit, 0);
+	filter = get_bloom_filter(r, origin->commit);
 
 	if (!filter)
 		return 1;
diff --git a/bloom.c b/bloom.c
index cd9380ac62..a8a21762f4 100644
--- a/bloom.c
+++ b/bloom.c
@@ -177,9 +177,10 @@  static int pathmap_cmp(const void *hashmap_cmp_fn_data,
 	return strcmp(e1->path, e2->path);
 }
 
-struct bloom_filter *get_bloom_filter(struct repository *r,
-				      struct commit *c,
-				      int compute_if_not_present)
+struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
+						 struct commit *c,
+						 int compute_if_not_present,
+						 int *computed)
 {
 	struct bloom_filter *filter;
 	struct bloom_filter_settings settings = DEFAULT_BLOOM_FILTER_SETTINGS;
@@ -187,6 +188,9 @@  struct bloom_filter *get_bloom_filter(struct repository *r,
 	struct diff_options diffopt;
 	int max_changes = 512;
 
+	if (computed)
+		*computed = 0;
+
 	if (!bloom_filters.slab_size)
 		return NULL;
 
@@ -273,6 +277,9 @@  struct bloom_filter *get_bloom_filter(struct repository *r,
 		filter->len = 0;
 	}
 
+	if (computed)
+		*computed = 1;
+
 	free(diff_queued_diff.queue);
 	DIFF_QUEUE_CLEAR(&diff_queued_diff);
 
diff --git a/bloom.h b/bloom.h
index 0b9b59a6fe..baa91926db 100644
--- a/bloom.h
+++ b/bloom.h
@@ -89,9 +89,13 @@  void add_key_to_filter(const struct bloom_key *key,
 
 void init_bloom_filters(void);
 
-struct bloom_filter *get_bloom_filter(struct repository *r,
-				      struct commit *c,
-				      int compute_if_not_present);
+struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
+						 struct commit *c,
+						 int compute_if_not_present,
+						 int *computed);
+
+#define get_bloom_filter(r, c) get_or_compute_bloom_filter( \
+	(r), (c), 0, NULL)
 
 int bloom_filter_contains(const struct bloom_filter *filter,
 			  const struct bloom_key *key,
diff --git a/commit-graph.c b/commit-graph.c
index 55af498aa0..cabac7f45b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -971,6 +971,9 @@  struct write_commit_graph_context {
 	const struct split_commit_graph_opts *split_opts;
 	size_t total_bloom_filter_data_size;
 	const struct bloom_filter_settings *bloom_settings;
+
+	int count_bloom_filter_found_large;
+	int count_bloom_filter_computed;
 };
 
 static int write_graph_chunk_fanout(struct hashfile *f,
@@ -1182,7 +1185,7 @@  static int write_graph_chunk_bloom_indexes(struct hashfile *f,
 	uint32_t cur_pos = 0;
 
 	while (list < last) {
-		struct bloom_filter *filter = get_bloom_filter(ctx->r, *list, 0);
+		struct bloom_filter *filter = get_bloom_filter(ctx->r, *list);
 		size_t len = filter ? filter->len : 0;
 		cur_pos += len;
 		display_progress(ctx->progress, ++ctx->progress_cnt);
@@ -1222,7 +1225,7 @@  static int write_graph_chunk_bloom_data(struct hashfile *f,
 	hashwrite_be32(f, ctx->bloom_settings->bits_per_entry);
 
 	while (list < last) {
-		struct bloom_filter *filter = get_bloom_filter(ctx->r, *list, 0);
+		struct bloom_filter *filter = get_bloom_filter(ctx->r, *list);
 		size_t len = filter ? filter->len : 0;
 
 		display_progress(ctx->progress, ++ctx->progress_cnt);
@@ -1392,6 +1395,22 @@  static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 	stop_progress(&ctx->progress);
 }
 
+static void trace2_bloom_filter_write_statistics(struct write_commit_graph_context *ctx)
+{
+	struct json_writer jw = JSON_WRITER_INIT;
+
+	jw_object_begin(&jw, 0);
+	jw_object_intmax(&jw, "filter_found_large",
+			 ctx->count_bloom_filter_found_large);
+	jw_object_intmax(&jw, "filter_computed",
+			 ctx->count_bloom_filter_computed);
+	jw_end(&jw);
+
+	trace2_data_json("commit-graph", the_repository, "bloom_statistics", &jw);
+
+	jw_release(&jw);
+}
+
 static void compute_bloom_filters(struct write_commit_graph_context *ctx)
 {
 	int i;
@@ -1414,12 +1433,25 @@  static void compute_bloom_filters(struct write_commit_graph_context *ctx)
 		QSORT(sorted_commits, ctx->commits.nr, commit_gen_cmp);
 
 	for (i = 0; i < ctx->commits.nr; i++) {
+		int computed = 0;
 		struct commit *c = sorted_commits[i];
-		struct bloom_filter *filter = get_bloom_filter(ctx->r, c, 1);
+		struct bloom_filter *filter = get_or_compute_bloom_filter(
+			ctx->r,
+			c,
+			1,
+			&computed);
+		if (computed) {
+			ctx->count_bloom_filter_computed++;
+			if (filter && !filter->len)
+				ctx->count_bloom_filter_found_large++;
+		}
 		ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len;
 		display_progress(progress, i + 1);
 	}
 
+	if (trace2_is_enabled())
+		trace2_bloom_filter_write_statistics(ctx);
+
 	free(sorted_commits);
 	stop_progress(&progress);
 }
diff --git a/line-log.c b/line-log.c
index bf73ea95ac..68eeb425f8 100644
--- a/line-log.c
+++ b/line-log.c
@@ -1159,7 +1159,7 @@  static int bloom_filter_check(struct rev_info *rev,
 		return 1;
 
 	if (!rev->bloom_filter_settings ||
-	    !(filter = get_bloom_filter(rev->repo, commit, 0)))
+	    !(filter = get_bloom_filter(rev->repo, commit)))
 		return 1;
 
 	if (!range)
diff --git a/revision.c b/revision.c
index 857274408c..f4be5d1650 100644
--- a/revision.c
+++ b/revision.c
@@ -751,7 +751,7 @@  static int check_maybe_different_in_bloom_filter(struct rev_info *revs,
 	if (commit_graph_generation(commit) == GENERATION_NUMBER_INFINITY)
 		return -1;
 
-	filter = get_bloom_filter(revs->repo, commit, 0);
+	filter = get_bloom_filter(revs->repo, commit);
 
 	if (!filter) {
 		count_bloom_filter_not_present++;
diff --git a/t/helper/test-bloom.c b/t/helper/test-bloom.c
index 5e77d56f59..9f7bb729fc 100644
--- a/t/helper/test-bloom.c
+++ b/t/helper/test-bloom.c
@@ -39,7 +39,8 @@  static void get_bloom_filter_for_commit(const struct object_id *commit_oid)
 	struct bloom_filter *filter;
 	setup_git_directory();
 	c = lookup_commit(the_repository, commit_oid);
-	filter = get_bloom_filter(the_repository, c, 1);
+	filter = get_or_compute_bloom_filter(the_repository, c, 1,
+					     NULL);
 	print_bloom_filter(filter);
 }