diff mbox series

[v4,04/10] commit-graph: persist existence of changed-paths

Message ID f1e3a8516ebd58b283166a5374843f5cb3332d08.1593610050.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series More commit-graph/Bloom filter improvements | expand

Commit Message

Johannes Schindelin via GitGitGadget July 1, 2020, 1:27 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

The changed-path Bloom filters were released in v2.27.0, but have a
significant drawback. A user can opt-in to writing the changed-path
filters using the "--changed-paths" option to "git commit-graph write"
but the next write will drop the filters unless that option is
specified.

This becomes even more important when considering the interaction with
gc.writeCommitGraph (on by default) or fetch.writeCommitGraph (part of
features.experimental). These config options trigger commit-graph writes
that the user did not signal, and hence there is no --changed-paths
option available.

Allow a user that opts-in to the changed-path filters to persist the
property of "my commit-graph has changed-path filters" automatically. A
user can drop filters using the --no-changed-paths option.

In the process, we need to be extremely careful to match the Bloom
filter settings as specified by the commit-graph. This will allow future
versions of Git to customize these settings, and the version with this
change will persist those settings as commit-graphs are rewritten on
top.

Use the trace2 API to signal the settings used during the write, and
check that output in a test after manually adjusting the correct bytes
in the commit-graph file.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt |  5 +++-
 builtin/commit-graph.c             |  5 +++-
 commit-graph.c                     | 45 ++++++++++++++++++++++++++++--
 commit-graph.h                     |  1 +
 t/t4216-log-bloom.sh               | 17 ++++++++++-
 5 files changed, 67 insertions(+), 6 deletions(-)

Comments

SZEDER Gábor Oct. 15, 2020, 1:21 p.m. UTC | #1
On Wed, Jul 01, 2020 at 01:27:24PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> The changed-path Bloom filters were released in v2.27.0, but have a
> significant drawback. A user can opt-in to writing the changed-path
> filters using the "--changed-paths" option to "git commit-graph write"
> but the next write will drop the filters unless that option is
> specified.
> 
> This becomes even more important when considering the interaction with
> gc.writeCommitGraph (on by default) or fetch.writeCommitGraph (part of
> features.experimental). These config options trigger commit-graph writes
> that the user did not signal, and hence there is no --changed-paths
> option available.
> 
> Allow a user that opts-in to the changed-path filters to persist the
> property of "my commit-graph has changed-path filters" automatically. A
> user can drop filters using the --no-changed-paths option.

The above parts of the commit message and the corresponding changes
are OK, but ...

> In the process, we need to be extremely careful to match the Bloom
> filter settings as specified by the commit-graph. This will allow future
> versions of Git to customize these settings, and the version with this
> change will persist those settings as commit-graphs are rewritten on
> top.

As pointed out in my original bug report [1], modified path Bloom
filters are computed with hardcoded settings in
bloom.c:get_bloom_filter().  Since this patch does not touch bloom.c
at all, it still computes Bloom filters with those hardcoded settings,
and, consequently, despite the commit message's claims, it does not
persist the settings in the existing commit-graph.

[1] https://public-inbox.org/git/20200619140230.GB22200@szeder.dev/

> Use the trace2 API to signal the settings used during the write, and
> check that output in a test after manually adjusting the correct bytes
> in the commit-graph file.

This test is insufficient, as it only checks what settings trace2
believes the Bloom filters are computed with, not what settings they
are actually computed with; that's why it succeeded while the bug
whose absence it was supposed to ensure was still there.

More robust tests should instead look at what actually gets written to
the commit-graph, and how that is interpreted during pathspec-limited
revision walks.

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>

A "Reported-by: me" trailer would have been appropriate here.

> ---
>  Documentation/git-commit-graph.txt |  5 +++-
>  builtin/commit-graph.c             |  5 +++-
>  commit-graph.c                     | 45 ++++++++++++++++++++++++++++--
>  commit-graph.h                     |  1 +
>  t/t4216-log-bloom.sh               | 17 ++++++++++-
>  5 files changed, 67 insertions(+), 6 deletions(-)

Anyway, this is now partially fixed in 9a7a9ed10d (bloom: use provided
'struct bloom_filter_settings', 2020-09-16), though, unfortunately,
its commit message is not quite clear on this.  Alas, that's only a
partial fix, because we still only look at the top level commit-graph
file for existing Bloom filter settings.  However, deeper commit-graph
layers can contain Bloom filters with non-default settings even when
the top level doesn't, and these failing tests below demonstrate:

  ---  >8  ---

#!/bin/sh

test_description='test'

. ./test-lib.sh

test_expect_success 'setup' '
	git commit --allow-empty -m "Bloom filters are written but ignored for root commits :(" &&
	for i in 1 2 3
	do
		echo $i >file &&
		git add file &&
		git commit -m "$i" || return 1
	done &&
	git log --oneline --no-decorate -- file >expect
'

test_expect_success 'split' '
	# Compute Bloom filters with "unusual" settings.
	git rev-parse HEAD^^ | GIT_TEST_BLOOM_SETTINGS_NUM_HASHES=3 git commit-graph write --stdin-commits --changed-paths --split &&
	# A commit-graph layer without Bloom filters "hides" the layers
	# below ...
	git rev-parse HEAD^ | git commit-graph write --stdin-commits --no-changed-paths --split=no-merge &&
	# ... so this does not look at existing Bloom filters and their
	# settings in the bottom commit-graph layer and computes new
	# Bloom filters using the default 7 hashes.
	git rev-parse HEAD | git commit-graph write --stdin-commits --changed-paths --split=no-merge &&

	# Just to make sure that there are as many graph layers as I
	# think there should be.
	test_line_count = 3 .git/objects/info/commit-graphs/commit-graph-chain &&

	# This checks Bloom filters using settings in the top layer,
	# thus misses commits modifying file in the bottom commit-graph
	# layer.
	git log --oneline --no-decorate -- file >actual &&
	test_cmp expect actual
'

test_expect_success 'merged' '
	# This merges all existing layers, and computes missing Bloom
	# filters with the settings in the top layer, without noticing
	# that filters in the bottom layer were computed with different
	# settings.
	git commit-graph write --reachable --changed-paths &&

	# Just to make sure...
	test_path_is_file .git/objects/info/commit-graph &&

	# This misses commits modifying file that were merged from the
	# bottom commit-graph layer.
	git log --oneline --no-decorate -- file >actual &&
	test_cmp expect actual
'

test_done
Taylor Blau Oct. 15, 2020, 9:41 p.m. UTC | #2
On Thu, Oct 15, 2020 at 03:21:47PM +0200, SZEDER Gábor wrote:
> As pointed out in my original bug report [1], modified path Bloom
> filters are computed with hardcoded settings in
> bloom.c:get_bloom_filter().  Since this patch does not touch bloom.c
> at all, it still computes Bloom filters with those hardcoded settings,
> and, consequently, despite the commit message's claims, it does not
> persist the settings in the existing commit-graph.
>
> [1] https://public-inbox.org/git/20200619140230.GB22200@szeder.dev/

Right. It is worth mentioning here (as you do below) that this was fixed
as of 9a7a9ed10d (bloom: use provided 'struct bloom_filter_settings',
2020-09-16).

> > Use the trace2 API to signal the settings used during the write, and
> > check that output in a test after manually adjusting the correct bytes
> > in the commit-graph file.
>
> This test is insufficient, as it only checks what settings trace2
> believes the Bloom filters are computed with, not what settings they
> are actually computed with; that's why it succeeded while the bug
> whose absence it was supposed to ensure was still there.
>
> More robust tests should instead look at what actually gets written to
> the commit-graph, and how that is interpreted during pathspec-limited
> revision walks.

Thanks for the test! That saved me a little bit of work trying to set up
the scenario you're describing in a reproducible way. I think that the
third test can be fixed relatively easily. The crux of the issue there
is that we are computing Bloom filters for commits, some of which
already had Bloom filters computed in an earlier layer, but with
different settings than the one that we're using to write the current
layer.

So, we need to be more aggressively checking the Bloom filter settings
in any layer we want to reuse a Bloom filter out of before reusing it
verbatim in the current layer. The patch below the scissors line is
sufficient to do that, and it causes the third test to start passing.

...But, teaching the revision machinery how to handle a walk through
commits in multiple commit-graph layers with incompatible Bloom filter
settings is trickier. Right now we compute all of the Bloom keys up
front using the Bloom filter settings in the top layer. I think we'd
probably want to change this to lazily compute those keys by:

  - Keeping around the contents of the Bloom keys, i.e., the pathspecs
    themselves.

  - Keeping a hashmap from Bloom filter settings to computed Bloom keys
    corresponding to those settings.

  - Lazily filling in that hashmap as we traverse through different
    commits.

At least, that's the best way that I can think to do it. It does get
kind of slow, though; we'd have to scan every commit graph layer at each
commit in the worst case to find the actual 'struct commit_graph *' in
order to get its Bloom filter settings. So, I think that's sort of
show-stoppingly slow, and that we should probably think more deeply
about it before taking up that direction.

Maybe Stolee has some better thoughts about how we can quickly go from a
commit to either (a) the commit-graph struct that that commit is stored
in, or (b) the Bloom filter settings belonging to that struct.

Thanks,
Taylor

--- >8 ---

Subject: [PATCH] bloom: recompute filters with incompatible settings
---
 bloom.c        | 21 +++++++++++++++++++--
 commit-graph.c |  4 ++--
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/bloom.c b/bloom.c
index 68c73200a5..30da128474 100644
--- a/bloom.c
+++ b/bloom.c
@@ -30,7 +30,8 @@ static inline unsigned char get_bitmask(uint32_t pos)

 static int load_bloom_filter_from_graph(struct commit_graph *g,
 					struct bloom_filter *filter,
-					struct commit *c)
+					struct commit *c,
+					const struct bloom_filter_settings *settings)
 {
 	uint32_t lex_pos, start_index, end_index;
 	uint32_t graph_pos = commit_graph_position(c);
@@ -42,6 +43,21 @@ static int load_bloom_filter_from_graph(struct commit_graph *g,
 	if (!g->chunk_bloom_indexes)
 		return 0;

+	if (settings) {
+		struct bloom_filter_settings *graph_settings = g->bloom_filter_settings;
+		/*
+		 * Check that the settings used to generate new Bloom filters is
+		 * compatible with the settings Bloom filters in this
+		 * commit-graph layer were generated with.
+		 */
+		if (settings->hash_version != graph_settings->hash_version)
+			return 0;
+		if (settings->num_hashes != graph_settings->num_hashes)
+			return 0;
+		if (settings->bits_per_entry != graph_settings->bits_per_entry)
+			return 0;
+	}
+
 	lex_pos = graph_pos - g->num_commits_in_base;

 	end_index = get_be32(g->chunk_bloom_indexes + 4 * lex_pos);
@@ -205,7 +221,8 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 	if (!filter->data) {
 		load_commit_graph_info(r, c);
 		if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH)
-			load_bloom_filter_from_graph(r->objects->commit_graph, filter, c);
+			load_bloom_filter_from_graph(r->objects->commit_graph,
+						     filter, c, settings);
 	}

 	if (filter->data && filter->len)
diff --git a/commit-graph.c b/commit-graph.c
index cb042bdba8..afe14af2a3 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1188,7 +1188,7 @@ static int write_graph_chunk_bloom_indexes(struct hashfile *f,
 	uint32_t cur_pos = 0;

 	while (list < last) {
-		struct bloom_filter *filter = get_bloom_filter(ctx->r, *list);
+		struct bloom_filter *filter = get_or_compute_bloom_filter(ctx->r, *list, 0, ctx->bloom_settings, NULL);
 		size_t len = filter ? filter->len : 0;
 		cur_pos += len;
 		display_progress(ctx->progress, ++ctx->progress_cnt);
@@ -1228,7 +1228,7 @@ static int write_graph_chunk_bloom_data(struct hashfile *f,
 	hashwrite_be32(f, ctx->bloom_settings->bits_per_entry);

 	while (list < last) {
-		struct bloom_filter *filter = get_bloom_filter(ctx->r, *list);
+		struct bloom_filter *filter = get_or_compute_bloom_filter(ctx->r, *list, 0, ctx->bloom_settings, NULL);
 		size_t len = filter ? filter->len : 0;

 		display_progress(ctx->progress, ++ctx->progress_cnt);
--
2.29.0.rc1.dirty
Derrick Stolee Oct. 16, 2020, 2:18 a.m. UTC | #3
On 10/15/2020 5:41 PM, Taylor Blau wrote:
> So, we need to be more aggressively checking the Bloom filter settings
> in any layer we want to reuse a Bloom filter out of before reusing it
> verbatim in the current layer. The patch below the scissors line is
> sufficient to do that, and it causes the third test to start passing.

I think there are three things we should keep in mind:

1. Incompatible Bloom filter settings between layers should be seen
   as _inconsistent data_ as Git should not be writing incremental
   commit-graph files with inconsistent Bloom filter settings. Thus,
   when reading the commit-graph chain we should prevent incompatible
   filters from being used. One way to do this is to notice different
   settings and completely disable Bloom filters. The other way would
   be to take the settings from the first layer with filters and then
   clear the chunk_bloom_indexes and chunk_bloom_data fields for the
   layers that don't agree. This fits with an expectation that lower
   layers are larger, so more filters can be used in that situation.

2. We should be sure that Git will not agree to write incompatible
   settings between layers of a commit-graph chain. Even more, it
   should definitely not re-use filters when merging layers with
   incompatible filter values. The strategy above to ignore Bloom
   filters in incompatible upper layers is enough of a guard against
   the "merge layers" situation.

3. Allowing users (or future Git developers) to adjust the default
   Bloom filter settings is one that is good to do for future-proofing,
   but not one that I expect to be widely used (any gains here are
   minuscule compared to the results already achieved with the defaults).
   On top of that, including incompatible settings across layers is even
   less likely to be a real use case. We should not be straining to make
   the code even worse for this imaginary scenario.

With that said...
 
> ...But, teaching the revision machinery how to handle a walk through
> commits in multiple commit-graph layers with incompatible Bloom filter
> settings is trickier. Right now we compute all of the Bloom keys up
> front using the Bloom filter settings in the top layer. I think we'd
> probably want to change this to lazily compute those keys by:
It would probably be easiest to compute an array of bloom_key structs
(per path) where the index corresponds to the depth of the commit-graph
layer. You can then access the correct key after you have identified
which layer the commit is from.

> At least, that's the best way that I can think to do it. It does get
> kind of slow, though; we'd have to scan every commit graph layer at each
> commit in the worst case to find the actual 'struct commit_graph *' in
> order to get its Bloom filter settings. So, I think that's sort of
> show-stoppingly slow, and that we should probably think more deeply
> about it before taking up that direction.
> 
> Maybe Stolee has some better thoughts about how we can quickly go from a
> commit to either (a) the commit-graph struct that that commit is stored
> in, or (b) the Bloom filter settings belonging to that struct.

We already have code that finds a commit in a commit-graph layer
based on its integer position by iterating until num_commits_in_base
is below our position. The lexicographic position within that layer
is found by subtracting num_commits_in_base.

For us, we would simply need:

int get_commit_graph_layer(struct commit_graph *g, uint32_t pos)
{
	uint32_t layer_index = 0;

	while (g && pos < g->num_commits_in_base) {
		g = g->base_graph;
		layer_index++;
	}

	return layer_index;
}

You could then use the response from get_commit_graph_layer()
to load the correct Bloom key.

Again, I strongly suggest _not_ working hard to support this
case. We should only put in proper safeguards to prevent data
like this being written and protect a Git process that might
stumble across data in this shape.

Thanks,
-Stolee
Taylor Blau Oct. 16, 2020, 3:18 a.m. UTC | #4
On Thu, Oct 15, 2020 at 10:18:47PM -0400, Derrick Stolee wrote:
> On 10/15/2020 5:41 PM, Taylor Blau wrote:
> > So, we need to be more aggressively checking the Bloom filter settings
> > in any layer we want to reuse a Bloom filter out of before reusing it
> > verbatim in the current layer. The patch below the scissors line is
> > sufficient to do that, and it causes the third test to start passing.
>
> I think there are three things we should keep in mind:
>
> 1. Incompatible Bloom filter settings between layers should be seen
>    as _inconsistent data_ as Git should not be writing incremental
>    commit-graph files with inconsistent Bloom filter settings. Thus,
>    when reading the commit-graph chain we should prevent incompatible
>    filters from being used. One way to do this is to notice different
>    settings and completely disable Bloom filters. The other way would
>    be to take the settings from the first layer with filters and then
>    clear the chunk_bloom_indexes and chunk_bloom_data fields for the
>    layers that don't agree. This fits with an expectation that lower
>    layers are larger, so more filters can be used in that situation.

Sure; I'd be fine with only allowing filters computed with the settings
present in the lowest or largest layer in the event that multiple layers
exist with incompatible settings.

I'm trying to point us towards a direction of not optimizing too far
along a direction that we're unlikely to take, while also trying to do
something relatively non-invasive to make it possible for a version of
Git to change the default Bloom settings. That is, if a user is writing
split commit-graphs, and we change the default Bloom settings, they
shouldn't have to recompute or merge down all of their Bloom filters.

If that's something that we never think is going to happen, I'm fine
with not thinking too hard about it. But, I also don't want to paint
ourselves into a corner, so I think something like the patch I wrote in
the email that you're replying to actually may be worth pursuing
further. I dunno. Definitely after 2.29, though.

> 2. We should be sure that Git will not agree to write incompatible
>    settings between layers of a commit-graph chain. Even more, it
>    should definitely not re-use filters when merging layers with
>    incompatible filter values. The strategy above to ignore Bloom
>    filters in incompatible upper layers is enough of a guard against
>    the "merge layers" situation.

Yeah, this would be fine with me.

> 3. Allowing users (or future Git developers) to adjust the default
>    Bloom filter settings is one that is good to do for future-proofing,
>    but not one that I expect to be widely used (any gains here are
>    minuscule compared to the results already achieved with the defaults).
>    On top of that, including incompatible settings across layers is even
>    less likely to be a real use case. We should not be straining to make
>    the code even worse for this imaginary scenario.

Right, I think we're pretty much in agreement here. Doing something easy
to make sure that we don't run into a wall seems to make sense, but I
think modifying the revision walk machinery to keep track of multiple
Bloom keys computed with different settings corresponding to the set of
Bloom filter settings in commit-graph layers is probably too far in that
direction.

For what it's worth, I was mainly talking about it to say that it would
be more effort than it's probably worth to do. There's also nothing that
we're currently discussing that would prevent us from taking that same
direction up in six months from now.

Thanks,
Taylor
Derrick Stolee Oct. 16, 2020, 1:52 p.m. UTC | #5
On 10/15/2020 11:18 PM, Taylor Blau wrote:
> On Thu, Oct 15, 2020 at 10:18:47PM -0400, Derrick Stolee wrote:
>> On 10/15/2020 5:41 PM, Taylor Blau wrote:
>>> So, we need to be more aggressively checking the Bloom filter settings
>>> in any layer we want to reuse a Bloom filter out of before reusing it
>>> verbatim in the current layer. The patch below the scissors line is
>>> sufficient to do that, and it causes the third test to start passing.
>>
>> I think there are three things we should keep in mind:
>>
>> 1. Incompatible Bloom filter settings between layers should be seen
>>    as _inconsistent data_ as Git should not be writing incremental
>>    commit-graph files with inconsistent Bloom filter settings. Thus,
>>    when reading the commit-graph chain we should prevent incompatible
>>    filters from being used. One way to do this is to notice different
>>    settings and completely disable Bloom filters. The other way would
>>    be to take the settings from the first layer with filters and then
>>    clear the chunk_bloom_indexes and chunk_bloom_data fields for the
>>    layers that don't agree. This fits with an expectation that lower
>>    layers are larger, so more filters can be used in that situation.
> 
> Sure; I'd be fine with only allowing filters computed with the settings
> present in the lowest or largest layer in the event that multiple layers
> exist with incompatible settings.
> 
> I'm trying to point us towards a direction of not optimizing too far
> along a direction that we're unlikely to take, while also trying to do
> something relatively non-invasive to make it possible for a version of
> Git to change the default Bloom settings. That is, if a user is writing
> split commit-graphs, and we change the default Bloom settings, they
> shouldn't have to recompute or merge down all of their Bloom filters.

They would need to recompute when they merge layers, which introduces
a big question about how we should handle such a case.

> If that's something that we never think is going to happen, I'm fine
> with not thinking too hard about it. But, I also don't want to paint
> ourselves into a corner, so I think something like the patch I wrote in
> the email that you're replying to actually may be worth pursuing
> further. I dunno. Definitely after 2.29, though.

I think the proposed "react properly to this unlikely situation"
is a good way to prevent getting locked into our choices now. It
makes it possible for "old" Git versions (2.30 until we decide to
allow this mix) to interact with the inconsistent settings without
failure.

We don't need to do the 100% "optimal" case of using all filters
in order to enable this choice in the future.
 
[...]

> For what it's worth, I was mainly talking about it to say that it would
> be more effort than it's probably worth to do. There's also nothing that
> we're currently discussing that would prevent us from taking that same
> direction up in six months from now.

Yes, I just want to make sure that everyone agrees there is a
middle ground without saying that inconsistent filter settings
across layers is a "fully supported" feature. If someone wants
to tackle the work to make it a desirable state, then they can
try that (with great care).

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index f4b13c005b..369b222b08 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -60,7 +60,10 @@  existing commit-graph file.
 With the `--changed-paths` option, compute and write information about the
 paths changed between a commit and it's first parent. This operation can
 take a while on large repositories. It provides significant performance gains
-for getting history of a directory or a file with `git log -- <path>`.
+for getting history of a directory or a file with `git log -- <path>`. If
+this option is given, future commit-graph writes will automatically assume
+that this option was intended. Use `--no-changed-paths` to stop storing this
+data.
 +
 With the `--split` option, write the commit-graph as a chain of multiple
 commit-graph files stored in `<dir>/info/commit-graphs`. The new commits
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 59009837dc..ff7b177c33 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -151,6 +151,7 @@  static int graph_write(int argc, const char **argv)
 	};
 
 	opts.progress = isatty(2);
+	opts.enable_changed_paths = -1;
 	split_opts.size_multiple = 2;
 	split_opts.max_commits = 0;
 	split_opts.expire_time = 0;
@@ -171,7 +172,9 @@  static int graph_write(int argc, const char **argv)
 		flags |= COMMIT_GRAPH_WRITE_SPLIT;
 	if (opts.progress)
 		flags |= COMMIT_GRAPH_WRITE_PROGRESS;
-	if (opts.enable_changed_paths ||
+	if (!opts.enable_changed_paths)
+		flags |= COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS;
+	if (opts.enable_changed_paths == 1 ||
 	    git_env_bool(GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS, 0))
 		flags |= COMMIT_GRAPH_WRITE_BLOOM_FILTERS;
 
diff --git a/commit-graph.c b/commit-graph.c
index 50ce039a53..6762704324 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -16,6 +16,8 @@ 
 #include "progress.h"
 #include "bloom.h"
 #include "commit-slab.h"
+#include "json-writer.h"
+#include "trace2.h"
 
 void git_test_write_commit_graph_or_die(void)
 {
@@ -1108,6 +1110,21 @@  static void write_graph_chunk_bloom_indexes(struct hashfile *f,
 	stop_progress(&progress);
 }
 
+static void trace2_bloom_filter_settings(struct write_commit_graph_context *ctx)
+{
+	struct json_writer jw = JSON_WRITER_INIT;
+
+	jw_object_begin(&jw, 0);
+	jw_object_intmax(&jw, "hash_version", ctx->bloom_settings->hash_version);
+	jw_object_intmax(&jw, "num_hashes", ctx->bloom_settings->num_hashes);
+	jw_object_intmax(&jw, "bits_per_entry", ctx->bloom_settings->bits_per_entry);
+	jw_end(&jw);
+
+	trace2_data_json("bloom", ctx->r, "settings", &jw);
+
+	jw_release(&jw);
+}
+
 static void write_graph_chunk_bloom_data(struct hashfile *f,
 					 struct write_commit_graph_context *ctx)
 {
@@ -1116,6 +1133,8 @@  static void write_graph_chunk_bloom_data(struct hashfile *f,
 	struct progress *progress = NULL;
 	int i = 0;
 
+	trace2_bloom_filter_settings(ctx);
+
 	if (ctx->report_progress)
 		progress = start_delayed_progress(
 			_("Writing changed paths Bloom filters data"),
@@ -1547,9 +1566,15 @@  static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 	int num_chunks = 3;
 	uint64_t chunk_offset;
 	struct object_id file_hash;
-	const struct bloom_filter_settings bloom_settings = DEFAULT_BLOOM_FILTER_SETTINGS;
+	struct bloom_filter_settings bloom_settings = DEFAULT_BLOOM_FILTER_SETTINGS;
 
-	ctx->bloom_settings = &bloom_settings;
+	if (!ctx->bloom_settings) {
+		bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY",
+							      bloom_settings.bits_per_entry);
+		bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES",
+							  bloom_settings.num_hashes);
+		ctx->bloom_settings = &bloom_settings;
+	}
 
 	if (ctx->split) {
 		struct strbuf tmp_file = STRBUF_INIT;
@@ -1974,9 +1999,23 @@  int write_commit_graph(struct object_directory *odb,
 	ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0;
 	ctx->check_oids = flags & COMMIT_GRAPH_WRITE_CHECK_OIDS ? 1 : 0;
 	ctx->split_opts = split_opts;
-	ctx->changed_paths = flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS ? 1 : 0;
 	ctx->total_bloom_filter_data_size = 0;
 
+	if (flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS)
+		ctx->changed_paths = 1;
+	if (!(flags & COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS)) {
+		struct commit_graph *g;
+		prepare_commit_graph_one(ctx->r, ctx->odb);
+
+		g = ctx->r->objects->commit_graph;
+
+		/* We have changed-paths already. Keep them in the next graph */
+		if (g && g->chunk_bloom_data) {
+			ctx->changed_paths = 1;
+			ctx->bloom_settings = g->bloom_filter_settings;
+		}
+	}
+
 	if (ctx->split) {
 		struct commit_graph *g;
 		prepare_commit_graph(ctx->r);
diff --git a/commit-graph.h b/commit-graph.h
index f0fb13e3f2..45b1e5bca3 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -96,6 +96,7 @@  enum commit_graph_write_flags {
 	/* Make sure that each OID in the input is a valid commit OID. */
 	COMMIT_GRAPH_WRITE_CHECK_OIDS = (1 << 3),
 	COMMIT_GRAPH_WRITE_BLOOM_FILTERS = (1 << 4),
+	COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS = (1 << 5),
 };
 
 struct split_commit_graph_opts {
diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index 2761208e74..52ad998f9e 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -126,7 +126,7 @@  test_expect_success 'setup - add commit-graph to the chain without Bloom filters
 	test_commit c14 A/anotherFile2 &&
 	test_commit c15 A/B/anotherFile2 &&
 	test_commit c16 A/B/C/anotherFile2 &&
-	GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0 git commit-graph write --reachable --split &&
+	git commit-graph write --reachable --split --no-changed-paths &&
 	test_line_count = 2 .git/objects/info/commit-graphs/commit-graph-chain
 '
 
@@ -152,6 +152,21 @@  test_expect_success 'Use Bloom filters if they exist in the latest but not all c
 	test_bloom_filters_used_when_some_filters_are_missing "-- A/B"
 '
 
+test_expect_success 'persist filter settings' '
+	test_when_finished rm -rf .git/objects/info/commit-graph* &&
+	rm -rf .git/objects/info/commit-graph* &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+		GIT_TRACE2_EVENT_NESTING=5 \
+		GIT_TEST_BLOOM_SETTINGS_NUM_HASHES=9 \
+		GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY=15 \
+		git commit-graph write --reachable --changed-paths &&
+	grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15}" trace2.txt &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2-auto.txt" \
+		GIT_TRACE2_EVENT_NESTING=5 \
+		git commit-graph write --reachable --changed-paths &&
+	grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15}" trace2-auto.txt
+'
+
 test_expect_success 'correctly report changes over limit' '
 	git init 513changes &&
 	(