Message ID | f1e3a8516ebd58b283166a5374843f5cb3332d08.1593610050.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | More commit-graph/Bloom filter improvements | expand |
On Wed, Jul 01, 2020 at 01:27:24PM +0000, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <dstolee@microsoft.com> > > The changed-path Bloom filters were released in v2.27.0, but have a > significant drawback. A user can opt-in to writing the changed-path > filters using the "--changed-paths" option to "git commit-graph write" > but the next write will drop the filters unless that option is > specified. > > This becomes even more important when considering the interaction with > gc.writeCommitGraph (on by default) or fetch.writeCommitGraph (part of > features.experimental). These config options trigger commit-graph writes > that the user did not signal, and hence there is no --changed-paths > option available. > > Allow a user that opts-in to the changed-path filters to persist the > property of "my commit-graph has changed-path filters" automatically. A > user can drop filters using the --no-changed-paths option. The above parts of the commit message and the corresponding changes are OK, but ... > In the process, we need to be extremely careful to match the Bloom > filter settings as specified by the commit-graph. This will allow future > versions of Git to customize these settings, and the version with this > change will persist those settings as commit-graphs are rewritten on > top. As pointed out in my original bug report [1], modified path Bloom filters are computed with hardcoded settings in bloom.c:get_bloom_filter(). Since this patch does not touch bloom.c at all, it still computes Bloom filters with those hardcoded settings, and, consequently, despite the commit message's claims, it does not persist the settings in the existing commit-graph. [1] https://public-inbox.org/git/20200619140230.GB22200@szeder.dev/ > Use the trace2 API to signal the settings used during the write, and > check that output in a test after manually adjusting the correct bytes > in the commit-graph file. This test is insufficient, as it only checks what settings trace2 believes the Bloom filters are computed with, not what settings they are actually computed with; that's why it succeeded while the bug whose absence it was supposed to ensure was still there. More robust tests should instead look at what actually gets written to the commit-graph, and how that is interpreted during pathspec-limited revision walks. > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> A "Reported-by: me" trailer would have been appropriate here. > --- > Documentation/git-commit-graph.txt | 5 +++- > builtin/commit-graph.c | 5 +++- > commit-graph.c | 45 ++++++++++++++++++++++++++++-- > commit-graph.h | 1 + > t/t4216-log-bloom.sh | 17 ++++++++++- > 5 files changed, 67 insertions(+), 6 deletions(-) Anyway, this is now partially fixed in 9a7a9ed10d (bloom: use provided 'struct bloom_filter_settings', 2020-09-16), though, unfortunately, its commit message is not quite clear on this. Alas, that's only a partial fix, because we still only look at the top level commit-graph file for existing Bloom filter settings. However, deeper commit-graph layers can contain Bloom filters with non-default settings even when the top level doesn't, and these failing tests below demonstrate: --- >8 --- #!/bin/sh test_description='test' . ./test-lib.sh test_expect_success 'setup' ' git commit --allow-empty -m "Bloom filters are written but ignored for root commits :(" && for i in 1 2 3 do echo $i >file && git add file && git commit -m "$i" || return 1 done && git log --oneline --no-decorate -- file >expect ' test_expect_success 'split' ' # Compute Bloom filters with "unusual" settings. git rev-parse HEAD^^ | GIT_TEST_BLOOM_SETTINGS_NUM_HASHES=3 git commit-graph write --stdin-commits --changed-paths --split && # A commit-graph layer without Bloom filters "hides" the layers # below ... git rev-parse HEAD^ | git commit-graph write --stdin-commits --no-changed-paths --split=no-merge && # ... so this does not look at existing Bloom filters and their # settings in the bottom commit-graph layer and computes new # Bloom filters using the default 7 hashes. git rev-parse HEAD | git commit-graph write --stdin-commits --changed-paths --split=no-merge && # Just to make sure that there are as many graph layers as I # think there should be. test_line_count = 3 .git/objects/info/commit-graphs/commit-graph-chain && # This checks Bloom filters using settings in the top layer, # thus misses commits modifying file in the bottom commit-graph # layer. git log --oneline --no-decorate -- file >actual && test_cmp expect actual ' test_expect_success 'merged' ' # This merges all existing layers, and computes missing Bloom # filters with the settings in the top layer, without noticing # that filters in the bottom layer were computed with different # settings. git commit-graph write --reachable --changed-paths && # Just to make sure... test_path_is_file .git/objects/info/commit-graph && # This misses commits modifying file that were merged from the # bottom commit-graph layer. git log --oneline --no-decorate -- file >actual && test_cmp expect actual ' test_done
On Thu, Oct 15, 2020 at 03:21:47PM +0200, SZEDER Gábor wrote: > As pointed out in my original bug report [1], modified path Bloom > filters are computed with hardcoded settings in > bloom.c:get_bloom_filter(). Since this patch does not touch bloom.c > at all, it still computes Bloom filters with those hardcoded settings, > and, consequently, despite the commit message's claims, it does not > persist the settings in the existing commit-graph. > > [1] https://public-inbox.org/git/20200619140230.GB22200@szeder.dev/ Right. It is worth mentioning here (as you do below) that this was fixed as of 9a7a9ed10d (bloom: use provided 'struct bloom_filter_settings', 2020-09-16). > > Use the trace2 API to signal the settings used during the write, and > > check that output in a test after manually adjusting the correct bytes > > in the commit-graph file. > > This test is insufficient, as it only checks what settings trace2 > believes the Bloom filters are computed with, not what settings they > are actually computed with; that's why it succeeded while the bug > whose absence it was supposed to ensure was still there. > > More robust tests should instead look at what actually gets written to > the commit-graph, and how that is interpreted during pathspec-limited > revision walks. Thanks for the test! That saved me a little bit of work trying to set up the scenario you're describing in a reproducible way. I think that the third test can be fixed relatively easily. The crux of the issue there is that we are computing Bloom filters for commits, some of which already had Bloom filters computed in an earlier layer, but with different settings than the one that we're using to write the current layer. So, we need to be more aggressively checking the Bloom filter settings in any layer we want to reuse a Bloom filter out of before reusing it verbatim in the current layer. The patch below the scissors line is sufficient to do that, and it causes the third test to start passing. ...But, teaching the revision machinery how to handle a walk through commits in multiple commit-graph layers with incompatible Bloom filter settings is trickier. Right now we compute all of the Bloom keys up front using the Bloom filter settings in the top layer. I think we'd probably want to change this to lazily compute those keys by: - Keeping around the contents of the Bloom keys, i.e., the pathspecs themselves. - Keeping a hashmap from Bloom filter settings to computed Bloom keys corresponding to those settings. - Lazily filling in that hashmap as we traverse through different commits. At least, that's the best way that I can think to do it. It does get kind of slow, though; we'd have to scan every commit graph layer at each commit in the worst case to find the actual 'struct commit_graph *' in order to get its Bloom filter settings. So, I think that's sort of show-stoppingly slow, and that we should probably think more deeply about it before taking up that direction. Maybe Stolee has some better thoughts about how we can quickly go from a commit to either (a) the commit-graph struct that that commit is stored in, or (b) the Bloom filter settings belonging to that struct. Thanks, Taylor --- >8 --- Subject: [PATCH] bloom: recompute filters with incompatible settings --- bloom.c | 21 +++++++++++++++++++-- commit-graph.c | 4 ++-- 2 files changed, 21 insertions(+), 4 deletions(-) diff --git a/bloom.c b/bloom.c index 68c73200a5..30da128474 100644 --- a/bloom.c +++ b/bloom.c @@ -30,7 +30,8 @@ static inline unsigned char get_bitmask(uint32_t pos) static int load_bloom_filter_from_graph(struct commit_graph *g, struct bloom_filter *filter, - struct commit *c) + struct commit *c, + const struct bloom_filter_settings *settings) { uint32_t lex_pos, start_index, end_index; uint32_t graph_pos = commit_graph_position(c); @@ -42,6 +43,21 @@ static int load_bloom_filter_from_graph(struct commit_graph *g, if (!g->chunk_bloom_indexes) return 0; + if (settings) { + struct bloom_filter_settings *graph_settings = g->bloom_filter_settings; + /* + * Check that the settings used to generate new Bloom filters is + * compatible with the settings Bloom filters in this + * commit-graph layer were generated with. + */ + if (settings->hash_version != graph_settings->hash_version) + return 0; + if (settings->num_hashes != graph_settings->num_hashes) + return 0; + if (settings->bits_per_entry != graph_settings->bits_per_entry) + return 0; + } + lex_pos = graph_pos - g->num_commits_in_base; end_index = get_be32(g->chunk_bloom_indexes + 4 * lex_pos); @@ -205,7 +221,8 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, if (!filter->data) { load_commit_graph_info(r, c); if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH) - load_bloom_filter_from_graph(r->objects->commit_graph, filter, c); + load_bloom_filter_from_graph(r->objects->commit_graph, + filter, c, settings); } if (filter->data && filter->len) diff --git a/commit-graph.c b/commit-graph.c index cb042bdba8..afe14af2a3 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1188,7 +1188,7 @@ static int write_graph_chunk_bloom_indexes(struct hashfile *f, uint32_t cur_pos = 0; while (list < last) { - struct bloom_filter *filter = get_bloom_filter(ctx->r, *list); + struct bloom_filter *filter = get_or_compute_bloom_filter(ctx->r, *list, 0, ctx->bloom_settings, NULL); size_t len = filter ? filter->len : 0; cur_pos += len; display_progress(ctx->progress, ++ctx->progress_cnt); @@ -1228,7 +1228,7 @@ static int write_graph_chunk_bloom_data(struct hashfile *f, hashwrite_be32(f, ctx->bloom_settings->bits_per_entry); while (list < last) { - struct bloom_filter *filter = get_bloom_filter(ctx->r, *list); + struct bloom_filter *filter = get_or_compute_bloom_filter(ctx->r, *list, 0, ctx->bloom_settings, NULL); size_t len = filter ? filter->len : 0; display_progress(ctx->progress, ++ctx->progress_cnt); -- 2.29.0.rc1.dirty
On 10/15/2020 5:41 PM, Taylor Blau wrote: > So, we need to be more aggressively checking the Bloom filter settings > in any layer we want to reuse a Bloom filter out of before reusing it > verbatim in the current layer. The patch below the scissors line is > sufficient to do that, and it causes the third test to start passing. I think there are three things we should keep in mind: 1. Incompatible Bloom filter settings between layers should be seen as _inconsistent data_ as Git should not be writing incremental commit-graph files with inconsistent Bloom filter settings. Thus, when reading the commit-graph chain we should prevent incompatible filters from being used. One way to do this is to notice different settings and completely disable Bloom filters. The other way would be to take the settings from the first layer with filters and then clear the chunk_bloom_indexes and chunk_bloom_data fields for the layers that don't agree. This fits with an expectation that lower layers are larger, so more filters can be used in that situation. 2. We should be sure that Git will not agree to write incompatible settings between layers of a commit-graph chain. Even more, it should definitely not re-use filters when merging layers with incompatible filter values. The strategy above to ignore Bloom filters in incompatible upper layers is enough of a guard against the "merge layers" situation. 3. Allowing users (or future Git developers) to adjust the default Bloom filter settings is one that is good to do for future-proofing, but not one that I expect to be widely used (any gains here are minuscule compared to the results already achieved with the defaults). On top of that, including incompatible settings across layers is even less likely to be a real use case. We should not be straining to make the code even worse for this imaginary scenario. With that said... > ...But, teaching the revision machinery how to handle a walk through > commits in multiple commit-graph layers with incompatible Bloom filter > settings is trickier. Right now we compute all of the Bloom keys up > front using the Bloom filter settings in the top layer. I think we'd > probably want to change this to lazily compute those keys by: It would probably be easiest to compute an array of bloom_key structs (per path) where the index corresponds to the depth of the commit-graph layer. You can then access the correct key after you have identified which layer the commit is from. > At least, that's the best way that I can think to do it. It does get > kind of slow, though; we'd have to scan every commit graph layer at each > commit in the worst case to find the actual 'struct commit_graph *' in > order to get its Bloom filter settings. So, I think that's sort of > show-stoppingly slow, and that we should probably think more deeply > about it before taking up that direction. > > Maybe Stolee has some better thoughts about how we can quickly go from a > commit to either (a) the commit-graph struct that that commit is stored > in, or (b) the Bloom filter settings belonging to that struct. We already have code that finds a commit in a commit-graph layer based on its integer position by iterating until num_commits_in_base is below our position. The lexicographic position within that layer is found by subtracting num_commits_in_base. For us, we would simply need: int get_commit_graph_layer(struct commit_graph *g, uint32_t pos) { uint32_t layer_index = 0; while (g && pos < g->num_commits_in_base) { g = g->base_graph; layer_index++; } return layer_index; } You could then use the response from get_commit_graph_layer() to load the correct Bloom key. Again, I strongly suggest _not_ working hard to support this case. We should only put in proper safeguards to prevent data like this being written and protect a Git process that might stumble across data in this shape. Thanks, -Stolee
On Thu, Oct 15, 2020 at 10:18:47PM -0400, Derrick Stolee wrote: > On 10/15/2020 5:41 PM, Taylor Blau wrote: > > So, we need to be more aggressively checking the Bloom filter settings > > in any layer we want to reuse a Bloom filter out of before reusing it > > verbatim in the current layer. The patch below the scissors line is > > sufficient to do that, and it causes the third test to start passing. > > I think there are three things we should keep in mind: > > 1. Incompatible Bloom filter settings between layers should be seen > as _inconsistent data_ as Git should not be writing incremental > commit-graph files with inconsistent Bloom filter settings. Thus, > when reading the commit-graph chain we should prevent incompatible > filters from being used. One way to do this is to notice different > settings and completely disable Bloom filters. The other way would > be to take the settings from the first layer with filters and then > clear the chunk_bloom_indexes and chunk_bloom_data fields for the > layers that don't agree. This fits with an expectation that lower > layers are larger, so more filters can be used in that situation. Sure; I'd be fine with only allowing filters computed with the settings present in the lowest or largest layer in the event that multiple layers exist with incompatible settings. I'm trying to point us towards a direction of not optimizing too far along a direction that we're unlikely to take, while also trying to do something relatively non-invasive to make it possible for a version of Git to change the default Bloom settings. That is, if a user is writing split commit-graphs, and we change the default Bloom settings, they shouldn't have to recompute or merge down all of their Bloom filters. If that's something that we never think is going to happen, I'm fine with not thinking too hard about it. But, I also don't want to paint ourselves into a corner, so I think something like the patch I wrote in the email that you're replying to actually may be worth pursuing further. I dunno. Definitely after 2.29, though. > 2. We should be sure that Git will not agree to write incompatible > settings between layers of a commit-graph chain. Even more, it > should definitely not re-use filters when merging layers with > incompatible filter values. The strategy above to ignore Bloom > filters in incompatible upper layers is enough of a guard against > the "merge layers" situation. Yeah, this would be fine with me. > 3. Allowing users (or future Git developers) to adjust the default > Bloom filter settings is one that is good to do for future-proofing, > but not one that I expect to be widely used (any gains here are > minuscule compared to the results already achieved with the defaults). > On top of that, including incompatible settings across layers is even > less likely to be a real use case. We should not be straining to make > the code even worse for this imaginary scenario. Right, I think we're pretty much in agreement here. Doing something easy to make sure that we don't run into a wall seems to make sense, but I think modifying the revision walk machinery to keep track of multiple Bloom keys computed with different settings corresponding to the set of Bloom filter settings in commit-graph layers is probably too far in that direction. For what it's worth, I was mainly talking about it to say that it would be more effort than it's probably worth to do. There's also nothing that we're currently discussing that would prevent us from taking that same direction up in six months from now. Thanks, Taylor
On 10/15/2020 11:18 PM, Taylor Blau wrote: > On Thu, Oct 15, 2020 at 10:18:47PM -0400, Derrick Stolee wrote: >> On 10/15/2020 5:41 PM, Taylor Blau wrote: >>> So, we need to be more aggressively checking the Bloom filter settings >>> in any layer we want to reuse a Bloom filter out of before reusing it >>> verbatim in the current layer. The patch below the scissors line is >>> sufficient to do that, and it causes the third test to start passing. >> >> I think there are three things we should keep in mind: >> >> 1. Incompatible Bloom filter settings between layers should be seen >> as _inconsistent data_ as Git should not be writing incremental >> commit-graph files with inconsistent Bloom filter settings. Thus, >> when reading the commit-graph chain we should prevent incompatible >> filters from being used. One way to do this is to notice different >> settings and completely disable Bloom filters. The other way would >> be to take the settings from the first layer with filters and then >> clear the chunk_bloom_indexes and chunk_bloom_data fields for the >> layers that don't agree. This fits with an expectation that lower >> layers are larger, so more filters can be used in that situation. > > Sure; I'd be fine with only allowing filters computed with the settings > present in the lowest or largest layer in the event that multiple layers > exist with incompatible settings. > > I'm trying to point us towards a direction of not optimizing too far > along a direction that we're unlikely to take, while also trying to do > something relatively non-invasive to make it possible for a version of > Git to change the default Bloom settings. That is, if a user is writing > split commit-graphs, and we change the default Bloom settings, they > shouldn't have to recompute or merge down all of their Bloom filters. They would need to recompute when they merge layers, which introduces a big question about how we should handle such a case. > If that's something that we never think is going to happen, I'm fine > with not thinking too hard about it. But, I also don't want to paint > ourselves into a corner, so I think something like the patch I wrote in > the email that you're replying to actually may be worth pursuing > further. I dunno. Definitely after 2.29, though. I think the proposed "react properly to this unlikely situation" is a good way to prevent getting locked into our choices now. It makes it possible for "old" Git versions (2.30 until we decide to allow this mix) to interact with the inconsistent settings without failure. We don't need to do the 100% "optimal" case of using all filters in order to enable this choice in the future. [...] > For what it's worth, I was mainly talking about it to say that it would > be more effort than it's probably worth to do. There's also nothing that > we're currently discussing that would prevent us from taking that same > direction up in six months from now. Yes, I just want to make sure that everyone agrees there is a middle ground without saying that inconsistent filter settings across layers is a "fully supported" feature. If someone wants to tackle the work to make it a desirable state, then they can try that (with great care). Thanks, -Stolee
diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt index f4b13c005b..369b222b08 100644 --- a/Documentation/git-commit-graph.txt +++ b/Documentation/git-commit-graph.txt @@ -60,7 +60,10 @@ existing commit-graph file. With the `--changed-paths` option, compute and write information about the paths changed between a commit and it's first parent. This operation can take a while on large repositories. It provides significant performance gains -for getting history of a directory or a file with `git log -- <path>`. +for getting history of a directory or a file with `git log -- <path>`. If +this option is given, future commit-graph writes will automatically assume +that this option was intended. Use `--no-changed-paths` to stop storing this +data. + With the `--split` option, write the commit-graph as a chain of multiple commit-graph files stored in `<dir>/info/commit-graphs`. The new commits diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 59009837dc..ff7b177c33 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -151,6 +151,7 @@ static int graph_write(int argc, const char **argv) }; opts.progress = isatty(2); + opts.enable_changed_paths = -1; split_opts.size_multiple = 2; split_opts.max_commits = 0; split_opts.expire_time = 0; @@ -171,7 +172,9 @@ static int graph_write(int argc, const char **argv) flags |= COMMIT_GRAPH_WRITE_SPLIT; if (opts.progress) flags |= COMMIT_GRAPH_WRITE_PROGRESS; - if (opts.enable_changed_paths || + if (!opts.enable_changed_paths) + flags |= COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS; + if (opts.enable_changed_paths == 1 || git_env_bool(GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS, 0)) flags |= COMMIT_GRAPH_WRITE_BLOOM_FILTERS; diff --git a/commit-graph.c b/commit-graph.c index 50ce039a53..6762704324 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -16,6 +16,8 @@ #include "progress.h" #include "bloom.h" #include "commit-slab.h" +#include "json-writer.h" +#include "trace2.h" void git_test_write_commit_graph_or_die(void) { @@ -1108,6 +1110,21 @@ static void write_graph_chunk_bloom_indexes(struct hashfile *f, stop_progress(&progress); } +static void trace2_bloom_filter_settings(struct write_commit_graph_context *ctx) +{ + struct json_writer jw = JSON_WRITER_INIT; + + jw_object_begin(&jw, 0); + jw_object_intmax(&jw, "hash_version", ctx->bloom_settings->hash_version); + jw_object_intmax(&jw, "num_hashes", ctx->bloom_settings->num_hashes); + jw_object_intmax(&jw, "bits_per_entry", ctx->bloom_settings->bits_per_entry); + jw_end(&jw); + + trace2_data_json("bloom", ctx->r, "settings", &jw); + + jw_release(&jw); +} + static void write_graph_chunk_bloom_data(struct hashfile *f, struct write_commit_graph_context *ctx) { @@ -1116,6 +1133,8 @@ static void write_graph_chunk_bloom_data(struct hashfile *f, struct progress *progress = NULL; int i = 0; + trace2_bloom_filter_settings(ctx); + if (ctx->report_progress) progress = start_delayed_progress( _("Writing changed paths Bloom filters data"), @@ -1547,9 +1566,15 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) int num_chunks = 3; uint64_t chunk_offset; struct object_id file_hash; - const struct bloom_filter_settings bloom_settings = DEFAULT_BLOOM_FILTER_SETTINGS; + struct bloom_filter_settings bloom_settings = DEFAULT_BLOOM_FILTER_SETTINGS; - ctx->bloom_settings = &bloom_settings; + if (!ctx->bloom_settings) { + bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY", + bloom_settings.bits_per_entry); + bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES", + bloom_settings.num_hashes); + ctx->bloom_settings = &bloom_settings; + } if (ctx->split) { struct strbuf tmp_file = STRBUF_INIT; @@ -1974,9 +1999,23 @@ int write_commit_graph(struct object_directory *odb, ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0; ctx->check_oids = flags & COMMIT_GRAPH_WRITE_CHECK_OIDS ? 1 : 0; ctx->split_opts = split_opts; - ctx->changed_paths = flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS ? 1 : 0; ctx->total_bloom_filter_data_size = 0; + if (flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS) + ctx->changed_paths = 1; + if (!(flags & COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS)) { + struct commit_graph *g; + prepare_commit_graph_one(ctx->r, ctx->odb); + + g = ctx->r->objects->commit_graph; + + /* We have changed-paths already. Keep them in the next graph */ + if (g && g->chunk_bloom_data) { + ctx->changed_paths = 1; + ctx->bloom_settings = g->bloom_filter_settings; + } + } + if (ctx->split) { struct commit_graph *g; prepare_commit_graph(ctx->r); diff --git a/commit-graph.h b/commit-graph.h index f0fb13e3f2..45b1e5bca3 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -96,6 +96,7 @@ enum commit_graph_write_flags { /* Make sure that each OID in the input is a valid commit OID. */ COMMIT_GRAPH_WRITE_CHECK_OIDS = (1 << 3), COMMIT_GRAPH_WRITE_BLOOM_FILTERS = (1 << 4), + COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS = (1 << 5), }; struct split_commit_graph_opts { diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index 2761208e74..52ad998f9e 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -126,7 +126,7 @@ test_expect_success 'setup - add commit-graph to the chain without Bloom filters test_commit c14 A/anotherFile2 && test_commit c15 A/B/anotherFile2 && test_commit c16 A/B/C/anotherFile2 && - GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0 git commit-graph write --reachable --split && + git commit-graph write --reachable --split --no-changed-paths && test_line_count = 2 .git/objects/info/commit-graphs/commit-graph-chain ' @@ -152,6 +152,21 @@ test_expect_success 'Use Bloom filters if they exist in the latest but not all c test_bloom_filters_used_when_some_filters_are_missing "-- A/B" ' +test_expect_success 'persist filter settings' ' + test_when_finished rm -rf .git/objects/info/commit-graph* && + rm -rf .git/objects/info/commit-graph* && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \ + GIT_TRACE2_EVENT_NESTING=5 \ + GIT_TEST_BLOOM_SETTINGS_NUM_HASHES=9 \ + GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY=15 \ + git commit-graph write --reachable --changed-paths && + grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15}" trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2-auto.txt" \ + GIT_TRACE2_EVENT_NESTING=5 \ + git commit-graph write --reachable --changed-paths && + grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15}" trace2-auto.txt +' + test_expect_success 'correctly report changes over limit' ' git init 513changes && (