[v4,00/17] bloom: changed-path Bloom filters v2 (& sundries)

Message ID cover.1697653929.git.me@ttaylorr.com (mailing list archive)

Message

Taylor Blau Oct. 18, 2023, 6:32 p.m. UTC
(Rebased onto the tip of 'master', which is 3a06386e31 (The fifteenth
batch, 2023-10-04), at the time of writing).

This series is a reroll of the combined efforts of [1] and [2] to
introduce the v2 changed-path Bloom filters, which fix a bug in our
existing implementation of murmur3 when hashing paths with non-ASCII
characters (when the "char" type is signed).
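
To illustrate the failure mode: the sketch below is not Git's murmur3
code, only a minimal C example (the helper names `load_block_signed()`
and `load_block_unsigned()` are hypothetical) of how assembling a 32-bit
block from bytes goes wrong when the bytes are read through a signed
type. A byte at or above 0x80, as found in any UTF-8 encoded non-ASCII
path, sign-extends and overwrites the higher byte lanes, so the value fed
to the hash differs from the one computed with unsigned bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not Git's code: build a little-endian 32-bit
 * block from four bytes read as signed char. A negative byte converts
 * to a uint32_t with all upper bits set, which then leaks into the
 * other byte lanes when the pieces are OR-ed together.
 */
static uint32_t load_block_signed(const signed char *data)
{
	assert(data != NULL);
	return ((uint32_t)data[0] << 0) |
	       ((uint32_t)data[1] << 8) |
	       ((uint32_t)data[2] << 16) |
	       ((uint32_t)data[3] << 24);
}

/*
 * Same assembly, but reading the bytes as unsigned char, so every
 * byte contributes exactly eight bits to its own lane.
 */
static uint32_t load_block_unsigned(const unsigned char *data)
{
	assert(data != NULL);
	return ((uint32_t)data[0] << 0) |
	       ((uint32_t)data[1] << 8) |
	       ((uint32_t)data[2] << 16) |
	       ((uint32_t)data[3] << 24);
}
```

For the bytes C2 A2 'a' 'b' ("¢ab" in UTF-8), the unsigned variant
yields 0x6261a2c2 whereas the signed variant yields 0xffffffc2. For
pure-ASCII input the two agree, which is consistent with only non-ASCII
paths being affected.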

In large part, this is the same as the previous round. But this round
includes some extra bits that address issues pointed out by SZEDER
Gábor, which are:

  - not reading Bloom filters for root commits
  - corrupting Bloom filter reads by tweaking the filter settings
    between layers.

These issues were discussed in (among other places) [3] and [4],
respectively.
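
The second issue can be pictured with a small sketch. This is an
illustration under stated assumptions rather than the logic in
commit-graph.c: the struct `layer_bloom_settings`, its fields, and
`mark_usable_layers()` are hypothetical names, and treating the first
layer as the reference is likewise an assumption. The idea is that
filters from a layer are only consulted when that layer's settings match
the reference, and are otherwise ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-layer settings; the field names are illustrative. */
struct layer_bloom_settings {
	uint32_t hash_version;
	uint32_t num_hashes;
	uint32_t bits_per_entry;
};

/* Two layers are compatible only if every setting is identical. */
static bool settings_match(const struct layer_bloom_settings *a,
			   const struct layer_bloom_settings *b)
{
	return a->hash_version == b->hash_version &&
	       a->num_hashes == b->num_hashes &&
	       a->bits_per_entry == b->bits_per_entry;
}

/*
 * Mark each layer as usable when its settings match layers[0] (the
 * assumed reference) and return how many layers remain usable.
 */
static size_t mark_usable_layers(const struct layer_bloom_settings *layers,
				 size_t nr, bool *usable)
{
	size_t i, kept = 0;

	assert(nr == 0 || (layers != NULL && usable != NULL));
	for (i = 0; i < nr; i++) {
		usable[i] = settings_match(&layers[0], &layers[i]);
		if (usable[i])
			kept++;
	}
	return kept;
}
```

With three layers where the middle one was written with 3 hashes instead
of 7 (echoing the GIT_TEST_BLOOM_SETTINGS_NUM_HASHES=3 setup in the
range-diff below), this sketch keeps layers 0 and 2 and ignores layer 1.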

Thanks to Jonathan, Peff, and SZEDER who have helped a great deal in
assembling these patches. As usual, a range-diff is included below.
Thanks in advance for your review!

[1]: https://lore.kernel.org/git/cover.1684790529.git.jonathantanmy@google.com/
[2]: https://lore.kernel.org/git/cover.1691426160.git.me@ttaylorr.com/
[3]: https://public-inbox.org/git/20201015132147.GB24954@szeder.dev/
[4]: https://lore.kernel.org/git/20230830200218.GA5147@szeder.dev/

Jonathan Tan (4):
  gitformat-commit-graph: describe version 2 of BDAT
  t4216: test changed path filters with high bit paths
  repo-settings: introduce commitgraph.changedPathsVersion
  commit-graph: new filter ver. that fixes murmur3

Taylor Blau (13):
  t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
  revision.c: consult Bloom filters for root commits
  commit-graph: ensure Bloom filters are read with consistent settings
  t/helper/test-read-graph.c: extract `dump_graph_info()`
  bloom.h: make `load_bloom_filter_from_graph()` public
  t/helper/test-read-graph: implement `bloom-filters` mode
  bloom: annotate filters with hash version
  bloom: prepare to discard incompatible Bloom filters
  commit-graph.c: unconditionally load Bloom filters
  commit-graph: drop unnecessary `graph_read_bloom_data_context`
  object.h: fix mis-aligned flag bits table
  commit-graph: reuse existing Bloom filters where possible
  bloom: introduce `deinit_bloom_filters()`

 Documentation/config/commitgraph.txt     |  26 ++-
 Documentation/gitformat-commit-graph.txt |   9 +-
 bloom.c                                  | 208 ++++++++++++++++-
 bloom.h                                  |  38 ++-
 commit-graph.c                           |  61 ++++-
 object.h                                 |   3 +-
 oss-fuzz/fuzz-commit-graph.c             |   2 +-
 repo-settings.c                          |   6 +-
 repository.h                             |   2 +-
 revision.c                               |  26 ++-
 t/helper/test-bloom.c                    |   9 +-
 t/helper/test-read-graph.c               |  67 ++++--
 t/t0095-bloom.sh                         |   8 +
 t/t4216-log-bloom.sh                     | 282 ++++++++++++++++++++++-
 14 files changed, 692 insertions(+), 55 deletions(-)

Range-diff against v3:
 1:  fe671d616c =  1:  e0fc51c3fb t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
 2:  7d0fa93543 =  2:  87b09e6266 revision.c: consult Bloom filters for root commits
 3:  2ecc0a2d58 !  3:  46d8a41005 commit-graph: ensure Bloom filters are read with consistent settings
    @@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation backfills empty comm
     +	done
     +'
     +
    -+test_expect_success 'split' '
    ++test_expect_success 'ensure incompatible Bloom filters are ignored' '
     +	# Compute Bloom filters with "unusual" settings.
     +	git -C $repo rev-parse one >in &&
     +	GIT_TEST_BLOOM_SETTINGS_NUM_HASHES=3 git -C $repo commit-graph write \
    @@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation backfills empty comm
     +
     +test_expect_success 'merge graph layers with incompatible Bloom settings' '
     +	# Ensure that incompatible Bloom filters are ignored when
    -+	# generating new layers.
    ++	# merging existing layers.
     +	git -C $repo commit-graph write --reachable --changed-paths 2>err &&
     +	grep "disabling Bloom filters for commit-graph layer .$layer." err &&
     +
     +	test_path_is_file $repo/$graph &&
     +	test_dir_is_empty $repo/$graphdir &&
     +
    -+	# ...and merging existing ones.
    -+	git -C $repo -c core.commitGraph=false log --oneline --no-decorate -- file \
    -+		>expect 2>err &&
    -+	GIT_TRACE2_PERF="$(pwd)/trace.perf" \
    ++	git -C $repo -c core.commitGraph=false log --oneline --no-decorate -- \
    ++		file >expect &&
    ++	trace_out="$(pwd)/trace.perf" &&
    ++	GIT_TRACE2_PERF="$trace_out" \
     +		git -C $repo log --oneline --no-decorate -- file >actual 2>err &&
     +
    -+	test_cmp expect actual && cat err &&
    -+	grep "statistics:{\"filter_not_present\":0" trace.perf &&
    -+	! grep "disabling Bloom filters" err
    ++	test_cmp expect actual &&
    ++	grep "statistics:{\"filter_not_present\":0," trace.perf &&
    ++	test_must_be_empty err
     +'
     +
      test_done
 4:  17703ed89a =  4:  4d0190a992 gitformat-commit-graph: describe version 2 of BDAT
 5:  94552abf45 =  5:  3c2057c11c t/helper/test-read-graph.c: extract `dump_graph_info()`
 6:  3d81efa27b =  6:  e002e35004 bloom.h: make `load_bloom_filter_from_graph()` public
 7:  d23cd89037 =  7:  c7016f51cd t/helper/test-read-graph: implement `bloom-filters` mode
 8:  cba766f224 !  8:  cef2aac8ba t4216: test changed path filters with high bit paths
    @@ Commit message
     
      ## t/t4216-log-bloom.sh ##
     @@ t/t4216-log-bloom.sh: test_expect_success 'merge graph layers with incompatible Bloom settings' '
    - 	! grep "disabling Bloom filters" err
    + 	test_must_be_empty err
      '
      
     +get_first_changed_path_filter () {
    @@ t/t4216-log-bloom.sh: test_expect_success 'merge graph layers with incompatible
     +	(
     +		cd highbit1 &&
     +		echo "52a9" >expect &&
    -+		get_first_changed_path_filter >actual &&
    -+		test_cmp expect actual
    ++		get_first_changed_path_filter >actual
     +	)
     +'
     +
 9:  a08a961f41 =  9:  36d4e2202e repo-settings: introduce commitgraph.changedPathsVersion
10:  61d44519a5 ! 10:  f6ab427ead commit-graph: new filter ver. that fixes murmur3
    @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
     +	test_commit -C doublewrite c "$CENT" &&
     +	git -C doublewrite config --add commitgraph.changedPathsVersion 1 &&
     +	git -C doublewrite commit-graph write --reachable --changed-paths &&
    ++	for v in -2 3
    ++	do
    ++		git -C doublewrite config --add commitgraph.changedPathsVersion $v &&
    ++		git -C doublewrite commit-graph write --reachable --changed-paths 2>err &&
    ++		cat >expect <<-EOF &&
    ++		warning: attempting to write a commit-graph, but ${SQ}commitgraph.changedPathsVersion${SQ} ($v) is not supported
    ++		EOF
    ++		test_cmp expect err || return 1
    ++	done &&
     +	git -C doublewrite config --add commitgraph.changedPathsVersion 2 &&
     +	git -C doublewrite commit-graph write --reachable --changed-paths &&
     +	(
11:  a8c10f8de8 = 11:  dc69b28329 bloom: annotate filters with hash version
12:  2ba10a4b4b = 12:  85dbdc4ed2 bloom: prepare to discard incompatible Bloom filters
13:  09d8669c3a = 13:  3ff669a622 commit-graph.c: unconditionally load Bloom filters
14:  0d4f9dc4ee = 14:  1c78e3d178 commit-graph: drop unnecessary `graph_read_bloom_data_context`
15:  1f7f27bc47 = 15:  a289514faa object.h: fix mis-aligned flag bits table
16:  abbef95ae8 ! 16:  6a12e39e7f commit-graph: reuse existing Bloom filters where possible
    @@ t/t4216-log-bloom.sh: test_expect_success 'when writing another commit graph, pr
      	test_commit -C doublewrite c "$CENT" &&
     +
      	git -C doublewrite config --add commitgraph.changedPathsVersion 1 &&
    --	git -C doublewrite commit-graph write --reachable --changed-paths &&
     +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
     +		git -C doublewrite commit-graph write --reachable --changed-paths &&
     +	test_filter_computed 1 trace2.txt &&
     +	test_filter_upgraded 0 trace2.txt &&
    ++
    + 	git -C doublewrite commit-graph write --reachable --changed-paths &&
    + 	for v in -2 3
    + 	do
    +@@ t/t4216-log-bloom.sh: test_expect_success 'when writing commit graph, do not reuse changed-path of ano
    + 		EOF
    + 		test_cmp expect err || return 1
    + 	done &&
     +
      	git -C doublewrite config --add commitgraph.changedPathsVersion 2 &&
     -	git -C doublewrite commit-graph write --reachable --changed-paths &&
17:  ca362408d5 ! 17:  8942f205c8 bloom: introduce `deinit_bloom_filters()`
    @@ bloom.h: void add_key_to_filter(const struct bloom_key *key,
      	BLOOM_NOT_COMPUTED = (1 << 0),
     
      ## commit-graph.c ##
    -@@ commit-graph.c: static void close_commit_graph_one(struct commit_graph *g)
    +@@ commit-graph.c: struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r)
      void close_commit_graph(struct raw_object_store *o)
      {
    - 	close_commit_graph_one(o->commit_graph);
    + 	clear_commit_graph_data_slab(&commit_graph_data_slab);
     +	deinit_bloom_filters();
    + 	free_commit_graph(o->commit_graph);
      	o->commit_graph = NULL;
      }
    - 
     @@ commit-graph.c: int write_commit_graph(struct object_directory *odb,
      
      	res = write_commit_graph_file(ctx);

Comments

Junio C Hamano Oct. 18, 2023, 11:26 p.m. UTC | #1
Taylor Blau <me@ttaylorr.com> writes:

> (Rebased onto the tip of 'master', which is 3a06386e31 (The fifteenth
> batch, 2023-10-04), at the time of writing).

Judging from 17/17 that has a free_commit_graph() call in
close_commit_graph(), that was merged in the eighteenth batch,
the above is probably untrue.  I'll apply to the current master and
see how it goes instead.

> Thanks to Jonathan, Peff, and SZEDER who have helped a great deal in
> assembling these patches. As usual, a range-diff is included below.
> Thanks in advance for your review!

Thanks.

Taylor Blau Oct. 20, 2023, 5:27 p.m. UTC | #2
On Wed, Oct 18, 2023 at 04:26:48PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > (Rebased onto the tip of 'master', which is 3a06386e31 (The fifteenth
> > batch, 2023-10-04), at the time of writing).
>
> Judging from 17/17 that has a free_commit_graph() call in
> close_commit_graph(), that was merged in the eighteenth batch,
> the above is probably untrue.  I'll apply to the current master and
> see how it goes instead.

Worse than that, I sent this `--in-reply-to` the wrong thread :-<.

Sorry about that, and indeed you are right that the correct base for
this round should be a9ecda2788 (The eighteenth batch, 2023-10-13).

I'm optimistic that, with the amount of careful review that this topic
has already received, this round should do the trick. But if there
are more comments and we end up re-rolling it, I'll break this thread
and split out the v5 into its own thread to avoid further confusion.

> > Thanks to Jonathan, Peff, and SZEDER who have helped a great deal in
> > assembling these patches. As usual, a range-diff is included below.
> > Thanks in advance for your review!
>
> Thanks.

Thank you, and sorry for the mistake on my end.

Thanks,
Taylor

SZEDER Gábor Oct. 23, 2023, 8:22 p.m. UTC | #3
On Fri, Oct 20, 2023 at 01:27:00PM -0400, Taylor Blau wrote:
> On Wed, Oct 18, 2023 at 04:26:48PM -0700, Junio C Hamano wrote:
> > Taylor Blau <me@ttaylorr.com> writes:
> >
> > > (Rebased onto the tip of 'master', which is 3a06386e31 (The fifteenth
> > > batch, 2023-10-04), at the time of writing).
> >
> > Judging from 17/17 that has a free_commit_graph() call in
> > close_commit_graph(), that was merged in the eighteenth batch,
> > the above is probably untrue.  I'll apply to the current master and
> > see how it goes instead.
> 
> Worse than that, I sent this `--in-reply-to` the wrong thread :-<.
> 
> Sorry about that, and indeed you are right that the correct base for
> this round should be a9ecda2788 (The eighteenth batch, 2023-10-13).
> 
> I'm optimistic that, with the amount of careful review that this topic
> has already received, this round should do the trick.

Unfortunately, I can't share this optimism.  This series still lacks
tests exercising the interaction of different versions of Bloom
filters and split commit graphs, and the one such test that I sent a
while ago demonstrates that it's still broken.  And it's getting
worse: back then I didn't send the related test that merged
commit-graph layers containing different Bloom filter versions,
because it happened to succeed even back then; but, alas, with this
series even that test fails.

Taylor Blau Oct. 30, 2023, 8:24 p.m. UTC | #4
On Mon, Oct 23, 2023 at 10:22:12PM +0200, SZEDER Gábor wrote:
> On Fri, Oct 20, 2023 at 01:27:00PM -0400, Taylor Blau wrote:
> > I'm optimistic that, with the amount of careful review that this topic
> > has already received, this round should do the trick.
>
> Unfortunately, I can't share this optimism.  This series still lacks
> tests exercising the interaction of different versions of Bloom
> filters and split commit graphs, and the one such test that I sent a
> while ago demonstrates that it's still broken.  And it's getting
> worse: back then I didn't send the related test that merged
> commit-graph layers containing different Bloom filter versions,
> because it happened to succeed even back then; but, alas, with this
> series even that test fails.

I am very confused here: the tests that you're referring to have been
added to (and pass in) this series. What am I missing here?

Thanks,
Taylor