mbox series

[v7,00/16] bloom: changed-path Bloom filters v2 (& sundries)

Message ID cover.1719333276.git.me@ttaylorr.com (mailing list archive)
Headers show
Series bloom: changed-path Bloom filters v2 (& sundries) | expand

Message

Taylor Blau June 25, 2024, 5:39 p.m. UTC
(Rebased onto the tip of 'master', which is now 1e1586e4ed (The
sixteenth batch, 2024-06-24) at the time of writing).

This series is another minor reroll of the combined efforts of [1] and
[2] to introduce the v2 changed-path Bloom filters, which fixes a bug in
our existing implementation of murmur3 paths with non-ASCII characters
(when the "char" type is signed).

This version addresses the remaining comments from SZEDER around more
thorough testing of merging commit-graph layers with incompatible Bloom
filters versions, and ensuring the result is as expected.

Thanks again to Jonathan, Peff, and SZEDER who have helped a great deal
in assembling these patches. As usual, a range-diff is included below.

Thanks in advance for your review!

[1]: https://lore.kernel.org/git/cover.1684790529.git.jonathantanmy@google.com/
[2]: https://lore.kernel.org/git/cover.1691426160.git.me@ttaylorr.com/

Jonathan Tan (1):
  gitformat-commit-graph: describe version 2 of BDAT

Taylor Blau (15):
  t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
  revision.c: consult Bloom filters for root commits
  commit-graph: ensure Bloom filters are read with consistent settings
  t/helper/test-read-graph.c: extract `dump_graph_info()`
  bloom.h: make `load_bloom_filter_from_graph()` public
  t/helper/test-read-graph: implement `bloom-filters` mode
  t4216: test changed path filters with high bit paths
  repo-settings: introduce commitgraph.changedPathsVersion
  bloom: annotate filters with hash version
  bloom: prepare to discard incompatible Bloom filters
  commit-graph: unconditionally load Bloom filters
  commit-graph: new Bloom filter version that fixes murmur3
  object.h: fix mis-aligned flag bits table
  commit-graph: reuse existing Bloom filters where possible
  bloom: introduce `deinit_bloom_filters()`

 Documentation/config/commitgraph.txt     |  29 +-
 Documentation/gitformat-commit-graph.txt |   9 +-
 bloom.c                                  | 208 ++++++++++++++-
 bloom.h                                  |  38 ++-
 commit-graph.c                           |  64 ++++-
 object.h                                 |   3 +-
 oss-fuzz/fuzz-commit-graph.c             |   2 +-
 repo-settings.c                          |   6 +-
 repository.h                             |   2 +-
 revision.c                               |  26 +-
 t/helper/test-bloom.c                    |   9 +-
 t/helper/test-read-graph.c               |  67 ++++-
 t/t0095-bloom.sh                         |   8 +
 t/t4216-log-bloom.sh                     | 325 ++++++++++++++++++++++-
 14 files changed, 738 insertions(+), 58 deletions(-)

Range-diff against v6:
 1:  9df34a2f4f =  1:  ee651fee33 t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
 2:  a6dc377f1b =  2:  5d88ad6c90 revision.c: consult Bloom filters for root commits
 3:  a77ab941bc !  3:  f6cf5bfc4e commit-graph: ensure Bloom filters are read with consistent settings
    @@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation backfills empty comm
     +test_expect_success 'merge graph layers with incompatible Bloom settings' '
     +	# Ensure that incompatible Bloom filters are ignored when
     +	# merging existing layers.
    -+	git -C $repo commit-graph write --reachable --changed-paths 2>err &&
    ++	>trace2.txt &&
    ++	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
    ++		git -C $repo commit-graph write --reachable --changed-paths 2>err &&
     +	grep "disabling Bloom filters for commit-graph layer .$layer." err &&
    ++	grep "{\"hash_version\":1,\"num_hashes\":7,\"bits_per_entry\":10,\"max_changed_paths\":512" trace2.txt &&
     +
     +	test_path_is_file $repo/$graph &&
     +	test_dir_is_empty $repo/$graphdir &&
 4:  56a9fdaff0 =  4:  0041600f31 gitformat-commit-graph: describe version 2 of BDAT
 5:  7484a82f7f =  5:  6e7f317551 t/helper/test-read-graph.c: extract `dump_graph_info()`
 6:  48343f93a2 =  6:  ae74fbad3e bloom.h: make `load_bloom_filter_from_graph()` public
 7:  286fd7dcdb =  7:  0dfd1a361e t/helper/test-read-graph: implement `bloom-filters` mode
 8:  7de7b89da0 =  8:  fbcaa686b1 t4216: test changed path filters with high bit paths
 9:  b13c9b8ff9 !  9:  60c063ca4a repo-settings: introduce commitgraph.changedPathsVersion
    @@ commit-graph.c: static void validate_mixed_bloom_settings(struct commit_graph *g
     
      ## oss-fuzz/fuzz-commit-graph.c ##
     @@ oss-fuzz/fuzz-commit-graph.c: int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
    - 	 * possible.
      	 */
    + 	repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
      	the_repository->settings.commit_graph_generation_version = 2;
     -	the_repository->settings.commit_graph_read_changed_paths = 1;
     +	the_repository->settings.commit_graph_changed_paths_version = 1;
10:  09c44c51a5 = 10:  ce3a15a517 bloom: annotate filters with hash version
11:  d4995ef600 = 11:  99ab9cf448 bloom: prepare to discard incompatible Bloom filters
12:  c8e9bb7c88 = 12:  99e66d1dba commit-graph: unconditionally load Bloom filters
13:  d2f11c082d ! 13:  2e945c3d2e commit-graph: new Bloom filter version that fixes murmur3
    @@ commit-graph.c: int write_commit_graph(struct object_directory *odb,
     +	if (r->settings.commit_graph_changed_paths_version < -1
     +	    || r->settings.commit_graph_changed_paths_version > 2) {
     +		warning(_("attempting to write a commit-graph, but "
    -+			  "'commitgraph.changedPathsVersion' (%d) is not supported"),
    ++			  "'commitGraph.changedPathsVersion' (%d) is not supported"),
     +			r->settings.commit_graph_changed_paths_version);
     +		return 0;
     +	}
    @@ t/t4216-log-bloom.sh: test_expect_success 'merge graph layers with incompatible
      	test_must_be_empty err
      '
      
    ++# chosen to be the same under all Unicode normalization forms
    ++CENT=$(printf "\302\242")
    ++
     +test_expect_success 'ensure Bloom filter with incompatible versions are ignored' '
     +	rm "$repo/$graph" &&
     +
    @@ t/t4216-log-bloom.sh: test_expect_success 'merge graph layers with incompatible
     +	cat >expect.err <<-EOF &&
     +	warning: disabling Bloom filters for commit-graph layer $SQ$layer$SQ due to incompatible settings
     +	EOF
    -+	test_cmp expect.err err
    ++	test_cmp expect.err err &&
    ++
    ++	# Merge the two layers with incompatible bloom filter versions,
    ++	# ensuring that the v2 filters are used.
    ++	>trace2.txt &&
    ++	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
    ++		git -C $repo -c commitGraph.changedPathsVersion=2 commit-graph write --reachable --changed-paths 2>err &&
    ++	grep "disabling Bloom filters for commit-graph layer .$layer." err &&
    ++	grep "{\"hash_version\":2,\"num_hashes\":7,\"bits_per_entry\":10,\"max_changed_paths\":512" trace2.txt
     +'
     +
      get_first_changed_path_filter () {
      	test-tool read-graph bloom-filters >filters.dat &&
      	head -n 1 filters.dat
    + }
    + 
    +-# chosen to be the same under all Unicode normalization forms
    +-CENT=$(printf "\302\242")
    +-
    + test_expect_success 'set up repo with high bit path, version 1 changed-path' '
    + 	git init highbit1 &&
    + 	test_commit -C highbit1 c1 "$CENT" &&
     @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when version 1 requested' '
      	)
      '
    @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
     +test_expect_success 'version 1 changed-path not used when version 2 requested' '
     +	(
     +		cd highbit1 &&
    -+		git config --add commitgraph.changedPathsVersion 2 &&
    ++		git config --add commitGraph.changedPathsVersion 2 &&
     +		test_bloom_filters_not_used "-- another$CENT"
     +	)
     +'
    @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
     +test_expect_success 'version 1 changed-path used when autodetect requested' '
     +	(
     +		cd highbit1 &&
    -+		git config --add commitgraph.changedPathsVersion -1 &&
    ++		git config --add commitGraph.changedPathsVersion -1 &&
     +		test_bloom_filters_used "-- another$CENT"
     +	)
     +'
    @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
     +	git -C highbit1 commit-graph write --reachable --changed-paths &&
     +	(
     +		cd highbit1 &&
    -+		git config --add commitgraph.changedPathsVersion -1 &&
    ++		git config --add commitGraph.changedPathsVersion -1 &&
     +		echo "options: bloom(1,10,7) read_generation_data" >expect &&
     +		test-tool read-graph >full &&
     +		grep options full >actual &&
    @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
     +
     +test_expect_success 'set up repo with high bit path, version 2 changed-path' '
     +	git init highbit2 &&
    -+	git -C highbit2 config --add commitgraph.changedPathsVersion 2 &&
    ++	git -C highbit2 config --add commitGraph.changedPathsVersion 2 &&
     +	test_commit -C highbit2 c2 "$CENT" &&
     +	git -C highbit2 commit-graph write --reachable --changed-paths
     +'
    @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
     +test_expect_success 'version 2 changed-path not used when version 1 requested' '
     +	(
     +		cd highbit2 &&
    -+		git config --add commitgraph.changedPathsVersion 1 &&
    ++		git config --add commitGraph.changedPathsVersion 1 &&
     +		test_bloom_filters_not_used "-- another$CENT"
     +	)
     +'
    @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
     +test_expect_success 'version 2 changed-path used when autodetect requested' '
     +	(
     +		cd highbit2 &&
    -+		git config --add commitgraph.changedPathsVersion -1 &&
    ++		git config --add commitGraph.changedPathsVersion -1 &&
     +		test_bloom_filters_used "-- another$CENT"
     +	)
     +'
    @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
     +	git -C highbit2 commit-graph write --reachable --changed-paths &&
     +	(
     +		cd highbit2 &&
    -+		git config --add commitgraph.changedPathsVersion -1 &&
    ++		git config --add commitGraph.changedPathsVersion -1 &&
     +		echo "options: bloom(2,10,7) read_generation_data" >expect &&
     +		test-tool read-graph >full &&
     +		grep options full >actual &&
    @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
     +test_expect_success 'when writing commit graph, do not reuse changed-path of another version' '
     +	git init doublewrite &&
     +	test_commit -C doublewrite c "$CENT" &&
    -+	git -C doublewrite config --add commitgraph.changedPathsVersion 1 &&
    ++	git -C doublewrite config --add commitGraph.changedPathsVersion 1 &&
     +	git -C doublewrite commit-graph write --reachable --changed-paths &&
     +	for v in -2 3
     +	do
    -+		git -C doublewrite config --add commitgraph.changedPathsVersion $v &&
    ++		git -C doublewrite config --add commitGraph.changedPathsVersion $v &&
     +		git -C doublewrite commit-graph write --reachable --changed-paths 2>err &&
     +		cat >expect <<-EOF &&
    -+		warning: attempting to write a commit-graph, but ${SQ}commitgraph.changedPathsVersion${SQ} ($v) is not supported
    ++		warning: attempting to write a commit-graph, but ${SQ}commitGraph.changedPathsVersion${SQ} ($v) is not supported
     +		EOF
     +		test_cmp expect err || return 1
     +	done &&
    -+	git -C doublewrite config --add commitgraph.changedPathsVersion 2 &&
    ++	git -C doublewrite config --add commitGraph.changedPathsVersion 2 &&
     +	git -C doublewrite commit-graph write --reachable --changed-paths &&
     +	(
     +		cd doublewrite &&
14:  9f54a376fb = 14:  242f023135 object.h: fix mis-aligned flag bits table
15:  67991dea7c ! 15:  1b80023e57 commit-graph: reuse existing Bloom filters where possible
    @@ bloom.c: static void init_truncated_large_filter(struct bloom_filter *filter,
     +		struct tree_desc desc;
     +		struct name_entry entry;
     +
    -+		init_tree_desc(&desc, t->buffer, t->size);
    ++		init_tree_desc(&desc, &t->object.oid, t->buffer, t->size);
     +		while (tree_entry(&desc, &entry)) {
     +			size_t i;
     +			for (i = 0; i < entry.pathlen; i++) {
    @@ t/t4216-log-bloom.sh: test_expect_success 'when writing another commit graph, pr
      	git init doublewrite &&
      	test_commit -C doublewrite c "$CENT" &&
     +
    - 	git -C doublewrite config --add commitgraph.changedPathsVersion 1 &&
    + 	git -C doublewrite config --add commitGraph.changedPathsVersion 1 &&
    ++	>trace2.txt &&
     +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
     +		git -C doublewrite commit-graph write --reachable --changed-paths &&
     +	test_filter_computed 1 trace2.txt &&
    @@ t/t4216-log-bloom.sh: test_expect_success 'when writing commit graph, do not reu
      		test_cmp expect err || return 1
      	done &&
     +
    - 	git -C doublewrite config --add commitgraph.changedPathsVersion 2 &&
    + 	git -C doublewrite config --add commitGraph.changedPathsVersion 2 &&
     -	git -C doublewrite commit-graph write --reachable --changed-paths &&
    ++	>trace2.txt &&
     +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
     +		git -C doublewrite commit-graph write --reachable --changed-paths &&
     +	test_filter_computed 1 trace2.txt &&
    @@ t/t4216-log-bloom.sh: test_expect_success 'when writing commit graph, do not reu
     +
     +	test_commit -C upgrade base no-high-bits &&
     +
    -+	git -C upgrade config --add commitgraph.changedPathsVersion 1 &&
    ++	git -C upgrade config --add commitGraph.changedPathsVersion 1 &&
    ++	>trace2.txt &&
     +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
     +		git -C upgrade commit-graph write --reachable --changed-paths &&
     +	test_filter_computed 1 trace2.txt &&
     +	test_filter_upgraded 0 trace2.txt &&
     +
    -+	git -C upgrade config --add commitgraph.changedPathsVersion 2 &&
    ++	git -C upgrade config --add commitGraph.changedPathsVersion 2 &&
    ++	>trace2.txt &&
     +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
     +		git -C upgrade commit-graph write --reachable --changed-paths &&
     +	test_filter_computed 0 trace2.txt &&
16:  12058a074d = 16:  db9991f339 bloom: introduce `deinit_bloom_filters()`

base-commit: 1e1586e4ed626bde864339c10570bc0e73f0ab97