mbox series

[00/15] bloom: changed-path Bloom filters v2

Message ID cover.1692654233.git.me@ttaylorr.com (mailing list archive)
Headers show
Series bloom: changed-path Bloom filters v2 | expand

Message

Taylor Blau Aug. 21, 2023, 9:43 p.m. UTC
This series combines the efforts in [1] and [2] to culminate in v2 of
the changed-path Bloom filter format (which uses a bona-fide murmur3
implementation) as well as an efficient upgrade path for repositories
with few non-ASCII paths.

The first seven patches are from [1], and are unchanged from that
version. Most of the remaining patches are from [2], and they have been
modified based on the review therein.

The final patch is new, and avoids leaking memory when the Bloom
subsystem is initialized both in the read- and read/write-cases.

Thanks to Jonathan and Peff who have both helped a great deal in putting
these patches together. And, as always, thanks in advance for your
review!

[1]: https://lore.kernel.org/git/cover.1684790529.git.jonathantanmy@google.com/
[2]: https://lore.kernel.org/git/cover.1691426160.git.me@ttaylorr.com/

Jonathan Tan (4):
  gitformat-commit-graph: describe version 2 of BDAT
  t4216: test changed path filters with high bit paths
  repo-settings: introduce commitgraph.changedPathsVersion
  commit-graph: new filter ver. that fixes murmur3

Taylor Blau (11):
  t/helper/test-read-graph.c: extract `dump_graph_info()`
  bloom.h: make `load_bloom_filter_from_graph()` public
  t/helper/test-read-graph: implement `bloom-filters` mode
  bloom: annotate filters with hash version
  bloom: prepare to discard incompatible Bloom filters
  t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
  commit-graph.c: unconditionally load Bloom filters
  commit-graph: drop unnecessary `graph_read_bloom_data_context`
  object.h: fix mis-aligned flag bits table
  commit-graph: reuse existing Bloom filters where possible
  bloom: introduce `deinit_bloom_filters()`

 Documentation/config/commitgraph.txt     |  26 ++-
 Documentation/gitformat-commit-graph.txt |   9 +-
 bloom.c                                  | 208 +++++++++++++++++++++--
 bloom.h                                  |  38 ++++-
 commit-graph.c                           |  36 +++-
 object.h                                 |   3 +-
 oss-fuzz/fuzz-commit-graph.c             |   2 +-
 repo-settings.c                          |   6 +-
 repository.h                             |   2 +-
 t/helper/test-bloom.c                    |   9 +-
 t/helper/test-read-graph.c               |  67 ++++++--
 t/t0095-bloom.sh                         |   8 +
 t/t4216-log-bloom.sh                     | 184 +++++++++++++++++++-
 13 files changed, 548 insertions(+), 50 deletions(-)

Comments

Jonathan Tan Aug. 24, 2023, 10:22 p.m. UTC | #1
Taylor Blau <me@ttaylorr.com> writes:
> Jonathan Tan (4):
>   gitformat-commit-graph: describe version 2 of BDAT
>   t4216: test changed path filters with high bit paths
>   repo-settings: introduce commitgraph.changedPathsVersion
>   commit-graph: new filter ver. that fixes murmur3
> 
> Taylor Blau (11):
>   t/helper/test-read-graph.c: extract `dump_graph_info()`
>   bloom.h: make `load_bloom_filter_from_graph()` public
>   t/helper/test-read-graph: implement `bloom-filters` mode
>   bloom: annotate filters with hash version
>   bloom: prepare to discard incompatible Bloom filters
>   t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
>   commit-graph.c: unconditionally load Bloom filters
>   commit-graph: drop unnecessary `graph_read_bloom_data_context`
>   object.h: fix mis-aligned flag bits table
>   commit-graph: reuse existing Bloom filters where possible
>   bloom: introduce `deinit_bloom_filters()`

Thanks. I had one small comment (sent as an email reply to one of the
patches), but everything else looks good.
Jonathan Tan Aug. 25, 2023, 5:06 p.m. UTC | #2
Jonathan Tan <jonathantanmy@google.com> writes:
> Taylor Blau <me@ttaylorr.com> writes:
> > Jonathan Tan (4):
> >   gitformat-commit-graph: describe version 2 of BDAT
> >   t4216: test changed path filters with high bit paths
> >   repo-settings: introduce commitgraph.changedPathsVersion
> >   commit-graph: new filter ver. that fixes murmur3
> > 
> > Taylor Blau (11):
> >   t/helper/test-read-graph.c: extract `dump_graph_info()`
> >   bloom.h: make `load_bloom_filter_from_graph()` public
> >   t/helper/test-read-graph: implement `bloom-filters` mode
> >   bloom: annotate filters with hash version
> >   bloom: prepare to discard incompatible Bloom filters
> >   t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
> >   commit-graph.c: unconditionally load Bloom filters
> >   commit-graph: drop unnecessary `graph_read_bloom_data_context`
> >   object.h: fix mis-aligned flag bits table
> >   commit-graph: reuse existing Bloom filters where possible
> >   bloom: introduce `deinit_bloom_filters()`
> 
> Thanks. I had one small comment (sent as an email reply to one of the
> patches), but everything else looks good.

I mistakenly sent my reply to an earlier version [1]. (Taylor has seen
it, so this note is more for future readers who might be curious about
what that email reply contains.)

[1] https://lore.kernel.org/git/20230824222051.2320003-1-jonathantanmy@google.com/
Jonathan Tan Aug. 29, 2023, 10:18 p.m. UTC | #3
Jonathan Tan <jonathantanmy@google.com> writes:
> I mistakenly sent my reply to an earlier version [1]. (Taylor has seen
> it, so this note is more for future readers who might be curious about
> what that email reply contains.)
> 
> [1] https://lore.kernel.org/git/20230824222051.2320003-1-jonathantanmy@google.com/

I think that this patch set is good for merging. There is a discussion
between Taylor and me at the link above in which Taylor thinks something
is OK and I think that that thing is OK except in a very specific
situation (there exists a commit-graph chain in which the different
layers have different Bloom filter versions). (Hopefully that's a good
summary of the discussion.) I think that the situation is narrow enough
and have seen that Taylor (whose capabilities I have a high opinion of)
has looked into it. I think the only way for me to be fully convinced
is to write a test that includes these different layers and see how the
code functions.

So I think that we can merge this patch set. An alternative would be
to wait for me (or someone else) to write such a test, but if it's just
me that's worried about this situation, it's probably not worth waiting
just for this.

The other outstanding thing is that Szeder Gabor pointed out that Bloom
filters are not applied to root commits so some of the tests don't
test what you would expect [2]. I've updated the tests and pushed the
results to GitHub [3]. I'm OK with using that version, or the current
version (in which case I'll resend the updated tests once this version
is merged).

[2] https://lore.kernel.org/git/20230826150610.GA1928@szeder.dev/
[3] https://github.com/jonathantanmy/git/tree/changedpath
Junio C Hamano Aug. 29, 2023, 11:16 p.m. UTC | #4
Jonathan Tan <jonathantanmy@google.com> writes:

> So I think that we can merge this patch set. An alternative would be
> to wait for me (or someone else) to write such a test, but if it's just
> me that's worried about this situation, it's probably not worth waiting
> just for this.
>
> The other outstanding thing is that Szeder Gabor pointed out that Bloom
> filters are not applied to root commits so some of the tests don't
> test what you would expect [2]. I've updated the tests and pushed the
> results to GitHub [3]. I'm OK with using that version, or the current
> version (in which case I'll resend the updated tests once this version
> is merged).
>
> [2] https://lore.kernel.org/git/20230826150610.GA1928@szeder.dev/
> [3] https://github.com/jonathantanmy/git/tree/changedpath

Thanks for a concise and well thought out summary.

If we know that some tests in the current round is not doing the
right thing, and we already have an updated set of tests to fix
them, I doubt that the topic is so urgent that merging a known to be
incomplete version is preferrable than seeing the hopefully final
version on the list that everybody can agree with.

Thanks.