mbox series

[0/2] Changed path filter hash fix and version bump

Message ID cover.1684790529.git.jonathantanmy@google.com (mailing list archive)
Headers show
Series Changed path filter hash fix and version bump | expand

Message

Jonathan Tan May 22, 2023, 9:48 p.m. UTC
Following the conversation in [1], here are patches to fix the murmur3
hash function used in creating (and interpreting) changed path filters,
and also to bump the version number to 2.

This is I think the simplest way to do this (invalidating all existing
changed path filters). The resource-consuming part of creating a changed
path filter is in computing the changed paths (thus, reading trees and
calculating changes), and to check if a changed path filter could be
reused, one would need to compute the changed paths anyway in order to
determine if any of them have high-bit strings, so I did not pursue this
further. Server operators might be able to reuse changed path filters
if, for example, they have a more efficient way to determine that no
paths in a repo have the high bit set, but I think that this is out of
scope for the Git project.

In patch 2, I couldn't figure out how to make Bash pass high-bit strings
as a CLI argument for some reason, so I hardcoded the string I wanted
in the test helper instead. If anyone knows how to pass such strings,
please let me know.

[1] https://lore.kernel.org/git/20230511224101.972442-1-jonathantanmy@google.com/

Jonathan Tan (2):
  t4216: test wrong bloom filter version rejection
  commit-graph: fix murmur3, bump filter ver. to 2

 bloom.c               | 14 +++++++-------
 bloom.h               |  9 ++++++---
 commit-graph.c        |  4 ++--
 t/helper/test-bloom.c |  7 +++++++
 t/t0095-bloom.sh      |  8 ++++++++
 t/t4216-log-bloom.sh  | 36 +++++++++++++++++++++++++++++++++---
 6 files changed, 63 insertions(+), 15 deletions(-)

Comments

Junio C Hamano May 23, 2023, 4:42 a.m. UTC | #1
Jonathan Tan <jonathantanmy@google.com> writes:

> Following the conversation in [1], here are patches to fix the murmur3
> hash function used in creating (and interpreting) changed path filters,
> and also to bump the version number to 2.

Wonderful.  Thanks for a quick update.  Will take a look when I come
back to the keyboard (I'm on half-vacation right now).

>
> This is I think the simplest way to do this (invalidating all existing
> changed path filters). The resource-consuming part of creating a changed
> path filter is in computing the changed paths (thus, reading trees and
> calculating changes), and to check if a changed path filter could be
> reused, one would need to compute the changed paths anyway in order to
> determine if any of them have high-bit strings, so I did not pursue this
> further. Server operators might be able to reuse changed path filters
> if, for example, they have a more efficient way to determine that no
> paths in a repo have the high bit set, but I think that this is out of
> scope for the Git project.
>
> In patch 2, I couldn't figure out how to make Bash pass high-bit strings
> as a CLI argument for some reason, so I hardcoded the string I wanted
> in the test helper instead. If anyone knows how to pass such strings,
> please let me know.
>
> [1] https://lore.kernel.org/git/20230511224101.972442-1-jonathantanmy@google.com/
>
> Jonathan Tan (2):
>   t4216: test wrong bloom filter version rejection
>   commit-graph: fix murmur3, bump filter ver. to 2
>
>  bloom.c               | 14 +++++++-------
>  bloom.h               |  9 ++++++---
>  commit-graph.c        |  4 ++--
>  t/helper/test-bloom.c |  7 +++++++
>  t/t0095-bloom.sh      |  8 ++++++++
>  t/t4216-log-bloom.sh  | 36 +++++++++++++++++++++++++++++++++---
>  6 files changed, 63 insertions(+), 15 deletions(-)