mbox series

[v2,0/3] Changed path filter hash fix and version bump

Message ID cover.1685574402.git.jonathantanmy@google.com (mailing list archive)
Headers show
Series Changed path filter hash fix and version bump | expand

Message

Jonathan Tan May 31, 2023, 11:12 p.m. UTC
Here's a new version. With this, Git can function with both version
1 (incorrect murmur3) and version 2 (correct murmur3) changed path
filters, but not at the same time: the user can set a config variable to
choose which one, and Git will ignore existing changed path filters of
the wrong version (and always write the version that the config variable
says).

In patch 1, the test assumes that char is signed. I'm not sure if it's
worth asserting on the contents of the filter, since it depends on
whether char is signed, but I've included it anyway (since it's easy
to remove).

Jonathan Tan (3):
  t4216: test changed path filters with high bit paths
  repo-settings: introduce commitgraph.changedPathsVersion
  commit-graph: new filter ver. that fixes murmur3

 Documentation/config/commitgraph.txt | 16 +++++--
 bloom.c                              | 65 ++++++++++++++++++++++++++--
 bloom.h                              |  8 ++--
 commit-graph.c                       | 29 ++++++++++---
 oss-fuzz/fuzz-commit-graph.c         |  2 +-
 repo-settings.c                      |  6 ++-
 repository.h                         |  2 +-
 t/helper/test-bloom.c                |  9 +++-
 t/t0095-bloom.sh                     |  8 ++++
 t/t4216-log-bloom.sh                 | 65 ++++++++++++++++++++++++++++
 10 files changed, 192 insertions(+), 18 deletions(-)

Range-diff against v1:
1:  3a5d53d3c0 < -:  ---------- t4216: test wrong bloom filter version rejection
2:  5a91f9682b < -:  ---------- commit-graph: fix murmur3, bump filter ver. to 2
-:  ---------- > 1:  175dc912fe t4216: test changed path filters with high bit paths
-:  ---------- > 2:  4a7553f3fb repo-settings: introduce commitgraph.changedPathsVersion
-:  ---------- > 3:  f5c3f6080a commit-graph: new filter ver. that fixes murmur3

Comments

Junio C Hamano June 3, 2023, 1:01 a.m. UTC | #1
Jonathan Tan <jonathantanmy@google.com> writes:

> Here's a new version. With this, Git can function with both version
> 1 (incorrect murmur3) and version 2 (correct murmur3) changed path
> filters, but not at the same time: the user can set a config variable to
> choose which one, and Git will ignore existing changed path filters of
> the wrong version (and always write the version that the config variable
> says).

Hmph.  On a system with unsigned char, we should be able to keep
using version 1 without losing correctness, I suspect, but probably
they are in the minority we do not have to care about?  I can see
the desire to simplify the migration plan (i.e. essentially have no
migration---this will give us just a flag day per repository), but
I'll let others to comment.

> In patch 1, the test assumes that char is signed. I'm not sure if it's
> worth asserting on the contents of the filter, since it depends on
> whether char is signed, but I've included it anyway (since it's easy
> to remove).

So, on a system with unsigned char, would these tests fail?  Do we
need a prereq to skip them?

Thanks.
Junio C Hamano June 3, 2023, 2:24 a.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> Jonathan Tan <jonathantanmy@google.com> writes:
>
>> Here's a new version. With this, Git can function with both version
>> 1 (incorrect murmur3) and version 2 (correct murmur3) changed path
>> filters, but not at the same time: the user can set a config variable to
>> choose which one, and Git will ignore existing changed path filters of
>> the wrong version (and always write the version that the config variable
>> says).

Seems to break t4216 when merged to 'seen' to replace the previous
round.  Could you take a look?
Jonathan Tan June 7, 2023, 4:30 p.m. UTC | #3
Junio C Hamano <gitster@pobox.com> writes:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Jonathan Tan <jonathantanmy@google.com> writes:
> >
> >> Here's a new version. With this, Git can function with both version
> >> 1 (incorrect murmur3) and version 2 (correct murmur3) changed path
> >> filters, but not at the same time: the user can set a config variable to
> >> choose which one, and Git will ignore existing changed path filters of
> >> the wrong version (and always write the version that the config variable
> >> says).
> 
> Seems to break t4216 when merged to 'seen' to replace the previous
> round.  Could you take a look?

OK - I see that it fails CI when I upload the merge to GitHub (although
it passes locally). I'll take a look.
Jonathan Tan June 7, 2023, 9:37 p.m. UTC | #4
Jonathan Tan <jonathantanmy@google.com> writes:
> Junio C Hamano <gitster@pobox.com> writes:
> > Seems to break t4216 when merged to 'seen' to replace the previous
> > round.  Could you take a look?
> 
> OK - I see that it fails CI when I upload the merge to GitHub (although
> it passes locally). I'll take a look.

Hmm...so when the test writes a file with a high bit filename, the
filename written is different. Here's the output from my machine
(Linux):

        ++ git -C highbit1/ commit -m c1                                                                                                                          
        [main (root-commit) 808e481] c1                                                                                                                           
         Author: A U Thor <author@example.com>                                                                                                                    
         1 file changed, 1 insertion(+)                                                                                                                           
         create mode 100644 "\302\242"                                                                                                                            
        ++ case "$tag" in                                                                                                                                         
        ++ git -C highbit1/ tag c1                                                                                                                                
        ++ touch $'\302\242\302\242'                                                                                                                              
        ++ ls                                                                                                                                                     
         A        expect   file5_renamed   limits        log_wo_bloom   trace2-auto.txt  ''$'\302\242\302\242'                                                    
         actual   file4    highbit1        log_w_bloom   trace.perf     trace2.txt                                                                                
        ++ git -C highbit1 commit-graph write --reachable --changed-paths                                                                                         
        bloom calculating -62,-94 version 1                                                                                                                       
        ok 141 - set up repo with high bit path, version 1 changed-path       

and here's the output from the CI on GitHub on linux-gcc-default:

        [main (root-commit) 3f37a43] c1
         Author: A U Thor <author@example.com>
         1 file changed, 1 insertion(+)
         create mode 100644 "\\xc2\\xa2"
        + git -C highbit1/ tag c1
        + touch \xc2\xa2\xc2\xa2
        + ls
        A
        \xc2\xa2\xc2\xa2
        actual
        expect
        file4
        file5_renamed
        highbit1
        limits
        log_w_bloom
        log_wo_bloom
        trace.perf
        trace2-auto.txt
        trace2.txt
        + git -C highbit1 commit-graph write --reachable --changed-paths
        bloom calculating 92,120 version 1

        ok 141 - set up repo with high bit path, version 1 changed-path

These were run with some extra code, here for completeness:

        diff --git a/bloom.c b/bloom.c
        index dea883d8d6..ccc3e0ce80 100644
        --- a/bloom.c
        +++ b/bloom.c
        @@ -192,6 +192,8 @@ void fill_bloom_key(const char *data,
                        ? murmur3_seeded_v2 : murmur3_seeded_v1)(seed0, data, len);
                const uint32_t hash1 = (settings->hash_version == 2
                        ? murmur3_seeded_v2 : murmur3_seeded_v1)(seed1, data, len);
        +       if (len > 0)
        +               fprintf(stderr, "bloom calculating %d,%d version %d\n", data[0], data[1], settings->hash_version);
 
                key->hashes = (uint32_t *)xcalloc(settings->num_hashes, sizeof(uint32_t));
                for (i = 0; i < settings->num_hashes; i++)
        diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
        index 764c6dee0f..1121e7d4cd 100755
        --- a/t/t4216-log-bloom.sh
        +++ b/t/t4216-log-bloom.sh
        @@ -423,6 +423,8 @@ CENT=$(printf "\xc2\xa2")
         test_expect_success 'set up repo with high bit path, version 1 changed-path' '
                git init highbit1 &&
                test_commit -C highbit1 c1 "$CENT" &&
        +       touch $CENT$CENT &&
        +       ls &&
                git -C highbit1 commit-graph write --reachable --changed-paths
         '

Notice the different filename after "create mode", and the different
"ls" output. I'll investigate some more, but if anyone offhand knows
what's going on (and/or knows how to write a high-bit filename portably,
even across  Linux), please let me know.

An alternative is to drop the tests from these patches - so, leave them
in during the review period and reviewers would see that the tests pass
the CI jobs for Windows and Mac OS X pass, and then before we merge it,
delete the tests from the patches. This also solves needing to prevent
unsigned char platforms from running the version 1 tests - there's no
prereq for that yet and we would have to investigate making one.