t0021: make sure clean filter runs

Message ID	20190820065625.128130-1-t.gummerer@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=P9dE=WQ=vger.kernel.org=git-owner@kernel.org> From: Thomas Gummerer <t.gummerer@gmail.com> To: git@vger.kernel.org Cc: rsbecker@nexbridge.com, johannes.schindelin@gmx.de, larsxschneider@gmail.com, szeder.dev@gmail.com, Thomas Gummerer <t.gummerer@gmail.com> Subject: [PATCH] t0021: make sure clean filter runs Date: Tue, 20 Aug 2019 07:56:25 +0100 Message-Id: <20190820065625.128130-1-t.gummerer@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: git-owner@vger.kernel.org Precedence: bulk
Series	t0021: make sure clean filter runs \| expand t0021: make sure clean filter runs

Thomas Gummerer Aug. 20, 2019, 6:56 a.m. UTC

In t0021.15 one of the things we are checking is that the clean filter
is run when checking out empty-branch.  The clean filter needs to be
run to make sure there are no modifications on the file system for the
test.r file, and thus it isn't dangerous to overwrite it.

However in the current test setup it is not always necessary to run
the clean filter, and thus the test sometimes fails, as debug.log
isn't written.

This happens when test.r has an older mtime than the index itself.
That mtime is also recorded as stat data for test.r in the index, and
based on the heuristic we're using for index entries, git correctly
assumes this file is up-to-date.

Usually this test succeeds because the mtime of test.r is the same as
the mtime of the index.  In this case test.r is racily clean, so git
actually checks the contents, for which the clean filter is run.

Fix the test by updating the mtime of test.r, so git is forced to
check the contents of the file, and the clean filter is run as the
test expects.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 t/t0021-conversion.sh | 1 +
 1 file changed, 1 insertion(+)

Junio C Hamano Aug. 20, 2019, 6:01 p.m. UTC | #1

Thomas Gummerer <t.gummerer@gmail.com> writes:

> Fix the test by updating the mtime of test.r, so git is forced to
> check the contents of the file, and the clean filter is run as the
> test expects.

Hmph, depending on the timestamp granularity, with this patch,
test.r would have mtime that is the same or a bit later than that of
the index file.  Is it sufficient to really "force" Git to check the
contents, or does it just make the likelyhood that it would choose
to check a bit bigger (in other words, are we solving the race, or
merely making the race window smaller)?

Thanks.

>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
>  t/t0021-conversion.sh | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
> index e10f5f787f..66f75005d5 100755
> --- a/t/t0021-conversion.sh
> +++ b/t/t0021-conversion.sh
> @@ -390,6 +390,7 @@ test_expect_success PERL 'required process filter should filter data' '
>  		EOF
>  		test_cmp_exclude_clean expected.log debug.log &&
>  
> +		touch test.r &&
>  		filter_git checkout --quiet --no-progress empty-branch &&
>  		cat >expected.log <<-EOF &&
>  			START

Johannes Sixt Aug. 20, 2019, 7:11 p.m. UTC | #2

Am 20.08.19 um 08:56 schrieb Thomas Gummerer:
> Fix the test by updating the mtime of test.r, ...

> diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
> index e10f5f787f..66f75005d5 100755
> --- a/t/t0021-conversion.sh
> +++ b/t/t0021-conversion.sh
> @@ -390,6 +390,7 @@ test_expect_success PERL 'required process filter should filter data' '
>  		EOF
>  		test_cmp_exclude_clean expected.log debug.log &&
>  
> +		touch test.r &&

		test-tool chmtime +10 test.r

would be more reliable.

>  		filter_git checkout --quiet --no-progress empty-branch &&
>  		cat >expected.log <<-EOF &&
>  			START
> 

-- Hannes

Thomas Gummerer Aug. 21, 2019, 2:52 p.m. UTC | #3

On 08/20, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:
> 
> > Fix the test by updating the mtime of test.r, so git is forced to
> > check the contents of the file, and the clean filter is run as the
> > test expects.
> 
> Hmph, depending on the timestamp granularity, with this patch,
> test.r would have mtime that is the same or a bit later than that of
> the index file.  Is it sufficient to really "force" Git to check the
> contents, or does it just make the likelyhood that it would choose
> to check a bit bigger (in other words, are we solving the race, or
> merely making the race window smaller)?

This test only worked until now because git checks the contents if the
mtime of the file and the index are the same.  This is because of
racy-git.  I tried to describe this in the commit message, but looks
like it wasn't clear enough.  Do you have any suggestions on how to
make it clearer?

It will also check the contents if the mtime is greater than the
timestamp of the index, so the 'touch' here would also cover that.

So the changes here do solve the race completely.

> Thanks.
> 
> >
> > Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> > ---
> >  t/t0021-conversion.sh | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
> > index e10f5f787f..66f75005d5 100755
> > --- a/t/t0021-conversion.sh
> > +++ b/t/t0021-conversion.sh
> > @@ -390,6 +390,7 @@ test_expect_success PERL 'required process filter should filter data' '
> >  		EOF
> >  		test_cmp_exclude_clean expected.log debug.log &&
> >  
> > +		touch test.r &&
> >  		filter_git checkout --quiet --no-progress empty-branch &&
> >  		cat >expected.log <<-EOF &&
> >  			START

Thomas Gummerer Aug. 21, 2019, 2:56 p.m. UTC | #4

On 08/20, Johannes Sixt wrote:
> Am 20.08.19 um 08:56 schrieb Thomas Gummerer:
> > Fix the test by updating the mtime of test.r, ...
> 
> > diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
> > index e10f5f787f..66f75005d5 100755
> > --- a/t/t0021-conversion.sh
> > +++ b/t/t0021-conversion.sh
> > @@ -390,6 +390,7 @@ test_expect_success PERL 'required process filter should filter data' '
> >  		EOF
> >  		test_cmp_exclude_clean expected.log debug.log &&
> >  
> > +		touch test.r &&
> 
> 		test-tool chmtime +10 test.r
> 
> would be more reliable.

Hmm, is touch unreliable on some platforms?  I didn't think of
'test-tool chmtime', but I'm also not sure it's better than touch in
this case.

To me te 'touch' signifies that the timestamp must be updated after
the previous checkout, so git thinks it could possibly have been
changed, which I think is clearer in this case than setting the mtime
to a future time.

But I'm happy to change it if there's something I'm missing why
'test-tool chmtime' is better in this case.

> >  		filter_git checkout --quiet --no-progress empty-branch &&
> >  		cat >expected.log <<-EOF &&
> >  			START
> > 
> 
> -- Hannes

Junio C Hamano Aug. 21, 2019, 3:40 p.m. UTC | #5

Thomas Gummerer <t.gummerer@gmail.com> writes:

> It will also check the contents if the mtime is greater than the
> timestamp of the index, so the 'touch' here would also cover that.
>
> So the changes here do solve the race completely.

OK, the explanation makes sense.

Either test.r has been correctly checked out and has an older
timestamp or a more recent timestamp. In the former case, the index
knows that we did not touch it, so the next "checkout" knows it does
not have to ask the clean filter to work on it.  In the latter case,
the index is unsure if we touched it (or, suspects that it has
updated contents in it), so the clean filter needs to read from the
working tree to see if we did change it (and we find it is not
modified).  The outcome at the higher level, the answer to the
question "checkout" wanted to ask, is the same: test.r has no local
modificaiton and we can switch branches safely.

And that is already validated by seeing what exit status "checkout"
gives us, so it sort-of feels to be testing a bit too low level
implementation detail to see on which paths the filters are or are
not called, but that is not a problem with this fix.  If we want to
check at that level, we should do so correctly, and making sure that
the test.r file has recent timestamp to convince "checkout" that it
needs to verify contents is the right thing to do.

Thanks.

Johannes Sixt Aug. 21, 2019, 6:23 p.m. UTC | #6

Am 21.08.19 um 16:56 schrieb Thomas Gummerer:
> On 08/20, Johannes Sixt wrote:
>> Am 20.08.19 um 08:56 schrieb Thomas Gummerer:
>>> Fix the test by updating the mtime of test.r, ...
>>
>>> diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
>>> index e10f5f787f..66f75005d5 100755
>>> --- a/t/t0021-conversion.sh
>>> +++ b/t/t0021-conversion.sh
>>> @@ -390,6 +390,7 @@ test_expect_success PERL 'required process filter should filter data' '
>>>  		EOF
>>>  		test_cmp_exclude_clean expected.log debug.log &&
>>>  
>>> +		touch test.r &&
>>
>> 		test-tool chmtime +10 test.r
>>
>> would be more reliable.
> 
> Hmm, is touch unreliable on some platforms?  I didn't think of
> 'test-tool chmtime', but I'm also not sure it's better than touch in
> this case.
> 
> To me te 'touch' signifies that the timestamp must be updated after
> the previous checkout, so git thinks it could possibly have been
> changed, which I think is clearer in this case than setting the mtime
> to a future time.

touch does not guarantee that the current time is different from the
timestamp that the file already carries, particularly not when the
filesystem stores just a resolution of 1 second, and commands are
executed quickly.

But when we use test-tool chmtime +10, then the timestamp is definitely
different. If you don't like a timestamp in the future, use -10, or
anything else that is different from zero.

-- Hannes

SZEDER Gábor Aug. 21, 2019, 10:03 p.m. UTC | #7

On Wed, Aug 21, 2019 at 08:23:23PM +0200, Johannes Sixt wrote:
> Am 21.08.19 um 16:56 schrieb Thomas Gummerer:
> > On 08/20, Johannes Sixt wrote:
> >> Am 20.08.19 um 08:56 schrieb Thomas Gummerer:
> >>> Fix the test by updating the mtime of test.r, ...
> >>
> >>> diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
> >>> index e10f5f787f..66f75005d5 100755
> >>> --- a/t/t0021-conversion.sh
> >>> +++ b/t/t0021-conversion.sh
> >>> @@ -390,6 +390,7 @@ test_expect_success PERL 'required process filter should filter data' '
> >>>  		EOF
> >>>  		test_cmp_exclude_clean expected.log debug.log &&
> >>>  
> >>> +		touch test.r &&
> >>
> >> 		test-tool chmtime +10 test.r
> >>
> >> would be more reliable.
> > 
> > Hmm, is touch unreliable on some platforms?  I didn't think of
> > 'test-tool chmtime', but I'm also not sure it's better than touch in
> > this case.
> > 
> > To me te 'touch' signifies that the timestamp must be updated after
> > the previous checkout, so git thinks it could possibly have been
> > changed, which I think is clearer in this case than setting the mtime
> > to a future time.
> 
> touch does not guarantee that the current time is different from the
> timestamp that the file already carries, particularly not when the
> filesystem stores just a resolution of 1 second, and commands are
> executed quickly.

This 'touch' must ensure that the timestamp of the file is not older
than the timestamp of the index, and to achive that it doesn't
necessarily have to modify the timestamp.

The file is modified first, then the index is updated, and finally
comes this 'touch'.  Consequently, if 'touch' doesn't modify the
timestamp of the file, then it must have the same timestamp as the
index, IOW it's racily clean, and the subsequent 'git checkout' has to
look at the file content and has to run the filter, and that's what we
want to see here.

However, I'm not sure what would happens if the system clock were to
jump back in between, but since it's only a test I don't think it's
worth caring about.

> But when we use test-tool chmtime +10, then the timestamp is definitely
> different.

'test-tool chmtime +10' adjusts the timestamp of the file relative to
its current timestamp.  So yeah, the file's timestamp definitely
changes, but that's not enough, because it doesn't ensure that the new
timestamp is not older than the timestamp of the index.  Just imagine
the arguably pathological situation that right after the file was last
modified the system miraculously comes to a complete stall, and only
manages to resume after 15 seconds to continue with updating the
index.  This means that the timestamp of the file will be 15s older
than the index, and after that 'chmtime +10' it will still be 5s
older.  Consequently, 'git checkout' will think that the file is
clean, it won't run the filter that we expect, and the test will fail.
So instead of '+10' it should be '=+10' to set the new timestamp
relative to the current time, but I'm not too keen about the timestamp
in the future either (though the file is about to be deleted anyway).

I think it would be best to explicitly set the timestamp of the file
and the index sort-of relative to each other and add an in-code
comment as well, e.g.:

  # Make sure that the file appears dirty, so checkout below has to
  # run the configured filter.
  test-tool chmtime =-10 .git/index &&
  test-tool chmtime =+0 test.r &&

Thomas Gummerer Aug. 22, 2019, 5:49 p.m. UTC | #8

On 08/22, SZEDER Gábor wrote:
> On Wed, Aug 21, 2019 at 08:23:23PM +0200, Johannes Sixt wrote:
> > Am 21.08.19 um 16:56 schrieb Thomas Gummerer:
> > > On 08/20, Johannes Sixt wrote:
> > >> Am 20.08.19 um 08:56 schrieb Thomas Gummerer:
> > >>> Fix the test by updating the mtime of test.r, ...
> > >>
> > >>> diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
> > >>> index e10f5f787f..66f75005d5 100755
> > >>> --- a/t/t0021-conversion.sh
> > >>> +++ b/t/t0021-conversion.sh
> > >>> @@ -390,6 +390,7 @@ test_expect_success PERL 'required process filter should filter data' '
> > >>>  		EOF
> > >>>  		test_cmp_exclude_clean expected.log debug.log &&
> > >>>  
> > >>> +		touch test.r &&
> > >>
> > >> 		test-tool chmtime +10 test.r
> > >>
> > >> would be more reliable.
> > > 
> > > Hmm, is touch unreliable on some platforms?  I didn't think of
> > > 'test-tool chmtime', but I'm also not sure it's better than touch in
> > > this case.
> > > 
> > > To me te 'touch' signifies that the timestamp must be updated after
> > > the previous checkout, so git thinks it could possibly have been
> > > changed, which I think is clearer in this case than setting the mtime
> > > to a future time.
> > 
> > touch does not guarantee that the current time is different from the
> > timestamp that the file already carries, particularly not when the
> > filesystem stores just a resolution of 1 second, and commands are
> > executed quickly.
> 
> This 'touch' must ensure that the timestamp of the file is not older
> than the timestamp of the index, and to achive that it doesn't
> necessarily have to modify the timestamp.
> 
> The file is modified first, then the index is updated, and finally
> comes this 'touch'.  Consequently, if 'touch' doesn't modify the
> timestamp of the file, then it must have the same timestamp as the
> index, IOW it's racily clean, and the subsequent 'git checkout' has to
> look at the file content and has to run the filter, and that's what we
> want to see here.

Right.

> However, I'm not sure what would happens if the system clock were to
> jump back in between, but since it's only a test I don't think it's
> worth caring about.

Yeah, I think much of git wouldn't work correctly if that happens, so
I think it's fairly safe to ignore in the test suite.

> > But when we use test-tool chmtime +10, then the timestamp is definitely
> > different.
> 
> 'test-tool chmtime +10' adjusts the timestamp of the file relative to
> its current timestamp.  So yeah, the file's timestamp definitely
> changes, but that's not enough, because it doesn't ensure that the new
> timestamp is not older than the timestamp of the index.  Just imagine
> the arguably pathological situation that right after the file was last
> modified the system miraculously comes to a complete stall, and only
> manages to resume after 15 seconds to continue with updating the
> index.  This means that the timestamp of the file will be 15s older
> than the index, and after that 'chmtime +10' it will still be 5s
> older.  Consequently, 'git checkout' will think that the file is
> clean, it won't run the filter that we expect, and the test will fail.
> So instead of '+10' it should be '=+10' to set the new timestamp
> relative to the current time, but I'm not too keen about the timestamp
> in the future either (though the file is about to be deleted anyway).

Right, the above is why I think 'touch' is a good idea here.  Short of
system clocks jumping around, which will most likely break more than
this test anyway it guarantees that the timestamp is equal or greater
than the timestamp of the index, which is what we need here.

> I think it would be best to explicitly set the timestamp of the file
> and the index sort-of relative to each other and add an in-code
> comment as well, e.g.:
> 
>   # Make sure that the file appears dirty, so checkout below has to
>   # run the configured filter.
>   test-tool chmtime =-10 .git/index &&
>   test-tool chmtime =+0 test.r &&

I think the comment is a good idea.  I personally still prefer just
using 'touch' though, as I find it slightly easier to read (I had to
go look up what the =-/=+ in 'test-tool chmtime' does, while I knew
what touch would be doing :)

That said that's a minor preference for me, if people have a strong
opinion that test-tool chmtime is really better here I'm fine with
changing it.

Junio C Hamano Aug. 22, 2019, 6:04 p.m. UTC | #9

Thomas Gummerer <t.gummerer@gmail.com> writes:

>>   # Make sure that the file appears dirty, so checkout below has to
>>   # run the configured filter.
>>   test-tool chmtime =-10 .git/index &&
>>   test-tool chmtime =+0 test.r &&
>
> I think the comment is a good idea.  I personally still prefer just
> using 'touch' though, as I find it slightly easier to read (I had to
> go look up what the =-/=+ in 'test-tool chmtime' does, while I knew
> what touch would be doing :)

Yup, I do not quite get why people feel 'touch' is a bad idea here.

I also think we should discourage "test-tool chmtime =<anything>"
(i.e. set to relative to the timestamp read from time(), as opposed
to relative to the timestamp read from the filesystem), when we can
avoid it, to allow tests on remote filesystem where the filesystem
clock and the system clock may not always be in sync.

> That said that's a minor preference for me, if people have a strong
> opinion that test-tool chmtime is really better here I'm fine with
> changing it.

Johannes Sixt Aug. 22, 2019, 6:52 p.m. UTC | #10

Am 22.08.19 um 19:49 schrieb Thomas Gummerer:
> Right, the above is why I think 'touch' is a good idea here.  Short of
> system clocks jumping around, which will most likely break more than
> this test anyway it guarantees that the timestamp is equal or greater
> than the timestamp of the index, which is what we need here.

Ok, thanks for the clarification. I didn't see the context. It looks
like touch is good enough.

-- Hannes

t0021: make sure clean filter runs

Commit Message

Comments

Patch