[11/11] test-lib: clear watchman watches at test completion

Message ID 47cecb4a83a3f726088ffba0b00679384c7349ae.1574374826.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Series Improve testability with GIT_TEST_FSMONITOR

Commit Message

Derrick Stolee via GitGitGadget Nov. 21, 2019, 10:20 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/test-lib-functions.sh | 15 +++++++++++++++
 t/test-lib.sh           |  2 ++
 2 files changed, 17 insertions(+)

Comments

SZEDER Gábor Nov. 22, 2019, 1:06 a.m. UTC | #1
On Thu, Nov 21, 2019 at 10:20:26PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/test-lib-functions.sh | 15 +++++++++++++++
>  t/test-lib.sh           |  2 ++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
> index e0b3f28d3a..03573caf42 100644
> --- a/t/test-lib-functions.sh
> +++ b/t/test-lib-functions.sh
> @@ -1475,3 +1475,18 @@ test_set_port () {
>  	port=$(($port + ${GIT_TEST_STRESS_JOB_NR:-0}))
>  	eval $var=$port
>  }
> +
> +test_clear_watchman () {
> +	if test $GIT_TEST_FSMONITOR -ne ""

In the rare cases when this function is invoked (see below) this
condition triggers an error from the shell running the test script:

  - when the variable is not set, because of the lack of quotes around
    the variable name:

      $ ./t5570-git-daemon.sh 
      [....]
      ok 21 - hostname interpolation works after LF-stripping
      ./t5570-git-daemon.sh: 1482: test: -ne: unexpected operator
      # passed all 21 test(s)
      1..21

  - when the variable is set, because the '-ne' operator does integer
    comparison:

      $ GIT_TEST_FSMONITOR="$PWD"/t7519/fsmonitor-none ./t5570-git-daemon.sh
      [...]
      ok 21 - hostname interpolation works after LF-stripping
      ./t5570-git-daemon.sh: 1482: test: Illegal number: /home/szeder/src/git/t/t7519/fsmonitor-none
      # failed 1 among 21 test(s)
      1..21

Please use 'if test -n "$GIT_TEST_FSMONITOR"' instead.
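
The suggested form can be tried outside the test suite.  What follows is
a minimal sketch, assuming a POSIX shell; the helper name
'fsmonitor_is_set' is hypothetical and not part of test-lib.

```shell
# Hypothetical helper: succeeds only when GIT_TEST_FSMONITOR is non-empty.
# Quoting the expansion means 'test' always sees an operand, and '-n' is
# a string test, so a path value is not parsed as a number.
fsmonitor_is_set () {
	test -n "$GIT_TEST_FSMONITOR"
}

unset GIT_TEST_FSMONITOR
if fsmonitor_is_set
then
	echo "unset: set"
else
	echo "unset: not set"
fi

GIT_TEST_FSMONITOR=/some/path/fsmonitor-none
if fsmonitor_is_set
then
	echo "path: set"
else
	echo "path: not set"
fi
```

Neither case should print anything on stderr, unlike the two failures
shown above.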

> +	then
> +		watchman watch-list |

Then with the above fixed, trying to run 'watchman' triggers another
error if it's not installed:

  $ GIT_TEST_FSMONITOR="$PWD"/t7519/fsmonitor-none ./t5570-git-daemon.sh 
  [...]
  ok 21 - hostname interpolation works after LF-stripping
  ./t5570-git-daemon.sh: 1484: ./t5570-git-daemon.sh: watchman: not found
  # failed 1 among 21 test(s)

I think we need an additional condition to run this only if
't7519/fsmonitor-watchman' is used in the tests.
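
One possible shape for such a condition, as a sketch only: the helper
name is hypothetical, and matching the hook by its file name is an
assumption about how the variable is normally set.

```shell
# Hypothetical guard: true only when the configured hook looks like the
# watchman one (matched by name, which is an assumption) and a
# 'watchman' executable can actually be found on PATH.
fsmonitor_uses_watchman () {
	case "$GIT_TEST_FSMONITOR" in
	*/fsmonitor-watchman)
		command -v watchman >/dev/null 2>&1
		;;
	*)
		return 1
		;;
	esac
}

GIT_TEST_FSMONITOR=/path/to/t7519/fsmonitor-none
if fsmonitor_uses_watchman
then
	echo "would clear watches"
else
	echo "skipping watchman cleanup"
fi
```

With a guard like this, neither 'fsmonitor-none' nor a missing
'watchman' binary would reach the 'watchman watch-list' call.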

> +			grep "$TRASH_DIRECTORY" |
> +			sed "s/\t\"//g" |
> +			sed "s/\",//g" >repo-list
> +
> +		for repo in $(cat repo-list)
> +		do
> +			watchman watch-del "$repo"
> +		done
> +	fi
> +}
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 30b07e310f..067a432ea5 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -1072,6 +1072,8 @@ test_atexit_handler () {
>  	# sure that the registered cleanup commands are run only once.
>  	test : != "$test_atexit_cleanup" || return 0
>  
> +	test_clear_watchman

I'm not sure where to put this call, but this is definitely not the
right place for it.  See that 'return 0' above in the context?  That's
where the test_atexit_handler function returns early when no atexit
handler commands are set, i.e. in all test scripts that don't involve
some kind of daemons, thus this call is not invoked in the majority of
test scripts.

Simply moving this call before that early return is not good, because
then it would be invoked twice.

An option would be to register this call as an atexit command
somewhere late in 'test-lib.sh' (around where GIT_TEST_GETTEXT_POISON
is restored, perhaps).  That way it would be invoked most of the time,
and it would be invoked only once, but I'm not sure how it would work
out with test scripts that unset GIT_TEST_FSMONITOR somewhere in the
middle for the remainder of the test script.  However, register the
atexit command only if GIT_TEST_FSMONITOR is set (to something
watchman-specific), so it won't be invoked at all if
GIT_TEST_FSMONITOR is not set, and thus it won't generate additional
test output and trace.

I don't have a better idea.
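
To make the "registered once, run once" idea concrete, here is a
stand-alone model.  All names in it are hypothetical; it only mimics the
shape of the handler quoted above and is not the test-lib code itself.

```shell
# Stand-alone model of a run-once cleanup list (hypothetical names).
atexit_cleanup=:
atexit_log=

register_atexit () {
	# Prepend, so that commands registered later run first.
	atexit_cleanup="$1
$atexit_cleanup"
}

run_atexit () {
	# Nothing registered, or already run: return early.
	test : != "$atexit_cleanup" || return 0
	eval "$atexit_cleanup"
	atexit_cleanup=:
}

clear_watches () {
	# Placeholder for the real cleanup; it only records the call.
	atexit_log="${atexit_log}cleared "
}

# Register only when the variable is set, so scripts that never use
# fsmonitor get no extra output or trace.
if test -n "$GIT_TEST_FSMONITOR"
then
	register_atexit clear_watches
fi

run_atexit
run_atexit	# the second call is a no-op
echo "log: $atexit_log"
```

Because the cleanup is part of the registered list, the early return no
longer skips it, and resetting the list to ':' keeps it from running
twice.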

> +
>  	setup_malloc_check
>  	test_eval_ "$test_atexit_cleanup"
>  	test_atexit_cleanup=:
> -- 
> gitgitgadget
Derrick Stolee Dec. 9, 2019, 2:12 p.m. UTC | #2
On 11/21/2019 8:06 PM, SZEDER Gábor wrote:

Thanks for this message. Sorry I'm so late getting back to it.

> On Thu, Nov 21, 2019 at 10:20:26PM +0000, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  t/test-lib-functions.sh | 15 +++++++++++++++
>>  t/test-lib.sh           |  2 ++
>>  2 files changed, 17 insertions(+)
>>
>> diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
>> index e0b3f28d3a..03573caf42 100644
>> --- a/t/test-lib-functions.sh
>> +++ b/t/test-lib-functions.sh
>> @@ -1475,3 +1475,18 @@ test_set_port () {
>>  	port=$(($port + ${GIT_TEST_STRESS_JOB_NR:-0}))
>>  	eval $var=$port
>>  }
>> +
>> +test_clear_watchman () {
>> +	if test $GIT_TEST_FSMONITOR -ne ""
> 
> In the rare cases when this function is invoked (see below) this
> condition triggers an error from the shell running test script:
> 
>   - when the variable is not set, because of the lack of quotes around
>     the variable name:
> 
>       $ ./t5570-git-daemon.sh 
>       [....]
>       ok 21 - hostname interpolation works after LF-stripping
>       ./t5570-git-daemon.sh: 1482: test: -ne: unexpected operator
>       # passed all 21 test(s)
>       1..21
> 
>   - when the variable is set, because the '-ne' operator does integer
>     comparison:
> 
>       $ GIT_TEST_FSMONITOR="$PWD"/t7519/fsmonitor-none ./t5570-git-daemon.sh
>       [...]
>       ok 21 - hostname interpolation works after LF-stripping
>       ./t5570-git-daemon.sh: 1482: test: Illegal number: /home/szeder/src/git/t/t7519/fsmonitor-none
>       # failed 1 among 21 test(s)
>       1..21
> 
> Please use 'if test -n "$GIT_TEST_FSMONITOR"' instead.

Thanks for the pointers.

>> +	then
>> +		watchman watch-list |
> 
> Then with the above fixed, trying to run 'watchman' triggers another
> error if it's not installed:
> 
>   $ GIT_TEST_FSMONITOR="$PWD"/t7519/fsmonitor-none ./t5570-git-daemon.sh 
>   [...]
>   ok 21 - hostname interpolation works after LF-stripping
>   ./t5570-git-daemon.sh: 1484: ./t5570-git-daemon.sh: watchman: not found
>   # failed 1 among 21 test(s)
> 
> I think we need an additional condition to run this only if
> 't7519/fsmonitor-watchman' is used in the tests.

The intention is to enable a test-suite-wide run using GIT_TEST_FSMONITOR,
and that can only use watchman (currently). Barring wanting to unset the
variable if it was set on purpose in a test script, the other options do
not actually return correct values to make use of the feature.

>> +			grep "$TRASH_DIRECTORY" |
>> +			sed "s/\t\"//g" |
>> +			sed "s/\",//g" >repo-list
>> +
>> +		for repo in $(cat repo-list)
>> +		do
>> +			watchman watch-del "$repo"
>> +		done
>> +	fi
>> +}
>> diff --git a/t/test-lib.sh b/t/test-lib.sh
>> index 30b07e310f..067a432ea5 100644
>> --- a/t/test-lib.sh
>> +++ b/t/test-lib.sh
>> @@ -1072,6 +1072,8 @@ test_atexit_handler () {
>>  	# sure that the registered cleanup commands are run only once.
>>  	test : != "$test_atexit_cleanup" || return 0
>>  
>> +	test_clear_watchman
> 
> I'm not sure where to put this call, but this is definitely not the
> right place for it.  See that 'return 0' above in the context?  That's
> where the test_atexit_handler function returns early when no atexit
> handler commands are set, i.e. in all test scripts that don't involve
> some kind of daemons, thus this call is not invoked in the majority of
> test scripts.

Ah, I misunderstood the point of test_atexit_handler.

> Simply moving this call before that early return is not good, because
> then it would be invoked twice.
> 
> An option would be to register this call as an atexit command
> somewhere late in 'test-lib.sh' (around where GIT_TEST_GETTEXT_POISON
> is restored, perhaps).  That way it would be invoked most of the time,
> and it would be invoked only once, but I'm not sure how it would work
> out with test scripts that unset GIT_TEST_FSMONITOR somewhere in the
> middle for the remainder of the test script.  However, register the
> atexit command only if GIT_TEST_FSMONITOR is set (to something
> watchman-specific), so it won't be invoked at all if
> GIT_TEST_FSMONITOR is not set, and thus it won't generate additional
> test output and trace.
> 
> I don't have a better idea.

Shouldn't it be sufficient to add it into test_done? If the test fails,
then we could leave watches open, but that's no worse than we had without
this test_clear_watchman method.

Thanks,
-Stolee
SZEDER Gábor Dec. 9, 2019, 11:40 p.m. UTC | #3
On Mon, Dec 09, 2019 at 09:12:37AM -0500, Derrick Stolee wrote:
> >> +		watchman watch-list |
> > 
> > Then with the above fixed, trying to run 'watchman' triggers another
> > error if it's not installed:
> > 
> >   $ GIT_TEST_FSMONITOR="$PWD"/t7519/fsmonitor-none ./t5570-git-daemon.sh 
> >   [...]
> >   ok 21 - hostname interpolation works after LF-stripping
> >   ./t5570-git-daemon.sh: 1484: ./t5570-git-daemon.sh: watchman: not found
> >   # failed 1 among 21 test(s)
> > 
> > I think we need an additional condition to run this only if
> > 't7519/fsmonitor-watchman' is used in the tests.
> 
> The intention is to enable a test-suite-wide run using GIT_TEST_FSMONITOR,
> and that can only use watchman (currently).

I've just run 'GIT_TEST_FSMONITOR=$(pwd)/t7519/fsmonitor-all make',
and it only failed one test in 't0090-cache-tree.sh', but the fix is
already in 'pu' in 61eea521fe (fsmonitor: do not compare bitmap size
with size of split index, 2019-11-13).


> >> diff --git a/t/test-lib.sh b/t/test-lib.sh
> >> index 30b07e310f..067a432ea5 100644
> >> --- a/t/test-lib.sh
> >> +++ b/t/test-lib.sh
> >> @@ -1072,6 +1072,8 @@ test_atexit_handler () {
> >>  	# sure that the registered cleanup commands are run only once.
> >>  	test : != "$test_atexit_cleanup" || return 0
> >>  
> >> +	test_clear_watchman
> > 
> > I'm not sure where to put this call, but this is definitely not the
> > right place for it.  See that 'return 0' above in the context?  That's
> > where the test_atexit_handler function returns early when no atexit
> > handler commands are set, i.e. in all test scripts that don't involve
> > some kind of daemons, thus this call is not invoked in the majority of
> > test scripts.
> 
> Ah, I misunderstood the point of test_atexit_handler.
> 
> > Simply moving this call before that early return is not good, because
> > then it would be invoked twice.
> > 
> > An option would be to register this call as an atexit command
> > somewhere late in 'test-lib.sh' (around where GIT_TEST_GETTEXT_POISON
> > is restored, perhaps).  That way it would be invoked most of the time,
> > and it would be invoked only once, but I'm not sure how it would work
> > out with test scripts that unset GIT_TEST_FSMONITOR somewhere in the
> > middle for the remainder of the test script.  However, register the
> > atexit command only if GIT_TEST_FSMONITOR is set (to something
> > watchman-specific), so it won't be invoked at all if
> > GIT_TEST_FSMONITOR is not set, and thus it won't generate additional
> > test output and trace.
> > 
> > I don't have a better idea.
> 
> Shouldn't it be sufficient to add it into test_done? If the test fails,
> then we could leave watches open, but that's no worse than we had without
> this test_clear_watchman method.

I don't know enough about watchman to have an informed opinion.

I think the answer mainly depends on what we want to achieve and what
happens when a test script that was run with GIT_TEST_FSMONITOR and
exited without invoking 'test_done' is re-executed (e.g. after a test
case fails with '--immediate' or when the user hits ctrl-c or closes
the terminal window mid-test).

As far as I understand the commit message of v2 of this patch [1], we
mainly want two things:

  - Avoid overloading watchman's watch queue.  For this it might
    indeed be sufficient to clear watches in 'test_done', because most
    test scripts tend to succeed most of the time.

  - Make GIT_TEST_FSMONITOR work reliably on Windows.  For this, I'm
    afraid it's not enough in general, because after a failure with
    '--immediate' or after a ctrl-c we won't run 'test_done', so we
    won't clear the watches, and watchman will keep the fd to the
    trash dir open, and, consequently, will interfere with subsequent
    executions of the same test script as it can't delete the still
    existing trash dir left over from the previous run.
    
    It could still be sufficient for fsmonitor-enabled CI builds,
    though, because there we don't re-run tests, don't hit ctrl-c, and
    (at least on Azure Pipelines) don't use '--immediate', and the
    whole VM/container/whatever is thrown away at end anyway.

    On Linux/Unix-y systems it probably doesn't matter much, because
    they can delete open directories, but I wonder what happens with a
    watch when the directory it is supposed to observe gets deleted.  If
    the watch is removed in this case, great; if it isn't, then...
    well, then what happens with it?  Will it be overwritten with the
    next test run, or will there be duplicate watches for the same
    dir?

[1] https://public-inbox.org/git/e51165f260d564ccb7a9b8e696691eccb184c01a.1575907804.git.gitgitgadget@gmail.com/
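
The difference between cleaning up in 'test_done' and cleaning up at
shell exit can be modelled in a few lines.  This is a sketch with
hypothetical names ('finish' stands in for 'test_done', 'cleanup' for
clearing the watches); it covers an early 'exit' only, and signals such
as ctrl-c may need their own trap depending on the shell.

```shell
# Stand-alone model: run a fake "test script" in a subshell that exits
# before it reaches finish, as a failure with '--immediate' would.
run_script () {
	(
		cleanup () { echo "watches cleared"; }
		finish () { cleanup; }
		# With "trap", cleanup is tied to shell exit instead.
		if test "$1" = trap
		then
			trap cleanup EXIT
			finish () { :; }
		fi
		exit 1	# simulated early exit, finish is never reached
		finish
	)
}

echo "cleanup in finish only: $(run_script plain)"
echo "cleanup via EXIT trap:  $(run_script trap)"
```

Only the second line reports "watches cleared", which is the gap
described in the Windows bullet above.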
Derrick Stolee Dec. 10, 2019, 1:43 a.m. UTC | #4
On 12/9/2019 6:40 PM, SZEDER Gábor wrote:
> On Mon, Dec 09, 2019 at 09:12:37AM -0500, Derrick Stolee wrote:
>>>> +		watchman watch-list |
>>>
>>> Then with the above fixed, trying to run 'watchman' triggers another
>>> error if it's not installed:
>>>
>>>   $ GIT_TEST_FSMONITOR="$PWD"/t7519/fsmonitor-none ./t5570-git-daemon.sh 
>>>   [...]
>>>   ok 21 - hostname interpolation works after LF-stripping
>>>   ./t5570-git-daemon.sh: 1484: ./t5570-git-daemon.sh: watchman: not found
>>>   # failed 1 among 21 test(s)
>>>
>>> I think we need an additional condition to run this only if
>>> 't7519/fsmonitor-watchman' is used in the tests.
>>
>> The intention is to enable a test-suite-wide run using GIT_TEST_FSMONITOR,
>> and that can only use watchman (currently).
> 
> I've just run 'GIT_TEST_FSMONITOR=$(pwd)/t7519/fsmonitor-all make',
> and it only failed one test in 't0090-cache-tree.sh', but the fix is
> already in 'pu' in 61eea521fe (fsmonitor: do not compare bitmap size
> with size of split index, 2019-11-13).
> 
> 
>>>> diff --git a/t/test-lib.sh b/t/test-lib.sh
>>>> index 30b07e310f..067a432ea5 100644
>>>> --- a/t/test-lib.sh
>>>> +++ b/t/test-lib.sh
>>>> @@ -1072,6 +1072,8 @@ test_atexit_handler () {
>>>>  	# sure that the registered cleanup commands are run only once.
>>>>  	test : != "$test_atexit_cleanup" || return 0
>>>>  
>>>> +	test_clear_watchman
>>>
>>> I'm not sure where to put this call, but this is definitely not the
>>> right place for it.  See that 'return 0' above in the context?  That's
>>> where the test_atexit_handler function returns early when no atexit
>>> handler commands are set, i.e. in all test scripts that don't involve
>>> some kind of daemons, thus this call is not invoked in the majority of
>>> test scripts.
>>
>> Ah, I misunderstood the point of test_atexit_handler.
>>
>>> Simply moving this call before that early return is not good, because
>>> then it would be invoked twice.
>>>
>>> An option would be to register this call as an atexit command
>>> somewhere late in 'test-lib.sh' (around where GIT_TEST_GETTEXT_POISON
>>> is restored, perhaps).  That way it would be invoked most of the time,
>>> and it would be invoked only once, but I'm not sure how it would work
>>> out with test scripts that unset GIT_TEST_FSMONITOR somewhere in the
>>> middle for the remainder of the test script.  However, register the
>>> atexit command only if GIT_TEST_FSMONITOR is set (to something
>>> watchman-specific), so it won't be invoked at all if
>>> GIT_TEST_FSMONITOR is not set, and thus it won't generate additional
>>> test output and trace.
>>>
>>> I don't have a better idea.
>>
>> Shouldn't it be sufficient to add it into test_done? If the test fails,
>> then we could leave watches open, but that's no worse than we had without
>> this test_clear_watchman method.
> 
> I don't know enough about watchman to have an informed opinion.
> 
> I think the answer mainly depends on what we want to achieve and what
> happens when a test script that was run with GIT_TEST_FSMONITOR and
> exited without invoking 'test_done' is re-executed (e.g. after a test
> case fails with '--immediate' or when the user hits ctrl-c or closes
> the terminal window mid-test).
> 
> As far as I understand the commit message of v2 of this patch [1], we
> mainly want two things:
> 
>   - Avoid overloading watchman's watch queue.  For this it might
>     indeed be sufficient to clear watches in 'test_done', because most
>     test scripts tend to succeed most of the time.
> 
>   - Make GIT_TEST_FSMONITOR work reliably on Windows.  For this, I'm
>     afraid it's not enough in general, because after a failure with
>     '--immediate' or after a ctrl-c we won't run 'test_done', so we
>     won't clear the watches, and watchman will keep the fd to the
>     trash dir open, and, consequently, will interfere with subsequent
>     executions of the same test script as it can't delete the still
>     existing trash dir left over from the previous run.

You are right. Running an individual test and ending it early would
lead to these leaked handles. This assumes someone is aware of the
GIT_TEST_FSMONITOR environment variable, so they are at least
interacting with the feature directly to some extent.

>     It could still be sufficient for fsmonitor-enabled CI builds,
>     though, because there we don't re-run tests, don't hit ctrl-c, and
>     (at least on Azure Pipelines) don't use '--immediate', and the
>     whole VM/container/whatever is thrown away at end anyway.

This is the hope. It would be nice to get to that point.

> 
>     On Linux/Unix-y systems it probably doesn't matter much, because
>     they can delete open directories, but I wonder what happens with a
>     watch when the directory it is supposed to observe gets deleted.  If
>     the watch is removed in this case, great; if it isn't, then...
>     well, then what happens with it?  Will it be overwritten with the
>     next test run, or will there be duplicate watches for the same
>     dir?

When a directory is deleted from under Watchman on Linux, the watch
is removed...eventually. I'm not sure at exactly what point that happens.
At the very least, Watchman will receive and process the signals for all
of the paths being removed inside the directory. Running 'watch-del'
removes that overhead.

Thanks,
-Stolee

Patch

diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index e0b3f28d3a..03573caf42 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1475,3 +1475,18 @@  test_set_port () {
 	port=$(($port + ${GIT_TEST_STRESS_JOB_NR:-0}))
 	eval $var=$port
 }
+
+test_clear_watchman () {
+	if test $GIT_TEST_FSMONITOR -ne ""
+	then
+		watchman watch-list |
+			grep "$TRASH_DIRECTORY" |
+			sed "s/\t\"//g" |
+			sed "s/\",//g" >repo-list
+
+		for repo in $(cat repo-list)
+		do
+			watchman watch-del "$repo"
+		done
+	fi
+}
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 30b07e310f..067a432ea5 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1072,6 +1072,8 @@  test_atexit_handler () {
 	# sure that the registered cleanup commands are run only once.
 	test : != "$test_atexit_cleanup" || return 0
 
+	test_clear_watchman
+
 	setup_malloc_check
 	test_eval_ "$test_atexit_cleanup"
 	test_atexit_cleanup=:
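
For illustration, the review comments on the helper could be folded in
roughly as follows.  This is a sketch, not the version that was
eventually applied.  The 'watchman' shell function below is a stand-in
so the example runs without the real tool, and the layout of its
'watch-list' output (one quoted path per line, comma after all but the
last) is an assumption inferred from what the sed expressions in the
patch strip.

```shell
# Stand-in for the real CLI (assumed output layout, see above).
watchman () {
	case "$1" in
	watch-list)
		printf '{\n\t"roots": [\n'
		printf '\t\t"/srv/other",\n'
		printf '\t\t"/tmp/trash directory.t0000"\n'
		printf '\t]\n}\n'
		;;
	watch-del)
		echo "deleted: $2"
		;;
	esac
}

test_clear_watchman () {
	# String test with a quoted expansion, as requested in review.
	if test -n "$GIT_TEST_FSMONITOR"
	then
		# -F matches the path literally; the sed strips the leading
		# whitespace and quote, then the trailing quote and optional
		# comma. Reading line by line keeps paths with spaces intact
		# and avoids the temporary 'repo-list' file.
		watchman watch-list |
		grep -F "$TRASH_DIRECTORY" |
		sed -e 's/^[[:space:]]*"//' -e 's/",\{0,1\}$//' |
		while IFS= read -r repo
		do
			# Keep the command from consuming the loop's input.
			watchman watch-del "$repo" </dev/null
		done
	fi
}

GIT_TEST_FSMONITOR=/path/to/fsmonitor-watchman
TRASH_DIRECTORY="/tmp/trash directory.t0000"
test_clear_watchman
```

With the stand-in above this prints "deleted: /tmp/trash directory.t0000"
and leaves the unrelated root alone.  Where the call belongs (an atexit
registration or 'test_done') is the open question in the comments and is
deliberately left out of the sketch.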