diff mbox series

fsmonitor OSX: fix hangs for submodules

Message ID pull.1802.git.1727577690390.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series fsmonitor OSX: fix hangs for submodules | expand

Commit Message

Koji Nakamaru Sept. 29, 2024, 2:41 a.m. UTC
From: Koji Nakamaru <koji.nakamaru@gree.net>

fsmonitor_classify_path_absolute() expects state->path_gitdir_watch.buf
has no trailing '/' or '.' For a submodule, fsmonitor_run_daemon() sets
the value with trailing "/." (as repo_get_git_dir(the_repository) on
Darwin returns ".") so that fsmonitor_classify_path_absolute() returns
IS_OUTSIDE_CONE.

In this case, fsevent_callback() doesn't update cookie_list so that
fsmonitor_publish() does nothing and with_lock__mark_cookies_seen() is
not invoked.

As with_lock__wait_for_cookie() infinitely waits for state->cookies_cond
that with_lock__mark_cookies_seen() should unlock, the whole daemon
hangs.

Remove trailing "/." from state->path_gitdir_watch.buf for submodules
and add a corresponding test in t7527-builtin-fsmonitor.sh.

Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Koji Nakamaru <koji.nakamaru@gree.net>
---
    fsmonitor/darwin: fix hangs for submodules

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1802%2FKojiNakamaru%2Ffix%2Ffsmonitor-darwin-hangs-for-submodules-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1802/KojiNakamaru/fix/fsmonitor-darwin-hangs-for-submodules-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1802

 builtin/fsmonitor--daemon.c  |  1 +
 t/t7527-builtin-fsmonitor.sh | 39 ++++++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)


base-commit: 3857aae53f3633b7de63ad640737c657387ae0c6

Comments

Junio C Hamano Sept. 30, 2024, 5:40 p.m. UTC | #1
"Koji Nakamaru via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Koji Nakamaru <koji.nakamaru@gree.net>
>
> fsmonitor_classify_path_absolute() expects state->path_gitdir_watch.buf
> has no trailing '/' or '.' For a submodule, fsmonitor_run_daemon() sets
> the value with trailing "/." (as repo_get_git_dir(the_repository) on
> Darwin returns ".") so that fsmonitor_classify_path_absolute() returns
> IS_OUTSIDE_CONE.
>
> In this case, fsevent_callback() doesn't update cookie_list so that
> fsmonitor_publish() does nothing and with_lock__mark_cookies_seen() is
> not invoked.
>
> As with_lock__wait_for_cookie() infinitely waits for state->cookies_cond
> that with_lock__mark_cookies_seen() should unlock, the whole daemon
> hangs.

The above very nicely describes the cause, the mechansim that leads
to the end-user observable effect, and the (bad) effect the bug has.

I wish everybody wrote their proposed commit messages like this ;-)

> Remove trailing "/." from state->path_gitdir_watch.buf for submodules
> and add a corresponding test in t7527-builtin-fsmonitor.sh.
>
> Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> Signed-off-by: Koji Nakamaru <koji.nakamaru@gree.net>
> ---
>     fsmonitor/darwin: fix hangs for submodules

> diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
> index 730f3c7f810..7acd074a97f 100755
> --- a/t/t7527-builtin-fsmonitor.sh
> +++ b/t/t7527-builtin-fsmonitor.sh
> @@ -82,6 +82,28 @@ have_t2_data_event () {
>  	grep -e '"event":"data".*"category":"'"$c"'".*"key":"'"$k"'"'
>  }
>  
> +start_git_in_background () {
> +	git "$@" &
> +	git_pid=$!
> +	nr_tries_left=10
> +	while true
> +	do
> +		if test $nr_tries_left -eq 0
> +		then
> +			kill $git_pid
> +			exit 1
> +		fi
> +		sleep 1
> +		nr_tries_left=$(($nr_tries_left - 1))
> +	done > /dev/null 2>&1 &

So, the command is allowed to run for 10 seconds and then a signal
is sent to the process (by the way, we do not write the SP between
">" and "/dev/null").

> +	watchdog_pid=$!
> +	wait $git_pid

And the process to ensure the command gets killed in 10 seconds is
called the "watchdog".  We let the command run for completion (and
we'd be happy if it did without watchdog needing to forcibly kill
it).

Which means that even after the test finishes normally (e.g., the
command completes without getting killed by the watchdog, because it
is on a fast box and finishes in 0.5 second), we have leftover
watchdog process hanging around for 10 seconds, which might interfere
with the removal of the $TRASH_DIRECTORY at the end of the test.

There is a helper function to kill both (below), which probably is
used to avoid it.  Let's keep reading.

> +}
> +
> +stop_git_and_watchdog () {
> +	kill $git_pid $watchdog_pid
> +}

This sends a signal and let the process die.  Without waiting to
make sure they indeed died, at which point we can safely remove the
$TRASH_DIRECTORY on filesystems that refuse to remove a directory
when a process still has it as its current working directory.

Shouldn't it loop, like

	for pid in $git_pid $watchdog_pid
	do
                until kill -0 $pid
                do
                        kill $pid
                done
	done

or something?  Or is there a mechanism already to ensure that we
return after they get killed that I am failing to find?

>  test_expect_success 'explicit daemon start and stop' '
>  	test_when_finished "stop_daemon_delete_repo test_explicit" &&
>  
> @@ -907,6 +929,23 @@ test_expect_success "submodule absorbgitdirs implicitly starts daemon" '
>  	test_subcommand git fsmonitor--daemon start <super-sub.trace
>  '
>  
> +test_expect_success "submodule implicitly starts daemon by pull" '
> +	test_atexit "stop_git_and_watchdog" &&

Hmph, this is _atexit and not _when_finished because...?

> +	test_when_finished "rm -rf cloned; \
> +			    rm -rf super; \
> +			    rm -rf sub" &&

Makes me wonder why it is not written like so:

	test_when_finished "rm -rf cloned super sub" &&

which is short enough to still fit on a line.  Is there something I
am missing that these directories must be removed separately and in
this order?

> +	create_super super &&
> +	create_sub sub &&
> +
> +	git -C super submodule add ../sub ./dir_1/dir_2/sub &&
> +	git -C super commit -m "add sub" &&
> +	git clone --recurse-submodules super cloned &&
> +
> +	git -C cloned/dir_1/dir_2/sub config core.fsmonitor true &&
> +	start_git_in_background -C cloned pull --recurse-submodules
> +'

Other than that, very nicely done.

Thanks.

>  # On a case-insensitive file system, confirm that the daemon
>  # notices when the .git directory is moved/renamed/deleted
>  # regardless of how it is spelled in the FS event.
>
> base-commit: 3857aae53f3633b7de63ad640737c657387ae0c6
Koji Nakamaru Oct. 1, 2024, 4:11 a.m. UTC | #2
Thank you very much for carefully checking the patch and suggesting
better ways. I'll later revise it and submit a new one.

> > +}
> > +
> > +stop_git_and_watchdog () {
> > +     kill $git_pid $watchdog_pid
> > +}
>
> This sends a signal and let the process die.  Without waiting to
> make sure they indeed died, at which point we can safely remove the
> $TRASH_DIRECTORY on filesystems that refuse to remove a directory
> when a process still has it as its current working directory.
>
> Shouldn't it loop, like
>
>         for pid in $git_pid $watchdog_pid
>         do
>                 until kill -0 $pid
>                 do
>                         kill $pid
>                 done
>         done
>
> or something?  Or is there a mechanism already to ensure that we
> return after they get killed that I am failing to find?

I agree that we have to wait for pids. I also realized that we should
run git in another process group and kill the group for killing all git
child processes. I'll fix the code.

> >  test_expect_success 'explicit daemon start and stop' '
> >       test_when_finished "stop_daemon_delete_repo test_explicit" &&
> >
> > @@ -907,6 +929,23 @@ test_expect_success "submodule absorbgitdirs implicitly starts daemon" '
> >       test_subcommand git fsmonitor--daemon start <super-sub.trace
> >  '
> >
> > +test_expect_success "submodule implicitly starts daemon by pull" '
> > +     test_atexit "stop_git_and_watchdog" &&
>
> Hmph, this is _atexit and not _when_finished because...?

This is because README describes _atexit to run unconditionally to clean
up before the test script exits, e.g. to stop (kill) a daemon. More
appropriately, we should kill git before "rm -rf cloned super sub" in
_when_finished and kill watchdog in _atexit. I'll adjust the code.

> > +     test_when_finished "rm -rf cloned; \
> > +                         rm -rf super; \
> > +                         rm -rf sub" &&
>
> Makes me wonder why it is not written like so:
>
>         test_when_finished "rm -rf cloned super sub" &&
>
> which is short enough to still fit on a line.  Is there something I
> am missing that these directories must be removed separately and in
> this order?

There is no special reason, I simply followed the style used in
t7527-builtin-fsmonitor.sh. I'll fix this part.


Koji Nakamaru
diff mbox series

Patch

diff --git a/builtin/fsmonitor--daemon.c b/builtin/fsmonitor--daemon.c
index dce8a3b2482..e1e6b96d09e 100644
--- a/builtin/fsmonitor--daemon.c
+++ b/builtin/fsmonitor--daemon.c
@@ -1314,6 +1314,7 @@  static int fsmonitor_run_daemon(void)
 		strbuf_reset(&state.path_gitdir_watch);
 		strbuf_addstr(&state.path_gitdir_watch,
 			      absolute_path(repo_get_git_dir(the_repository)));
+		strbuf_strip_suffix(&state.path_gitdir_watch, "/.");
 		state.nr_paths_watching = 2;
 	}
 
diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
index 730f3c7f810..7acd074a97f 100755
--- a/t/t7527-builtin-fsmonitor.sh
+++ b/t/t7527-builtin-fsmonitor.sh
@@ -82,6 +82,28 @@  have_t2_data_event () {
 	grep -e '"event":"data".*"category":"'"$c"'".*"key":"'"$k"'"'
 }
 
+start_git_in_background () {
+	git "$@" &
+	git_pid=$!
+	nr_tries_left=10
+	while true
+	do
+		if test $nr_tries_left -eq 0
+		then
+			kill $git_pid
+			exit 1
+		fi
+		sleep 1
+		nr_tries_left=$(($nr_tries_left - 1))
+	done > /dev/null 2>&1 &
+	watchdog_pid=$!
+	wait $git_pid
+}
+
+stop_git_and_watchdog () {
+	kill $git_pid $watchdog_pid
+}
+
 test_expect_success 'explicit daemon start and stop' '
 	test_when_finished "stop_daemon_delete_repo test_explicit" &&
 
@@ -907,6 +929,23 @@  test_expect_success "submodule absorbgitdirs implicitly starts daemon" '
 	test_subcommand git fsmonitor--daemon start <super-sub.trace
 '
 
+test_expect_success "submodule implicitly starts daemon by pull" '
+	test_atexit "stop_git_and_watchdog" &&
+	test_when_finished "rm -rf cloned; \
+			    rm -rf super; \
+			    rm -rf sub" &&
+
+	create_super super &&
+	create_sub sub &&
+
+	git -C super submodule add ../sub ./dir_1/dir_2/sub &&
+	git -C super commit -m "add sub" &&
+	git clone --recurse-submodules super cloned &&
+
+	git -C cloned/dir_1/dir_2/sub config core.fsmonitor true &&
+	start_git_in_background -C cloned pull --recurse-submodules
+'
+
 # On a case-insensitive file system, confirm that the daemon
 # notices when the .git directory is moved/renamed/deleted
 # regardless of how it is spelled in the FS event.