Message ID | pull.1802.git.1727577690390.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | fsmonitor OSX: fix hangs for submodules | expand |
"Koji Nakamaru via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Koji Nakamaru <koji.nakamaru@gree.net> > > fsmonitor_classify_path_absolute() expects state->path_gitdir_watch.buf > has no trailing '/' or '.' For a submodule, fsmonitor_run_daemon() sets > the value with trailing "/." (as repo_get_git_dir(the_repository) on > Darwin returns ".") so that fsmonitor_classify_path_absolute() returns > IS_OUTSIDE_CONE. > > In this case, fsevent_callback() doesn't update cookie_list so that > fsmonitor_publish() does nothing and with_lock__mark_cookies_seen() is > not invoked. > > As with_lock__wait_for_cookie() infinitely waits for state->cookies_cond > that with_lock__mark_cookies_seen() should unlock, the whole daemon > hangs. The above very nicely describes the cause, the mechansim that leads to the end-user observable effect, and the (bad) effect the bug has. I wish everybody wrote their proposed commit messages like this ;-) > Remove trailing "/." from state->path_gitdir_watch.buf for submodules > and add a corresponding test in t7527-builtin-fsmonitor.sh. > > Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de> > Signed-off-by: Koji Nakamaru <koji.nakamaru@gree.net> > --- > fsmonitor/darwin: fix hangs for submodules > diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh > index 730f3c7f810..7acd074a97f 100755 > --- a/t/t7527-builtin-fsmonitor.sh > +++ b/t/t7527-builtin-fsmonitor.sh > @@ -82,6 +82,28 @@ have_t2_data_event () { > grep -e '"event":"data".*"category":"'"$c"'".*"key":"'"$k"'"' > } > > +start_git_in_background () { > + git "$@" & > + git_pid=$! > + nr_tries_left=10 > + while true > + do > + if test $nr_tries_left -eq 0 > + then > + kill $git_pid > + exit 1 > + fi > + sleep 1 > + nr_tries_left=$(($nr_tries_left - 1)) > + done > /dev/null 2>&1 & So, the command is allowed to run for 10 seconds and then a signal is sent to the process (by the way, we do not write the SP between ">" and "/dev/null"). > + watchdog_pid=$! > + wait $git_pid And the process to ensure the command gets killed in 10 seconds is called the "watchdog". We let the command run for completion (and we'd be happy if it did without watchdog needing to forcibly kill it). Which means that even after the test finishes normally (e.g., the command completes without getting killed by the watchdog, because it is on a fast box and finishes in 0.5 second), we have leftover watchdog process hanging around for 10 seconds, which might interfere with the removal of the $TRASH_DIRECTORY at the end of the test. There is a helper function to kill both (below), which probably is used to avoid it. Let's keep reading. > +} > + > +stop_git_and_watchdog () { > + kill $git_pid $watchdog_pid > +} This sends a signal and let the process die. Without waiting to make sure they indeed died, at which point we can safely remove the $TRASH_DIRECTORY on filesystems that refuse to remove a directory when a process still has it as its current working directory. Shouldn't it loop, like for pid in $git_pid $watchdog_pid do until kill -0 $pid do kill $pid done done or something? Or is there a mechanism already to ensure that we return after they get killed that I am failing to find? > test_expect_success 'explicit daemon start and stop' ' > test_when_finished "stop_daemon_delete_repo test_explicit" && > > @@ -907,6 +929,23 @@ test_expect_success "submodule absorbgitdirs implicitly starts daemon" ' > test_subcommand git fsmonitor--daemon start <super-sub.trace > ' > > +test_expect_success "submodule implicitly starts daemon by pull" ' > + test_atexit "stop_git_and_watchdog" && Hmph, this is _atexit and not _when_finished because...? > + test_when_finished "rm -rf cloned; \ > + rm -rf super; \ > + rm -rf sub" && Makes me wonder why it is not written like so: test_when_finished "rm -rf cloned super sub" && which is short enough to still fit on a line. Is there something I am missing that these directories must be removed separately and in this order? > + create_super super && > + create_sub sub && > + > + git -C super submodule add ../sub ./dir_1/dir_2/sub && > + git -C super commit -m "add sub" && > + git clone --recurse-submodules super cloned && > + > + git -C cloned/dir_1/dir_2/sub config core.fsmonitor true && > + start_git_in_background -C cloned pull --recurse-submodules > +' Other than that, very nicely done. Thanks. > # On a case-insensitive file system, confirm that the daemon > # notices when the .git directory is moved/renamed/deleted > # regardless of how it is spelled in the FS event. > > base-commit: 3857aae53f3633b7de63ad640737c657387ae0c6
Thank you very much for carefully checking the patch and suggesting better ways. I'll later revise it and submit a new one. > > +} > > + > > +stop_git_and_watchdog () { > > + kill $git_pid $watchdog_pid > > +} > > This sends a signal and let the process die. Without waiting to > make sure they indeed died, at which point we can safely remove the > $TRASH_DIRECTORY on filesystems that refuse to remove a directory > when a process still has it as its current working directory. > > Shouldn't it loop, like > > for pid in $git_pid $watchdog_pid > do > until kill -0 $pid > do > kill $pid > done > done > > or something? Or is there a mechanism already to ensure that we > return after they get killed that I am failing to find? I agree that we have to wait for pids. I also realized that we should run git in another process group and kill the group for killing all git child processes. I'll fix the code. > > test_expect_success 'explicit daemon start and stop' ' > > test_when_finished "stop_daemon_delete_repo test_explicit" && > > > > @@ -907,6 +929,23 @@ test_expect_success "submodule absorbgitdirs implicitly starts daemon" ' > > test_subcommand git fsmonitor--daemon start <super-sub.trace > > ' > > > > +test_expect_success "submodule implicitly starts daemon by pull" ' > > + test_atexit "stop_git_and_watchdog" && > > Hmph, this is _atexit and not _when_finished because...? This is because README describes _atexit to run unconditionally to clean up before the test script exits, e.g. to stop (kill) a daemon. More appropriately, we should kill git before "rm -rf cloned super sub" in _when_finished and kill watchdog in _atexit. I'll adjust the code. > > + test_when_finished "rm -rf cloned; \ > > + rm -rf super; \ > > + rm -rf sub" && > > Makes me wonder why it is not written like so: > > test_when_finished "rm -rf cloned super sub" && > > which is short enough to still fit on a line. Is there something I > am missing that these directories must be removed separately and in > this order? There is no special reason, I simply followed the style used in t7527-builtin-fsmonitor.sh. I'll fix this part. Koji Nakamaru
diff --git a/builtin/fsmonitor--daemon.c b/builtin/fsmonitor--daemon.c index dce8a3b2482..e1e6b96d09e 100644 --- a/builtin/fsmonitor--daemon.c +++ b/builtin/fsmonitor--daemon.c @@ -1314,6 +1314,7 @@ static int fsmonitor_run_daemon(void) strbuf_reset(&state.path_gitdir_watch); strbuf_addstr(&state.path_gitdir_watch, absolute_path(repo_get_git_dir(the_repository))); + strbuf_strip_suffix(&state.path_gitdir_watch, "/."); state.nr_paths_watching = 2; } diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh index 730f3c7f810..7acd074a97f 100755 --- a/t/t7527-builtin-fsmonitor.sh +++ b/t/t7527-builtin-fsmonitor.sh @@ -82,6 +82,28 @@ have_t2_data_event () { grep -e '"event":"data".*"category":"'"$c"'".*"key":"'"$k"'"' } +start_git_in_background () { + git "$@" & + git_pid=$! + nr_tries_left=10 + while true + do + if test $nr_tries_left -eq 0 + then + kill $git_pid + exit 1 + fi + sleep 1 + nr_tries_left=$(($nr_tries_left - 1)) + done > /dev/null 2>&1 & + watchdog_pid=$! + wait $git_pid +} + +stop_git_and_watchdog () { + kill $git_pid $watchdog_pid +} + test_expect_success 'explicit daemon start and stop' ' test_when_finished "stop_daemon_delete_repo test_explicit" && @@ -907,6 +929,23 @@ test_expect_success "submodule absorbgitdirs implicitly starts daemon" ' test_subcommand git fsmonitor--daemon start <super-sub.trace ' +test_expect_success "submodule implicitly starts daemon by pull" ' + test_atexit "stop_git_and_watchdog" && + test_when_finished "rm -rf cloned; \ + rm -rf super; \ + rm -rf sub" && + + create_super super && + create_sub sub && + + git -C super submodule add ../sub ./dir_1/dir_2/sub && + git -C super commit -m "add sub" && + git clone --recurse-submodules super cloned && + + git -C cloned/dir_1/dir_2/sub config core.fsmonitor true && + start_git_in_background -C cloned pull --recurse-submodules +' + # On a case-insensitive file system, confirm that the daemon # notices when the .git directory is moved/renamed/deleted # regardless of how it is spelled in the FS event.