Message ID | xmqq4jmzc91e.fsf_-_@gitster.g (mailing list archive) |
---|---|
State | Accepted |
Commit | 2b7b788fb31a74bcbff4e4c6efc6f3db6c3a49b7 |
Series | ll-merge: killing the external merge driver aborts the merge |
On Thu, Jun 22, 2023 at 5:33 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> When an external merge driver dies with a signal, we should not
> expect that the result left on the filesystem is in any useful
> state. However, because the current code uses the return value from
> run_command() and declares any positive value as a sign that the
> driver successfully left conflicts in the result, and because the
> return value from run_command() for a subprocess that died upon a
> signal is positive, we end up treating whatever garbage left on the
> filesystem as the result the merge driver wanted to leave us.

Yeah, I think the tradition was exit code == number of conflicts for
some merge processes. Not particularly useful when the driver died
from some signal.

> run_command() returns larger than 128 (WTERMSIG(status) + 128, to be
> exact) when it notices that the subprocess died with a signal, so
> detect such a case and return LL_MERGE_ERROR from ll_ext_merge().

Makes sense.

>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>
>  * This time with an updated title, a minimal documentation, and an
>    additional test.
>
>  Documentation/gitattributes.txt |  5 ++++-
>  ll-merge.c                      |  9 ++++++++-
>  t/t6406-merge-attr.sh           | 23 +++++++++++++++++++++++
>  3 files changed, 35 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 02a3ec83e4..6deb89a296 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -1132,7 +1132,10 @@ size (see below).
>  The merge driver is expected to leave the result of the merge in
>  the file named with `%A` by overwriting it, and exit with zero
>  status if it managed to merge them cleanly, or non-zero if there
> -were conflicts.
> +were conflicts. When the driver crashes (e.g. killed by SEGV),
> +it is expected to exit with non-zero status that are higher than
> +128, and in such a case, the merge results in a failure (which is
> +different from producing a conflict).

Looks good.

>  The `merge.*.recursive` variable specifies what other merge
>  driver to use when the merge driver is called for an internal
> diff --git a/ll-merge.c b/ll-merge.c
> index 07ec16e8e5..ba45aa2f79 100644
> --- a/ll-merge.c
> +++ b/ll-merge.c
> @@ -243,7 +243,14 @@ static enum ll_merge_result ll_ext_merge(const struct ll_merge_driver *fn,
>  		unlink_or_warn(temp[i]);
>  	strbuf_release(&cmd);
>  	strbuf_release(&path_sq);
> -	ret = (status > 0) ? LL_MERGE_CONFLICT : status;
> +
> +	if (!status)
> +		ret = LL_MERGE_OK;
> +	else if (status <= 128)
> +		ret = LL_MERGE_CONFLICT;
> +	else
> +		/* died due to a signal: WTERMSIG(status) + 128 */
> +		ret = LL_MERGE_ERROR;
>  	return ret;
>  }

Likewise.

> diff --git a/t/t6406-merge-attr.sh b/t/t6406-merge-attr.sh
> index 5e4e4dd6d9..b50aedbc00 100755
> --- a/t/t6406-merge-attr.sh
> +++ b/t/t6406-merge-attr.sh
> @@ -56,6 +56,12 @@ test_expect_success setup '
>  	) >"$ours+"
>  	cat "$ours+" >"$ours"
>  	rm -f "$ours+"
> +
> +	if test -f ./please-abort
> +	then
> +		echo >>./please-abort killing myself
> +		kill -9 $$
> +	fi
>  	exit "$exit"
>  	EOF
>  	chmod +x ./custom-merge
> @@ -162,6 +168,23 @@ test_expect_success 'custom merge backend' '
>  	rm -f $o $a $b
>  '
>
> +test_expect_success 'custom merge driver that is killed with a signal' '
> +	test_when_finished "rm -f output please-abort" &&
> +
> +	git reset --hard anchor &&
> +	git config --replace-all \
> +	merge.custom.driver "./custom-merge %O %A %B 0 %P" &&
> +	git config --replace-all \
> +	merge.custom.name "custom merge driver for testing" &&
> +
> +	>./please-abort &&
> +	echo "* merge=custom" >.gitattributes &&
> +	test_must_fail git merge main &&
> +	git ls-files -u >output &&
> +	git diff --name-only HEAD >>output &&
> +	test_must_be_empty output
> +'
> +

I was about to comment that we needed to clean up the please-abort
file, then realized I just missed it in my first reading of the
test_when_finished line.

So, patch looks good.

Reviewed-by: Elijah Newren <newren@gmail.com>
Elijah Newren <newren@gmail.com> writes:
> Reviewed-by: Elijah Newren <newren@gmail.com>
Thanks for a quick review.
Junio C Hamano <gitster@pobox.com> writes:

> Elijah Newren <newren@gmail.com> writes:
>
>> Reviewed-by: Elijah Newren <newren@gmail.com>
>
> Thanks for a quick review.

Unfortunately Windows does not seem to correctly detect the aborting
merge driver. Does run_command() there report process death due to
signals differently, I wonder?

https://github.com/git/git/actions/runs/5360400800/jobs/9725341775#step:6:285

shows that on Windows, aborted external merge driver is not noticed
and we happily take the auto-merged result, ouch.

I am tempted to protect this step of the test with a prerequisite to
skip it on Windows for now. Anybody with better idea?

Thanks.
On 6/23/2023 4:31 PM, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> Elijah Newren <newren@gmail.com> writes:
>>
>>> Reviewed-by: Elijah Newren <newren@gmail.com>
>>
>> Thanks for a quick review.
> Unfortunately Windows does not seem to correctly detect the aborting
> merge driver. Does run_command() there report process death due to
> signals differently, I wonder?
>
> https://github.com/git/git/actions/runs/5360400800/jobs/9725341775#step:6:285
>
> shows that on Windows, aborted external merge driver is not noticed
> and we happily take the auto-merged result, ouch.
>
> I am tempted to protect this step of the test with a prerequisite to
> skip it on Windows for now. Anybody with better idea?
>
> Thanks.

I would suggest putting in the correct test harness on Windows.
abort() doesn't work very well.

(Sample code only--read description below)

#ifdef WIN32
    TerminateProcess(GetCurrentProcess(), 131); /* something that looks like it passes the test */
    TerminateProcess(GetCurrentProcess(), 0x80070485); /* actual exit code for process that cannot start because it is missing a shared library */
#else
    abort();
#endif

I only want to fight so hard with the Unix to Windows translation
layer you use. Strictly speaking, the second TerminateProcess() line
is what should be in the test harness, but if it doesn't work go with
the first one. Then I at least have something to work with.

I'm not going to lie to you. We are doing development on Windows, and
the merge driver is written using a different portability layer. I am
prepared to build a native shim to make it work if I have to.

I am not in a position where I can build git and test any development
code; getting the fix to us will be a long journey.
Hi,

On Fri, 23 Jun 2023, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
>
> > Elijah Newren <newren@gmail.com> writes:
> >
> >> Reviewed-by: Elijah Newren <newren@gmail.com>
> >
> > Thanks for a quick review.
>
> Unfortunately Windows does not seem to correctly detect the aborting
> merge driver. Does run_command() there report process death due to
> signals differently, I wonder?
>
> https://github.com/git/git/actions/runs/5360400800/jobs/9725341775#step:6:285
>
> shows that on Windows, aborted external merge driver is not noticed
> and we happily take the auto-merged result, ouch.

Hmm. I tried to verify this, but failed. With this patch:

```diff
diff --git a/git.c b/git.c
index 2f42da20f4e0..3c513e3f2cb1 100644
--- a/git.c
+++ b/git.c
@@ -330,6 +330,8 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
 			setenv(GIT_ATTR_SOURCE_ENVIRONMENT, cmd, 1);
 			if (envchanged)
 				*envchanged = 1;
+		} else if (!strcmp(cmd, "--abort")) {
+			abort();
 		} else {
 			fprintf(stderr, _("unknown option: %s\n"), cmd);
 			usage(git_usage_string);
```

I get this:

```console
$ ./git.exe --abort

$ echo $?
3
```

For that reason, I am somehow doubtful that the `abort()` is actually
called?!?

Ciao,
Johannes
On 6/27/2023 5:02 AM, Johannes Schindelin wrote:
> Hi,
>
> On Fri, 23 Jun 2023, Junio C Hamano wrote:
>
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>>> Elijah Newren <newren@gmail.com> writes:
>>>
>>>> Reviewed-by: Elijah Newren <newren@gmail.com>
>>>
>>> Thanks for a quick review.
>> Unfortunately Windows does not seem to correctly detect the aborting
>> merge driver. Does run_command() there report process death due to
>> signals differently, I wonder?
>>
>> https://github.com/git/git/actions/runs/5360400800/jobs/9725341775#step:6:285
>>
>> shows that on Windows, aborted external merge driver is not noticed
>> and we happily take the auto-merged result, ouch.
> Hmm. I tried to verify this, but failed. With this patch:
>
> ```diff
> diff --git a/git.c b/git.c
> index 2f42da20f4e0..3c513e3f2cb1 100644
> --- a/git.c
> +++ b/git.c
> @@ -330,6 +330,8 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
>  			setenv(GIT_ATTR_SOURCE_ENVIRONMENT, cmd, 1);
>  			if (envchanged)
>  				*envchanged = 1;
> +		} else if (!strcmp(cmd, "--abort")) {
> +			abort();
>  		} else {
>  			fprintf(stderr, _("unknown option: %s\n"), cmd);
>  			usage(git_usage_string);
> ```
>
> I get this:
>
> ```console
> $ ./git.exe --abort
>
> $ echo $?
> 3
> ```
>
> For that reason, I am somehow doubtful that the `abort()` is actually
> called?!?
>
> Ciao,
> Johannes

abort(); does _exit(3); on Windows because there are no signals.

This is easily changed by providing abort like so:

void abort()
{
    _exit(131 /* or whatever else you think goes here */);
}
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Fri, 23 Jun 2023, Junio C Hamano wrote:
>
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>> > Elijah Newren <newren@gmail.com> writes:
>> >
>> >> Reviewed-by: Elijah Newren <newren@gmail.com>
>> >
>> > Thanks for a quick review.
>>
>> Unfortunately Windows does not seem to correctly detect the aborting

Sorry, I did not mean "abort(3)" literally. What I meant was that
an external merge driver that gets spawned via the run_command()
interface may not die by calling exit()---like "killed by signal"
(including "segfaulting"). The new test script piece added in the
patch did "kill -9 $$" to kill the external merge driver itself,
which gets reported as "killed by signal" from run_command() by
returning the signal number + 128, but that did not pass Windows CI.
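To make the "signal number + 128" convention concrete, here is a minimal POSIX-only sketch. It is not git's run_command() code; `status_to_code()` and `run_child()` are hypothetical helpers invented for illustration. The child kills itself much like the "kill -9 $$" in the test, and the parent folds the waitpid() status into a single integer.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fold a waitpid() status into one integer using
 * the "exit code, or 128 + signal number" convention from the thread.
 */
int status_to_code(int wstatus)
{
	if (WIFEXITED(wstatus))
		return WEXITSTATUS(wstatus);
	if (WIFSIGNALED(wstatus))
		return 128 + WTERMSIG(wstatus);
	return -1; /* stopped, or some other unexpected state */
}

/*
 * Hypothetical helper: fork a child that kills itself with `sig` when
 * sig > 0, or otherwise exits normally with `exit_code`, then report
 * how the child ended.
 */
int run_child(int sig, int exit_code)
{
	int wstatus;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		if (sig > 0)
			kill(getpid(), sig); /* like "kill -9 $$" */
		_exit(exit_code);
	}
	if (waitpid(pid, &wstatus, 0) < 0)
		return -1;
	return status_to_code(wstatus);
}
```

With this sketch, `run_child(SIGKILL, 0)` yields `128 + SIGKILL` on a POSIX system, while `run_child(0, 1)` yields 1, which is the distinction the patch relies on. None of this says anything about how the Windows port reports a killed child.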
On 6/27/2023 12:08 PM, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
>> On Fri, 23 Jun 2023, Junio C Hamano wrote:
>>
>>> Junio C Hamano <gitster@pobox.com> writes:
>>>
>>>> Elijah Newren <newren@gmail.com> writes:
>>>>
>>>>> Reviewed-by: Elijah Newren <newren@gmail.com>
>>>>
>>>> Thanks for a quick review.
>>> Unfortunately Windows does not seem to correctly detect the aborting
> Sorry, I did not mean "abort(3)" literally. What I meant was that
> an external merge driver that gets spawned via the run_command()
> interface may not die by calling exit()---like "killed by signal"
> (including "segfaulting"). The new test script piece added in the
> patch did "kill -9 $$" to kill the external merge driver itself,
> which gets reported as "killed by signal" from run_command() by
> returning the signal number + 128, but that did not pass Windows CI.
>

Do you need me to provide a windows test harness?
Joshua Hudson <jhudson@cedaron.com> writes:

> On 6/27/2023 12:08 PM, Junio C Hamano wrote:
>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>>
>>> On Fri, 23 Jun 2023, Junio C Hamano wrote:
>>>
>>>> Junio C Hamano <gitster@pobox.com> writes:
>>>>
>>>>> Elijah Newren <newren@gmail.com> writes:
>>>>>
>>>>>> Reviewed-by: Elijah Newren <newren@gmail.com>
>>>>>
>>>>> Thanks for a quick review.
>>>> Unfortunately Windows does not seem to correctly detect the aborting
>> Sorry, I did not mean "abort(3)" literally. What I meant was that
>> an external merge driver that gets spawned via the run_command()
>> interface may not die by calling exit()---like "killed by signal"
>> (including "segfaulting"). The new test script piece added in the
>> patch did "kill -9 $$" to kill the external merge driver itself,
>> which gets reported as "killed by signal" from run_command() by
>> returning the signal number + 128, but that did not pass Windows CI.
>>
> Do you need me to provide a windows test harness?

Sorry, I do not understand the question.

FWIW how "external merge driver that kills itself by sending a
signal to itself does not get noticed on Windows" appears in our
tests can be seen at

https://github.com/git/git/actions/runs/5360824580/jobs/9727137272

The job is "win test(0)", part of our standard Windows test harness
implemented as part of our GitHub Actions CI test.

Thanks.
On 6/27/2023 1:04 PM, Junio C Hamano wrote:
> Joshua Hudson <jhudson@cedaron.com> writes:
>
>> On 6/27/2023 12:08 PM, Junio C Hamano wrote:
>>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>>>
>>>> On Fri, 23 Jun 2023, Junio C Hamano wrote:
>>>>
>>>>> Junio C Hamano <gitster@pobox.com> writes:
>>>>>
>>>>>> Elijah Newren <newren@gmail.com> writes:
>>>>>>
>>>>>>> Reviewed-by: Elijah Newren <newren@gmail.com>
>>>>>> Thanks for a quick review.
>>>>> Unfortunately Windows does not seem to correctly detect the aborting
>>> Sorry, I did not mean "abort(3)" literally. What I meant was that
>>> an external merge driver that gets spawned via the run_command()
>>> interface may not die by calling exit()---like "killed by signal"
>>> (including "segfaulting"). The new test script piece added in the
>>> patch did "kill -9 $$" to kill the external merge driver itself,
>>> which gets reported as "killed by signal" from run_command() by
>>> returning the signal number + 128, but that did not pass Windows CI.
>>>
>> Do you need me to provide a windows test harness?
> Sorry, I do not understand the question.
>
> FWIW how "external merge driver that kills itself by sending a
> signal to itself does not get noticed on Windows" appears in our
> tests can be seen at
>
> https://github.com/git/git/actions/runs/5360824580/jobs/9727137272
>
> The job is "win test(0)", part of our standard Windows test harness
> implemented as part of our GitHub Actions CI test.
>
> Thanks.

Try changing kill -9 $$ to exit 137 # 128 + 9
Joshua Hudson <jhudson@cedaron.com> writes:
> Try changing kill -9 $$ to exit 137 # 128 + 9
Yeah, but then (1) we are not simulating a case where the external
merge driver hits a segfault or receives a signal from outside and
dies involuntarily, and (2) we are codifying that even on Windows,
program that was killed by signal N must exit with 128 + N, and
these are the reasons why I did not go that route.
Stepping back a bit, how does one typically diagnose programmatically
on Windows, after "spawning" a separate program, if the program died
involuntarily and/or got killed? It does not have to be "exit with
128 + signal number"---as long as it can be done programmatically and
reliably, we would be happy. The code to diagnose how the spawned
program exited in run_command(), which is in finish_command() and
then in wait_or_whine(), may have to be updated with such a piece of
Windows specific knowledge.
On 6/27/2023 2:26 PM, Junio C Hamano wrote:
> Joshua Hudson <jhudson@cedaron.com> writes:
>
>> Try changing kill -9 $$ to exit 137 # 128 + 9
> Yeah, but then (1) we are not simulating a case where the external
> merge driver hits a segfault or receives a signal from outside and
> dies involuntarily, and (2) we are codifying that even on Windows,
> program that was killed by signal N must exit with 128 + N, and
> these are the reasons why I did not go that route.
>
> Stepping back a bit, how does one typically diagnose programmatically
> on Windows, after "spawning" a separate program, if the program died
> involuntarily and/or got killed? It does not have to be "exit with
> 128 + signal number"---as long as it can be done programmatically and
> reliably, we would be happy. The code to diagnose how the spawned
> program exited in run_command(), which is in finish_command() and
> then in wait_or_whine(), may have to be updated with such a piece of
> Windows specific knowledge.

abort() => 3
Killed => no you can't detect it
Faulted => exit code has the high bit set ( >= 0x80000000 )

My starting off with "the logical equivalent of calling abort()" has
proven to be an unfortunate word choice.

I need to harden up the exit pathway on my side _anyway_. An OOM at
least does turn into Faulted.
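The three cases listed above could be turned into a classifier along the following lines. This is only a sketch built on the description in this message, not code from git or from the Windows API: `enum exit_kind` and `classify_windows_exit()` are hypothetical names, and the exit code is assumed to be the 32-bit value a parent would read back after waiting for the child. A killed process gets no category of its own, since the message says it cannot be told apart.

```c
#include <stdint.h>

/* Hypothetical categories, following the three cases in the message. */
enum exit_kind {
	EXIT_KIND_CLEAN,   /* exit code 0 */
	EXIT_KIND_FAULTED, /* high bit of the 32-bit exit code is set */
	EXIT_KIND_OTHER    /* everything else, including abort() => 3 */
};

/*
 * Hypothetical classifier for a 32-bit Windows process exit code.
 * Assumption: a fault shows up with the top bit set, as described in
 * the thread; a killed process is not distinguishable from a normal
 * exit and therefore lands in CLEAN or OTHER.
 */
enum exit_kind classify_windows_exit(uint32_t code)
{
	if (code == 0)
		return EXIT_KIND_CLEAN;
	if (code & UINT32_C(0x80000000))
		return EXIT_KIND_FAULTED;
	return EXIT_KIND_OTHER;
}
```

Under these assumptions, the 0x80070485 value mentioned earlier in the thread would be classified as faulted, while 3 (from abort()) and 137 would both fall into the "other" bucket, which illustrates why "exit 137" cannot stand in for a real crash on Windows.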
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 02a3ec83e4..6deb89a296 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -1132,7 +1132,10 @@ size (see below).
 The merge driver is expected to leave the result of the merge in
 the file named with `%A` by overwriting it, and exit with zero
 status if it managed to merge them cleanly, or non-zero if there
-were conflicts.
+were conflicts. When the driver crashes (e.g. killed by SEGV),
+it is expected to exit with non-zero status that are higher than
+128, and in such a case, the merge results in a failure (which is
+different from producing a conflict).
 
 The `merge.*.recursive` variable specifies what other merge
 driver to use when the merge driver is called for an internal
diff --git a/ll-merge.c b/ll-merge.c
index 07ec16e8e5..ba45aa2f79 100644
--- a/ll-merge.c
+++ b/ll-merge.c
@@ -243,7 +243,14 @@ static enum ll_merge_result ll_ext_merge(const struct ll_merge_driver *fn,
 		unlink_or_warn(temp[i]);
 	strbuf_release(&cmd);
 	strbuf_release(&path_sq);
-	ret = (status > 0) ? LL_MERGE_CONFLICT : status;
+
+	if (!status)
+		ret = LL_MERGE_OK;
+	else if (status <= 128)
+		ret = LL_MERGE_CONFLICT;
+	else
+		/* died due to a signal: WTERMSIG(status) + 128 */
+		ret = LL_MERGE_ERROR;
 	return ret;
 }
diff --git a/t/t6406-merge-attr.sh b/t/t6406-merge-attr.sh
index 5e4e4dd6d9..b50aedbc00 100755
--- a/t/t6406-merge-attr.sh
+++ b/t/t6406-merge-attr.sh
@@ -56,6 +56,12 @@ test_expect_success setup '
 	) >"$ours+"
 	cat "$ours+" >"$ours"
 	rm -f "$ours+"
+
+	if test -f ./please-abort
+	then
+		echo >>./please-abort killing myself
+		kill -9 $$
+	fi
 	exit "$exit"
 	EOF
 	chmod +x ./custom-merge
@@ -162,6 +168,23 @@ test_expect_success 'custom merge backend' '
 	rm -f $o $a $b
 '
 
+test_expect_success 'custom merge driver that is killed with a signal' '
+	test_when_finished "rm -f output please-abort" &&
+
+	git reset --hard anchor &&
+	git config --replace-all \
+	merge.custom.driver "./custom-merge %O %A %B 0 %P" &&
+	git config --replace-all \
+	merge.custom.name "custom merge driver for testing" &&
+
+	>./please-abort &&
+	echo "* merge=custom" >.gitattributes &&
+	test_must_fail git merge main &&
+	git ls-files -u >output &&
+	git diff --name-only HEAD >>output &&
+	test_must_be_empty output
+'
+
 test_expect_success 'up-to-date merge without common ancestor' '
 	git init repo1 &&
 	git init repo2 &&
When an external merge driver dies with a signal, we should not
expect that the result left on the filesystem is in any useful
state. However, because the current code uses the return value from
run_command() and declares any positive value as a sign that the
driver successfully left conflicts in the result, and because the
return value from run_command() for a subprocess that died upon a
signal is positive, we end up treating whatever garbage left on the
filesystem as the result the merge driver wanted to leave us.

run_command() returns larger than 128 (WTERMSIG(status) + 128, to be
exact) when it notices that the subprocess died with a signal, so
detect such a case and return LL_MERGE_ERROR from ll_ext_merge().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * This time with an updated title, a minimal documentation, and an
   additional test.

 Documentation/gitattributes.txt |  5 ++++-
 ll-merge.c                      |  9 ++++++++-
 t/t6406-merge-attr.sh           | 23 +++++++++++++++++++++++
 3 files changed, 35 insertions(+), 2 deletions(-)
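The difference between the old and the new classification can be seen in isolation with a small standalone sketch. It does not include git's headers: `enum sketch_result` and both functions are hypothetical stand-ins, the enum values are an assumption chosen for illustration, and only non-negative statuses are considered.

```c
/* Hypothetical stand-in for the merge result codes (values assumed). */
enum sketch_result {
	SKETCH_ERROR = -1,
	SKETCH_OK = 0,
	SKETCH_CONFLICT = 1
};

/*
 * Old rule as described in the commit message: any positive status is
 * read as "the driver left conflicts". `status` is assumed to be >= 0.
 */
enum sketch_result classify_old(int status)
{
	return (status > 0) ? SKETCH_CONFLICT : (enum sketch_result)status;
}

/*
 * New rule, mirroring the branches in the patch: zero is a clean merge,
 * up to 128 is a conflict, above 128 means the driver died on a signal.
 * `status` is assumed to be >= 0.
 */
enum sketch_result classify_new(int status)
{
	if (!status)
		return SKETCH_OK;
	else if (status <= 128)
		return SKETCH_CONFLICT;
	else
		return SKETCH_ERROR; /* 128 + signal number */
}
```

For a driver killed by signal 9, the status is 137: `classify_old(137)` reports a conflict, so whatever was left on disk would be taken as the driver's result, whereas `classify_new(137)` reports an error. Both functions agree on 0 and on ordinary exit codes such as 1.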