diff mbox series

ll-merge: killing the external merge driver aborts the merge

Message ID xmqq4jmzc91e.fsf_-_@gitster.g (mailing list archive)
State Accepted
Commit 2b7b788fb31a74bcbff4e4c6efc6f3db6c3a49b7
Headers show
Series ll-merge: killing the external merge driver aborts the merge | expand

Commit Message

Junio C Hamano June 23, 2023, 12:33 a.m. UTC
When an external merge driver dies with a signal, we should not
expect that the result left on the filesystem is in any useful
state.  However, because the current code uses the return value from
run_command() and declares any positive value as a sign that the
driver successfully left conflicts in the result, and because the
return value from run_command() for a subprocess that died upon a
signal is positive, we end up treating whatever garbage left on the
filesystem as the result the merge driver wanted to leave us.

run_command() returns larger than 128 (WTERMSIG(status) + 128, to be
exact) when it notices that the subprocess died with a signal, so
detect such a case and return LL_MERGE_ERROR from ll_ext_merge().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * This time with an updated title, a minimal documentation, and an
   additional test.

 Documentation/gitattributes.txt |  5 ++++-
 ll-merge.c                      |  9 ++++++++-
 t/t6406-merge-attr.sh           | 23 +++++++++++++++++++++++
 3 files changed, 35 insertions(+), 2 deletions(-)

Comments

Elijah Newren June 23, 2023, 6:25 a.m. UTC | #1
On Thu, Jun 22, 2023 at 5:33 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> When an external merge driver dies with a signal, we should not
> expect that the result left on the filesystem is in any useful
> state.  However, because the current code uses the return value from
> run_command() and declares any positive value as a sign that the
> driver successfully left conflicts in the result, and because the
> return value from run_command() for a subprocess that died upon a
> signal is positive, we end up treating whatever garbage left on the
> filesystem as the result the merge driver wanted to leave us.

Yeah, I think the tradition was exit code == number of conflicts for
some merge processes.  Not particularly useful when the driver died
from some signal.

> run_command() returns larger than 128 (WTERMSIG(status) + 128, to be
> exact) when it notices that the subprocess died with a signal, so
> detect such a case and return LL_MERGE_ERROR from ll_ext_merge().

Makes sense.

>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>
>  * This time with an updated title, a minimal documentation, and an
>    additional test.
>
>  Documentation/gitattributes.txt |  5 ++++-
>  ll-merge.c                      |  9 ++++++++-
>  t/t6406-merge-attr.sh           | 23 +++++++++++++++++++++++
>  3 files changed, 35 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 02a3ec83e4..6deb89a296 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -1132,7 +1132,10 @@ size (see below).
>  The merge driver is expected to leave the result of the merge in
>  the file named with `%A` by overwriting it, and exit with zero
>  status if it managed to merge them cleanly, or non-zero if there
> -were conflicts.
> +were conflicts.  When the driver crashes (e.g. killed by SEGV),
> +it is expected to exit with non-zero status that are higher than
> +128, and in such a case, the merge results in a failure (which is
> +different from producing a conflict).

Looks good.

>  The `merge.*.recursive` variable specifies what other merge
>  driver to use when the merge driver is called for an internal
> diff --git a/ll-merge.c b/ll-merge.c
> index 07ec16e8e5..ba45aa2f79 100644
> --- a/ll-merge.c
> +++ b/ll-merge.c
> @@ -243,7 +243,14 @@ static enum ll_merge_result ll_ext_merge(const struct ll_merge_driver *fn,
>                 unlink_or_warn(temp[i]);
>         strbuf_release(&cmd);
>         strbuf_release(&path_sq);
> -       ret = (status > 0) ? LL_MERGE_CONFLICT : status;
> +
> +       if (!status)
> +               ret = LL_MERGE_OK;
> +       else if (status <= 128)
> +               ret = LL_MERGE_CONFLICT;
> +       else
> +               /* died due to a signal: WTERMSIG(status) + 128 */
> +               ret = LL_MERGE_ERROR;
>         return ret;
>  }

Likewise.

> diff --git a/t/t6406-merge-attr.sh b/t/t6406-merge-attr.sh
> index 5e4e4dd6d9..b50aedbc00 100755
> --- a/t/t6406-merge-attr.sh
> +++ b/t/t6406-merge-attr.sh
> @@ -56,6 +56,12 @@ test_expect_success setup '
>         ) >"$ours+"
>         cat "$ours+" >"$ours"
>         rm -f "$ours+"
> +
> +       if test -f ./please-abort
> +       then
> +               echo >>./please-abort killing myself
> +               kill -9 $$
> +       fi
>         exit "$exit"
>         EOF
>         chmod +x ./custom-merge
> @@ -162,6 +168,23 @@ test_expect_success 'custom merge backend' '
>         rm -f $o $a $b
>  '
>
> +test_expect_success 'custom merge driver that is killed with a signal' '
> +       test_when_finished "rm -f output please-abort" &&
> +
> +       git reset --hard anchor &&
> +       git config --replace-all \
> +       merge.custom.driver "./custom-merge %O %A %B 0 %P" &&
> +       git config --replace-all \
> +       merge.custom.name "custom merge driver for testing" &&
> +
> +       >./please-abort &&
> +       echo "* merge=custom" >.gitattributes &&
> +       test_must_fail git merge main &&
> +       git ls-files -u >output &&
> +       git diff --name-only HEAD >>output &&
> +       test_must_be_empty output
> +'
> +

I was about to comment that we needed to clean up the please-abort
file, then realized I just missed it in my first reading of the
test_when_finished line.  So, patch looks good.

Reviewed-by: Elijah Newren <newren@gmail.com>
Junio C Hamano June 23, 2023, 4:26 p.m. UTC | #2
Elijah Newren <newren@gmail.com> writes:

> Reviewed-by: Elijah Newren <newren@gmail.com>


Thanks for a quick review.
Junio C Hamano June 23, 2023, 11:31 p.m. UTC | #3
Junio C Hamano <gitster@pobox.com> writes:

> Elijah Newren <newren@gmail.com> writes:
>
>> Reviewed-by: Elijah Newren <newren@gmail.com>
>
>
> Thanks for a quick review.

Unfortunately Windows does not seem to correctly detect the aborting
merge driver.  Does run_command() there report process death due to
signals differently, I wonder?

https://github.com/git/git/actions/runs/5360400800/jobs/9725341775#step:6:285

shows that on Windows, aborted external merge driver is not noticed
and we happily take the auto-merged result, ouch.

I am tempted to protect this step of the test with a prerequisite to
skip it on Windows for now.  Anybody with better idea?

Thanks.
Joshua Hudson June 25, 2023, 1:40 p.m. UTC | #4
On 6/23/2023 4:31 PM, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> Elijah Newren <newren@gmail.com> writes:
>>
>>> Reviewed-by: Elijah Newren <newren@gmail.com>
>>
>> Thanks for a quick review.
> Unfortunately Windows does not seem to correctly detect the aborting
> merge driver.  Does run_command() there report process death due to
> signals differently, I wonder?
>
> https://github.com/git/git/actions/runs/5360400800/jobs/9725341775#step:6:285
>
> shows that on Windows, aborted external merge driver is not noticed
> and we happily take the auto-merged result, ouch.
>
> I am tempted to protect this step of the test with a prerequisite to
> skip it on Windows for now.  Anybody with better idea?
>
> Thanks.

I would suggest putting in the correct test harness on Windows. abort() 
doesn't work very well.

(Sample code only--read description below)

#ifdef WIN32
     TerminateProcess(GetCurrentProcess(), 131); /* something that looks 
like it passes the test */
     TerminateProcess(GetCurrentProcess(), 0x80070485); /* actual exit 
code for process that cannot start because its missing a shared library */
#else
     abort();
#endif

I only want to fight so hard with the Unix to Windows translation layer 
you use. Strictly speaking, the second TerminateProcess() line is what 
should be in the test harness, but if it doesn't work go with the first 
one. Then I at least have something to work with.

I'm not going to lie to you. We are doing development on Windows, and 
the merge driver is written using a different portability layer. I am 
prepared to build a native shim to make it work if I have to. I am not 
in a position where I can build git and test any development code; 
getting the fix to us will be a long journey.
Johannes Schindelin June 27, 2023, 12:02 p.m. UTC | #5
Hi,


On Fri, 23 Jun 2023, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
>
> > Elijah Newren <newren@gmail.com> writes:
> >
> >> Reviewed-by: Elijah Newren <newren@gmail.com>
> >
> >
> > Thanks for a quick review.
>
> Unfortunately Windows does not seem to correctly detect the aborting
> merge driver.  Does run_command() there report process death due to
> signals differently, I wonder?
>
> https://github.com/git/git/actions/runs/5360400800/jobs/9725341775#step:6:285
>
> shows that on Windows, aborted external merge driver is not noticed
> and we happily take the auto-merged result, ouch.

Hmm. I tried to verify this, but failed. With this patch:

```diff
diff --git a/git.c b/git.c
index 2f42da20f4e0..3c513e3f2cb1 100644
--- a/git.c
+++ b/git.c
@@ -330,6 +330,8 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
 			setenv(GIT_ATTR_SOURCE_ENVIRONMENT, cmd, 1);
 			if (envchanged)
 				*envchanged = 1;
+		} else if (!strcmp(cmd, "--abort")) {
+			abort();
 		} else {
 			fprintf(stderr, _("unknown option: %s\n"), cmd);
 			usage(git_usage_string);
```

I get this:


```console
$ ./git.exe --abort

$ echo $?
3
```

For that reason, I am somehow doubtful that the `abort()` is actually
called?!?

Ciao,
Johannes
Joshua Hudson June 27, 2023, 1:43 p.m. UTC | #6
On 6/27/2023 5:02 AM, Johannes Schindelin wrote:
> Hi,
>
>
> On Fri, 23 Jun 2023, Junio C Hamano wrote:
>
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>>> Elijah Newren <newren@gmail.com> writes:
>>>
>>>> Reviewed-by: Elijah Newren <newren@gmail.com>
>>>
>>> Thanks for a quick review.
>> Unfortunately Windows does not seem to correctly detect the aborting
>> merge driver.  Does run_command() there report process death due to
>> signals differently, I wonder?
>>
>> https://github.com/git/git/actions/runs/5360400800/jobs/9725341775#step:6:285
>>
>> shows that on Windows, aborted external merge driver is not noticed
>> and we happily take the auto-merged result, ouch.
> Hmm. I tried to verify this, but failed. With this patch:
>
> ```diff
> diff --git a/git.c b/git.c
> index 2f42da20f4e0..3c513e3f2cb1 100644
> --- a/git.c
> +++ b/git.c
> @@ -330,6 +330,8 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
>   			setenv(GIT_ATTR_SOURCE_ENVIRONMENT, cmd, 1);
>   			if (envchanged)
>   				*envchanged = 1;
> +		} else if (!strcmp(cmd, "--abort")) {
> +			abort();
>   		} else {
>   			fprintf(stderr, _("unknown option: %s\n"), cmd);
>   			usage(git_usage_string);
> ```
>
> I get this:
>
>
> ```console
> $ ./git.exe --abort
>
> $ echo $?
> 3
> ```
>
> For that reason, I am somehow doubtful that the `abort()` is actually
> called?!?
>
> Ciao,
> Johannes

abort(); does _exit(3); on Windows because there are no signals. This is 
easily changed by providing abort like so:

void abort() { _exit(131 /* or whatever else you think goes here */); }
Junio C Hamano June 27, 2023, 7:08 p.m. UTC | #7
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Fri, 23 Jun 2023, Junio C Hamano wrote:
>
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>> > Elijah Newren <newren@gmail.com> writes:
>> >
>> >> Reviewed-by: Elijah Newren <newren@gmail.com>
>> >
>> >
>> > Thanks for a quick review.
>>
>> Unfortunately Windows does not seem to correctly detect the aborting

Sorry, I did not mean "abort(3)" literally.  What I meant was that
an external merge driver that gets spawned via the run_command()
interface may not die by calling exit()---like "killed by signal"
(including "segfaulting").  The new test script piece added in the
patch did "kill -9 $$" to kill the external merge driver itself,
which gets reported as "killed by signal" from run_command() by
returning the signal number + 128, but that did not pass Windows CI.
Joshua Hudson June 27, 2023, 7:10 p.m. UTC | #8
On 6/27/2023 12:08 PM, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
>> On Fri, 23 Jun 2023, Junio C Hamano wrote:
>>
>>> Junio C Hamano <gitster@pobox.com> writes:
>>>
>>>> Elijah Newren <newren@gmail.com> writes:
>>>>
>>>>> Reviewed-by: Elijah Newren <newren@gmail.com>
>>>>
>>>> Thanks for a quick review.
>>> Unfortunately Windows does not seem to correctly detect the aborting
> Sorry, I did not mean "abort(3)" literally.  What I meant was that
> an external merge driver that gets spawned via the run_command()
> interface may not die by calling exit()---like "killed by signal"
> (including "segfaulting").  The new test script piece added in the
> patch did "kill -9 $$" to kill the external merge driver itself,
> which gets reported as "killed by signal" from run_command() by
> returning the signal number + 128, but that did not pass Windows CI.
>
Do you need me to provide a windows test harness?
Junio C Hamano June 27, 2023, 8:04 p.m. UTC | #9
Joshua Hudson <jhudson@cedaron.com> writes:

> On 6/27/2023 12:08 PM, Junio C Hamano wrote:
>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>>
>>> On Fri, 23 Jun 2023, Junio C Hamano wrote:
>>>
>>>> Junio C Hamano <gitster@pobox.com> writes:
>>>>
>>>>> Elijah Newren <newren@gmail.com> writes:
>>>>>
>>>>>> Reviewed-by: Elijah Newren <newren@gmail.com>
>>>>>
>>>>> Thanks for a quick review.
>>>> Unfortunately Windows does not seem to correctly detect the aborting
>> Sorry, I did not mean "abort(3)" literally.  What I meant was that
>> an external merge driver that gets spawned via the run_command()
>> interface may not die by calling exit()---like "killed by signal"
>> (including "segfaulting").  The new test script piece added in the
>> patch did "kill -9 $$" to kill the external merge driver itself,
>> which gets reported as "killed by signal" from run_command() by
>> returning the signal number + 128, but that did not pass Windows CI.
>>
> Do you need me to provide a windows test harness?

Sorry, I do not understand the question.

FWIW how "external merge driver that kills itself by sending a
signal to itself does not get noticed on Windows" appears in our
tests can be seen at

https://github.com/git/git/actions/runs/5360824580/jobs/9727137272

The job is "win test(0)", part of our standard Windows test harness
implemented as part of our GitHub Actions CI test.

Thanks.
Joshua Hudson June 27, 2023, 9:14 p.m. UTC | #10
On 6/27/2023 1:04 PM, Junio C Hamano wrote:
> Joshua Hudson <jhudson@cedaron.com> writes:
>
>> On 6/27/2023 12:08 PM, Junio C Hamano wrote:
>>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>>>
>>>> On Fri, 23 Jun 2023, Junio C Hamano wrote:
>>>>
>>>>> Junio C Hamano <gitster@pobox.com> writes:
>>>>>
>>>>>> Elijah Newren <newren@gmail.com> writes:
>>>>>>
>>>>>>> Reviewed-by: Elijah Newren <newren@gmail.com>
>>>>>> Thanks for a quick review.
>>>>> Unfortunately Windows does not seem to correctly detect the aborting
>>> Sorry, I did not mean "abort(3)" literally.  What I meant was that
>>> an external merge driver that gets spawned via the run_command()
>>> interface may not die by calling exit()---like "killed by signal"
>>> (including "segfaulting").  The new test script piece added in the
>>> patch did "kill -9 $$" to kill the external merge driver itself,
>>> which gets reported as "killed by signal" from run_command() by
>>> returning the signal number + 128, but that did not pass Windows CI.
>>>
>> Do you need me to provide a windows test harness?
> Sorry, I do not understand the question.
>
> FWIW how "external merge driver that kills itself by sending a
> signal to itself does not get noticed on Windows" appears in our
> tests can be seen at
>
> https://github.com/git/git/actions/runs/5360824580/jobs/9727137272
>
> The job is "win test(0)", part of our standard Windows test harness
> implemented as part of our GitHub Actions CI test.
>
> Thanks.
Try changing kill -9 $$ to exit 137 # 128 + 9
Junio C Hamano June 27, 2023, 9:26 p.m. UTC | #11
Joshua Hudson <jhudson@cedaron.com> writes:

> Try changing kill -9 $$ to exit 137 # 128 + 9

Yeah, but then (1) we are not simulating a case where the external
merge driver hits a segfault or receives a signal from outside and
dies involuntarily, and (2) we are codifying that even on Windows,
program that was killed by signal N must exit with 128 + N, and
these are the reasons why I did not go that route.

Stepping back a bit, how does one typically diagnose programatically
on Windows, after "spawning" a separate program, if the program died
involuntarily and/or got killed?  It does not have to be "exit with
128 + signal number"---as long as it can be done programatically and
reliably, we would be happy.  The code to diagnose how the spawned
program exited in run_command(), which is in finish_command() and
then in wait_or_whine(), may have to be updated with such a piece of
Windows specific knowledge.
Joshua Hudson June 27, 2023, 9:32 p.m. UTC | #12
On 6/27/2023 2:26 PM, Junio C Hamano wrote:
> Joshua Hudson <jhudson@cedaron.com> writes:
>
>> Try changing kill -9 $$ to exit 137 # 128 + 9
> Yeah, but then (1) we are not simulating a case where the external
> merge driver hits a segfault or receives a signal from outside and
> dies involuntarily, and (2) we are codifying that even on Windows,
> program that was killed by signal N must exit with 128 + N, and
> these are the reasons why I did not go that route.
>
> Stepping back a bit, how does one typically diagnose programatically
> on Windows, after "spawning" a separate program, if the program died
> involuntarily and/or got killed?  It does not have to be "exit with
> 128 + signal number"---as long as it can be done programatically and
> reliably, we would be happy.  The code to diagnose how the spawned
> program exited in run_command(), which is in finish_command() and
> then in wait_or_whine(), may have to be updated with such a piece of
> Windows specific knowledge.

abort() => 3

Killed => no you can't detect it

Faulted => exit code has the high bit set ( >= 0x8000000 )

My starting off with "the logical equivalent of calling abort()" has 
proven to be an unfortunate word choice. I need to harden up the exit 
pathway on my side _anyway_. An OOM at least does turn into Faulted.
diff mbox series

Patch

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 02a3ec83e4..6deb89a296 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -1132,7 +1132,10 @@  size (see below).
 The merge driver is expected to leave the result of the merge in
 the file named with `%A` by overwriting it, and exit with zero
 status if it managed to merge them cleanly, or non-zero if there
-were conflicts.
+were conflicts.  When the driver crashes (e.g. killed by SEGV),
+it is expected to exit with non-zero status that are higher than
+128, and in such a case, the merge results in a failure (which is
+different from producing a conflict).
 
 The `merge.*.recursive` variable specifies what other merge
 driver to use when the merge driver is called for an internal
diff --git a/ll-merge.c b/ll-merge.c
index 07ec16e8e5..ba45aa2f79 100644
--- a/ll-merge.c
+++ b/ll-merge.c
@@ -243,7 +243,14 @@  static enum ll_merge_result ll_ext_merge(const struct ll_merge_driver *fn,
 		unlink_or_warn(temp[i]);
 	strbuf_release(&cmd);
 	strbuf_release(&path_sq);
-	ret = (status > 0) ? LL_MERGE_CONFLICT : status;
+
+	if (!status)
+		ret = LL_MERGE_OK;
+	else if (status <= 128)
+		ret = LL_MERGE_CONFLICT;
+	else
+		/* died due to a signal: WTERMSIG(status) + 128 */
+		ret = LL_MERGE_ERROR;
 	return ret;
 }
 
diff --git a/t/t6406-merge-attr.sh b/t/t6406-merge-attr.sh
index 5e4e4dd6d9..b50aedbc00 100755
--- a/t/t6406-merge-attr.sh
+++ b/t/t6406-merge-attr.sh
@@ -56,6 +56,12 @@  test_expect_success setup '
 	) >"$ours+"
 	cat "$ours+" >"$ours"
 	rm -f "$ours+"
+
+	if test -f ./please-abort
+	then
+		echo >>./please-abort killing myself
+		kill -9 $$
+	fi
 	exit "$exit"
 	EOF
 	chmod +x ./custom-merge
@@ -162,6 +168,23 @@  test_expect_success 'custom merge backend' '
 	rm -f $o $a $b
 '
 
+test_expect_success 'custom merge driver that is killed with a signal' '
+	test_when_finished "rm -f output please-abort" &&
+
+	git reset --hard anchor &&
+	git config --replace-all \
+	merge.custom.driver "./custom-merge %O %A %B 0 %P" &&
+	git config --replace-all \
+	merge.custom.name "custom merge driver for testing" &&
+
+	>./please-abort &&
+	echo "* merge=custom" >.gitattributes &&
+	test_must_fail git merge main &&
+	git ls-files -u >output &&
+	git diff --name-only HEAD >>output &&
+	test_must_be_empty output
+'
+
 test_expect_success 'up-to-date merge without common ancestor' '
 	git init repo1 &&
 	git init repo2 &&