diff mbox series

t7006: clean up SIGPIPE handling in trace2 tests

Message ID YZrOLy03s5ZWMQ+t@coredump.intra.peff.net (mailing list archive)
State Accepted
Commit f7991f01f2d1bc800a47adcf66a1b29a1f7cf697
Headers show
Series t7006: clean up SIGPIPE handling in trace2 tests | expand

Commit Message

Jeff King Nov. 21, 2021, 10:54 p.m. UTC
On Sun, Nov 21, 2021 at 05:05:12PM -0500, Jeff King wrote:

> On Sun, Nov 21, 2021 at 01:40:02PM -0500, Jeff King wrote:
> 
> > It seems like this thread stalled. Ævar, were you planning to fix these
> > tests?
> > 
> > It's not too hard to swap out a "yes" as I showed above, but it further
> > confuses the trace2 output, because now we have the child yes (or its
> > shell equivalent) exiting as well.
> > [...]
> 
> I think it also probably doesn't test what the patch adding these tests
> was interested in. Specifically, it's trying to check the case of
> reporting the pager exit code while we're in a signal handler. So we
> have to get the SIGPIPE in the main git process to trigger that. Which
> means it has to be a builtin.
> 
> So you really do need a builtin that produces infinite output. Or at the
> very least, enough that we expect it to fill any possible OS pipe
> buffer. I guess the latter is actually not that hard. A 1MB
> commit-message would be plenty (or if you prefer, a big file with "log
> -p").

That's actually pretty easy to do. So I came up with the patch below, in
the interests of wrapping up this thread.

I'm not 100% sure this fixes any possible races, as the race Junio
initially reported seemed to be in the "propagated signals from pager"
test, which I don't think has these flaky-SIGPIPE problems. But I think
it's at least correcting some of the confusion. And we can see if it
happens again (I haven't been able to trigger any failures with --stress
myself).

-- >8 --
Subject: [PATCH] t7006: clean up SIGPIPE handling in trace2 tests

Comit c24b7f6736 (pager: test for exit code with and without SIGPIPE,
2021-02-02) introduced some tests that don't reliably generate SIGPIPE
where we expect it (i.e., when our pager doesn't read all of the output
from git-log).

There are two problems that somewhat cancel each other out.

First is that the output of git-log isn't very large (only around 800
bytes). So even if the pager doesn't read all of our output, it's racy
whether or not we'll actually get a SIGPIPE (we won't if we write all of
the output into the pipe buffer before the pager exits).

But we wrap git-log with test_terminal, which is supposed to propagate
the exit status of git-log. However, it doesn't always do so;
test_terminal will copy to stdout any lines that it got from our fake
pager, and it pipes to an empty command. So most of the time we are
seeing a SIGPIPE from test_terminal itself (though this is likewise
racy).

Let's try to make this more robust in two ways:

  1. We'll put a commit with a huge message at the tip of history. Since
     this is over a megabyte, it should fill the OS pipe buffer
     completely, causing git-log to keep trying to write even after the
     pager has exited.

  2. We'll redirect the output of test_terminal to /dev/null. That means
     it can never get SIGPIPE itself, and will always be giving us the
     exit code from git-log.

These two changes reveal that one of the tests was looking for the wrong
behavior. If we try to start a pager that does not exist (according to
execve()), then the error propagates from start_command() back to the
pager code as an error, and we avoid redirecting git-log's stdout to the
broken pager entirely.  Instead, it goes straight to the original stdout
(test_terminal's pty in this case), and we do not see a SIGPIPE at all.

So the test "git attempts to page to nonexisting pager command, gets
SIGPIPE" is checking the wrong outcome; it should be looking for a
successful exit (and was only confused by test_terminal's SIGPIPE).

There's a related test, "git discards nonexisting pager without
SIGPIPE", which sets the pager to a shell command which will read all
input and _then_ run a non-existing command. But that doesn't trigger
the same execve() behavior. We really do run the shell there and
redirect git-log's stdout to it. And the fact that the shell then exits
127 is not interesting. It is not different at that point than the
earlier test to check for "exit 1". So we can drop that test entirely.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t7006-pager.sh | 42 ++++++++++++++----------------------------
 1 file changed, 14 insertions(+), 28 deletions(-)

Comments

Jeff King Nov. 21, 2021, 11:10 p.m. UTC | #1
On Sun, Nov 21, 2021 at 05:54:39PM -0500, Jeff King wrote:

> That's actually pretty easy to do. So I came up with the patch below, in
> the interests of wrapping up this thread.
> 
> I'm not 100% sure this fixes any possible races, as the race Junio
> initially reported seemed to be in the "propagated signals from pager"
> test, which I don't think has these flaky-SIGPIPE problems. But I think
> it's at least correcting some of the confusion. And we can see if it
> happens again (I haven't been able to trigger any failures with --stress
> myself).
> 
> -- >8 --
> Subject: [PATCH] t7006: clean up SIGPIPE handling in trace2 tests

This one is optional on top. It's just cleanup, but IMHO makes it easier
to understand what's going on.

-- >8 --
Subject: [PATCH] t7006: simplify exit-code checks for sigpipe tests

Some tests in t7006 check for a SIGPIPE result by recording $? and
comparing it with test_match_signal. Before the previous commit, the
command was on the left-hand side of a pipe, and so we had to do some
subshell trickery to extract it.

But now that this is no longer the case, we can do things much more
simply: just run the command directly, using braces to avoid wrecking
the &&-chain, and then record $?. We could almost use test_expect_code
here, but it doesn't know about test_match_signal.

Likewise, for tests which expect success (i.e., not SIGPIPE), we can
just put them in the &&-chain as usual. That even lets us get rid of the
!MINGW check, since the expectation is the same on both sides.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t7006-pager.sh | 22 +++++-----------------
 1 file changed, 5 insertions(+), 17 deletions(-)

diff --git a/t/t7006-pager.sh b/t/t7006-pager.sh
index 577e51b47a..851961c798 100755
--- a/t/t7006-pager.sh
+++ b/t/t7006-pager.sh
@@ -677,7 +677,7 @@ test_expect_success TTY 'git returns SIGPIPE on early pager exit' '
 
 	if test_have_prereq !MINGW
 	then
-		OUT=$( ((test_terminal git log; echo $? 1>&3) >/dev/null) 3>&1 ) &&
+		{ test_terminal git log >/dev/null; OUT=$?; } &&
 		test_match_signal 13 "$OUT"
 	else
 		test_terminal git log
@@ -698,7 +698,7 @@ test_expect_success TTY 'git returns SIGPIPE on early pager non-zero exit' '
 
 	if test_have_prereq !MINGW
 	then
-		OUT=$( ((test_terminal git log; echo $? 1>&3) >/dev/null) 3>&1 ) &&
+		{ test_terminal git log >/dev/null; OUT=$?; } &&
 		test_match_signal 13 "$OUT"
 	else
 		test_terminal git log
@@ -717,13 +717,7 @@ test_expect_success TTY 'git discards pager non-zero exit without SIGPIPE' '
 	export GIT_TRACE2 &&
 	test_when_finished "unset GIT_TRACE2" &&
 
-	if test_have_prereq !MINGW
-	then
-		OUT=$( ((test_terminal git log; echo $? 1>&3) >/dev/null) 3>&1 ) &&
-		test "$OUT" -eq 0
-	else
-		test_terminal git log
-	fi &&
+	test_terminal git log &&
 
 	grep child_exit trace.normal >child-exits &&
 	test_line_count = 1 child-exits &&
@@ -738,13 +732,7 @@ test_expect_success TTY 'git skips paging nonexisting command' '
 	export GIT_TRACE2 &&
 	test_when_finished "unset GIT_TRACE2" &&
 
-	if test_have_prereq !MINGW
-	then
-		OUT=$( ((test_terminal git log; echo $? 1>&3) >/dev/null) 3>&1 ) &&
-		test "$OUT" -eq 0
-	else
-		test_terminal git log
-	fi &&
+	test_terminal git log &&
 
 	grep child_exit trace.normal >child-exits &&
 	test_line_count = 1 child-exits &&
@@ -760,7 +748,7 @@ test_expect_success TTY 'git returns SIGPIPE on propagated signals from pager' '
 
 	if test_have_prereq !MINGW
 	then
-		OUT=$( ((test_terminal git log; echo $? 1>&3) >/dev/null) 3>&1 ) &&
+		{ test_terminal git log >/dev/null; OUT=$?; } &&
 		test_match_signal 13 "$OUT"
 	else
 		test_terminal git log
Junio C Hamano Nov. 22, 2021, 2:17 a.m. UTC | #2
Jeff King <peff@peff.net> writes:

> I'm not 100% sure this fixes any possible races, as the race Junio
> initially reported seemed to be in the "propagated signals from pager"
> test, which I don't think has these flaky-SIGPIPE problems. But I think
> it's at least correcting some of the confusion. And we can see if it
> happens again (I haven't been able to trigger any failures with --stress
> myself).

Applying this (or this and the follow-up) seems to make t7006, which
used to be flaky, to consistently fail at test "git returns SIGPIPE
on propagated signals from pager" for me ;-)
Jeff King Nov. 22, 2021, 4:51 a.m. UTC | #3
On Sun, Nov 21, 2021 at 06:17:04PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > I'm not 100% sure this fixes any possible races, as the race Junio
> > initially reported seemed to be in the "propagated signals from pager"
> > test, which I don't think has these flaky-SIGPIPE problems. But I think
> > it's at least correcting some of the confusion. And we can see if it
> > happens again (I haven't been able to trigger any failures with --stress
> > myself).
> 
> Applying this (or this and the follow-up) seems to make t7006, which
> used to be flaky, to consistently fail at test "git returns SIGPIPE
> on propagated signals from pager" for me ;-)

Well, I guess it's good that we made things more consistent. :) It is
curious that you get failures and I don't, though. I wonder what the
difference is.

One curiosity is that the test does this:

	test_config core.pager ">pager-used; test-tool sigchain"

While "test-tool sigchain" will die with SIGTERM, it's the shell itself
which will waitpid() on it. And so in the end, what Git will generally
see is the same as if the shell had done "exit 143".

I wonder if the difference is between our shells. I know from previous
experience that bash will sometimes directly exec the final command in a
"-c" command, as an optimization. I don't get any difference running the
test with dash or bash, but that makes sense; the pager command is run
internally by Git via "sh -c".

Aha, that's it. If I recompile with SHELL_PATH=/bin/bash, then I see a
failure. Likewise, if I change the test like this:

diff --git a/t/t7006-pager.sh b/t/t7006-pager.sh
index 851961c798..a87ef37803 100755
--- a/t/t7006-pager.sh
+++ b/t/t7006-pager.sh
@@ -741,7 +741,7 @@ test_expect_success TTY 'git skips paging nonexisting command' '
 
 test_expect_success TTY 'git returns SIGPIPE on propagated signals from pager' '
 	test_when_finished "rm pager-used trace.normal" &&
-	test_config core.pager ">pager-used; test-tool sigchain" &&
+	test_config core.pager ">pager-used; exec test-tool sigchain" &&
 	GIT_TRACE2="$(pwd)/trace.normal" &&
 	export GIT_TRACE2 &&
 	test_when_finished "unset GIT_TRACE2" &&

then it fails even with dash. And that is, I think, closer to what the
test was actually trying to cover (since checking a shell's "exit 143"
is really no different than "exit 1", and we checked that earlier).

So why is it failing? It looks like trace2 reports this as code "-1"
rather than 143. I think that is because the fix from be8fc53e36 (pager:
properly log pager exit code when signalled, 2021-02-02) is incomplete.
It sets WEXITSTATUS() if the pager exited, but it doesn't handle signal
death at all. I think it needs:

diff --git a/run-command.c b/run-command.c
index f40df01c77..ef9d1d4236 100644
--- a/run-command.c
+++ b/run-command.c
@@ -555,6 +555,8 @@ static int wait_or_whine(pid_t pid, const char *argv0, int in_signal)
 	if (in_signal) {
 		if (WIFEXITED(status))
 			code = WEXITSTATUS(status);
+		else if (WIFSIGNALED(status))
+			code = 128 + WTERMSIG(status); /* see comment below */
 		return code;
 	}
 

-Peff
Jeff King Nov. 22, 2021, 4:54 a.m. UTC | #4
On Sun, Nov 21, 2021 at 11:51:54PM -0500, Jeff King wrote:

> So why is it failing? It looks like trace2 reports this as code "-1"
> rather than 143. I think that is because the fix from be8fc53e36 (pager:
> properly log pager exit code when signalled, 2021-02-02) is incomplete.
> It sets WEXITSTATUS() if the pager exited, but it doesn't handle signal
> death at all. I think it needs:
> 
> diff --git a/run-command.c b/run-command.c
> index f40df01c77..ef9d1d4236 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -555,6 +555,8 @@ static int wait_or_whine(pid_t pid, const char *argv0, int in_signal)
>  	if (in_signal) {
>  		if (WIFEXITED(status))
>  			code = WEXITSTATUS(status);
> +		else if (WIFSIGNALED(status))
> +			code = 128 + WTERMSIG(status); /* see comment below */
>  		return code;
>  	}

I'm tempted to say that this would be clearer if the in_signal code path
just followed the main logic, like this:

diff --git a/run-command.c b/run-command.c
index f40df01c77..1f58c17b6c 100644
--- a/run-command.c
+++ b/run-command.c
@@ -552,20 +552,17 @@ static int wait_or_whine(pid_t pid, const char *argv0, int in_signal)
 
 	while ((waiting = waitpid(pid, &status, 0)) < 0 && errno == EINTR)
 		;	/* nothing */
-	if (in_signal) {
-		if (WIFEXITED(status))
-			code = WEXITSTATUS(status);
-		return code;
-	}
 
 	if (waiting < 0) {
 		failed_errno = errno;
-		error_errno("waitpid for %s failed", argv0);
+		if (!in_signal)
+			error_errno("waitpid for %s failed", argv0);
 	} else if (waiting != pid) {
-		error("waitpid is confused (%s)", argv0);
+		if (!in_signal)
+			error("waitpid is confused (%s)", argv0);
 	} else if (WIFSIGNALED(status)) {
 		code = WTERMSIG(status);
-		if (code != SIGINT && code != SIGQUIT && code != SIGPIPE)
+		if (!in_signal && code != SIGINT && code != SIGQUIT && code != SIGPIPE)
 			error("%s died of signal %d", argv0, code);
 		/*
 		 * This return value is chosen so that code & 0xff
@@ -576,10 +573,12 @@ static int wait_or_whine(pid_t pid, const char *argv0, int in_signal)
 	} else if (WIFEXITED(status)) {
 		code = WEXITSTATUS(status);
 	} else {
-		error("waitpid is confused (%s)", argv0);
+		if (!in_signal)
+			error("waitpid is confused (%s)", argv0);
 	}
 
-	clear_child_for_cleanup(pid);
+	if (!in_signal)
+		clear_child_for_cleanup(pid);
 
 	errno = failed_errno;
 	return code;

That's a lot more tedious "if (!in_signal)" checks, but:

  - we don't have to duplicate any of the actual application logic

  - we'd now cover the extra cases for waitpid failing or returning the
    wrong pid (previously if waitpid() failed we'd still look at status,
    which could contain complete garbage!)

-Peff
Junio C Hamano Nov. 22, 2021, 5:49 a.m. UTC | #5
Jeff King <peff@peff.net> writes:

> I'm tempted to say that this would be clearer if the in_signal code path
> just followed the main logic, like this:
>
> diff --git a/run-command.c b/run-command.c
> index f40df01c77..1f58c17b6c 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -552,20 +552,17 @@ static int wait_or_whine(pid_t pid, const char *argv0, int in_signal)
>  
>  	while ((waiting = waitpid(pid, &status, 0)) < 0 && errno == EINTR)
>  		;	/* nothing */
> -	if (in_signal) {
> -		if (WIFEXITED(status))
> -			code = WEXITSTATUS(status);
> -		return code;
> -	}
>  
>  	if (waiting < 0) {
>  		failed_errno = errno;
> -		error_errno("waitpid for %s failed", argv0);
> +		if (!in_signal)
> +			error_errno("waitpid for %s failed", argv0);
>  	} else if (waiting != pid) {
> -		error("waitpid is confused (%s)", argv0);
> +		if (!in_signal)
> +			error("waitpid is confused (%s)", argv0);
>  	} else if (WIFSIGNALED(status)) {
>  		code = WTERMSIG(status);
> -		if (code != SIGINT && code != SIGQUIT && code != SIGPIPE)
> +		if (!in_signal && code != SIGINT && code != SIGQUIT && code != SIGPIPE)
>  			error("%s died of signal %d", argv0, code);
>  		/*
>  		 * This return value is chosen so that code & 0xff
> @@ -576,10 +573,12 @@ static int wait_or_whine(pid_t pid, const char *argv0, int in_signal)
>  	} else if (WIFEXITED(status)) {
>  		code = WEXITSTATUS(status);
>  	} else {
> -		error("waitpid is confused (%s)", argv0);
> +		if (!in_signal)
> +			error("waitpid is confused (%s)", argv0);
>  	}
>  
> -	clear_child_for_cleanup(pid);
> +	if (!in_signal)
> +		clear_child_for_cleanup(pid);
>  
>  	errno = failed_errno;
>  	return code;
>
> That's a lot more tedious "if (!in_signal)" checks, but:
>
>   - we don't have to duplicate any of the actual application logic
>
>   - we'd now cover the extra cases for waitpid failing or returning the
>     wrong pid (previously if waitpid() failed we'd still look at status,
>     which could contain complete garbage!)

Yeah, the repeated "if (!in_signal)" look a bit ugly, but fixing
that "we only deal with ifexited in in_signal case" to do the right
thing would make the code even more annoying and harder to maintain.
Junio C Hamano Nov. 22, 2021, 6:05 a.m. UTC | #6
Junio C Hamano <gitster@pobox.com> writes:

> Jeff King <peff@peff.net> writes:
> ...
>> That's a lot more tedious "if (!in_signal)" checks, but:
>>
>>   - we don't have to duplicate any of the actual application logic
>>
>>   - we'd now cover the extra cases for waitpid failing or returning the
>>     wrong pid (previously if waitpid() failed we'd still look at status,
>>     which could contain complete garbage!)
> ...
> Yeah, the repeated "if (!in_signal)" look a bit ugly, but fixing
> that "we only deal with ifexited in in_signal case" to do the right
> thing would make the code even more annoying and harder to maintain.

Eh, what I meant was that it would be annoying and harder to
maintain, if done inside the single "if (in_signal) { ... }" near
the beginning. (IOW, I am completely in agreement with your
reasoning above).
Jeff King Nov. 22, 2021, 7:11 p.m. UTC | #7
On Sun, Nov 21, 2021 at 09:49:32PM -0800, Junio C Hamano wrote:

> > That's a lot more tedious "if (!in_signal)" checks, but:
> >
> >   - we don't have to duplicate any of the actual application logic
> >
> >   - we'd now cover the extra cases for waitpid failing or returning the
> >     wrong pid (previously if waitpid() failed we'd still look at status,
> >     which could contain complete garbage!)
> 
> Yeah, the repeated "if (!in_signal)" look a bit ugly, but fixing
> that "we only deal with ifexited in in_signal case" to do the right
> thing would make the code even more annoying and harder to maintain.

OK. Let me see if I can clean this up into a full series that actually
fixes the race you saw, and breaks down the other fixes a bit more.

-Peff
diff mbox series

Patch

diff --git a/t/t7006-pager.sh b/t/t7006-pager.sh
index 0e7cf75435..577e51b47a 100755
--- a/t/t7006-pager.sh
+++ b/t/t7006-pager.sh
@@ -661,6 +661,13 @@  test_expect_success 'setup trace2' '
 	export GIT_TRACE2_BRIEF
 '
 
+test_expect_success 'setup large log output' '
+	perl -e "
+		print \"this is a long commit message\" x 50000
+	" >commit-msg &&
+	git commit --allow-empty -F commit-msg
+'
+
 test_expect_success TTY 'git returns SIGPIPE on early pager exit' '
 	test_when_finished "rm pager-used trace.normal" &&
 	test_config core.pager ">pager-used; head -n 1; exit 0" &&
@@ -670,7 +677,7 @@  test_expect_success TTY 'git returns SIGPIPE on early pager exit' '
 
 	if test_have_prereq !MINGW
 	then
-		OUT=$( ((test_terminal git log; echo $? 1>&3) | :) 3>&1 ) &&
+		OUT=$( ((test_terminal git log; echo $? 1>&3) >/dev/null) 3>&1 ) &&
 		test_match_signal 13 "$OUT"
 	else
 		test_terminal git log
@@ -691,7 +698,7 @@  test_expect_success TTY 'git returns SIGPIPE on early pager non-zero exit' '
 
 	if test_have_prereq !MINGW
 	then
-		OUT=$( ((test_terminal git log; echo $? 1>&3) | :) 3>&1 ) &&
+		OUT=$( ((test_terminal git log; echo $? 1>&3) >/dev/null) 3>&1 ) &&
 		test_match_signal 13 "$OUT"
 	else
 		test_terminal git log
@@ -712,7 +719,7 @@  test_expect_success TTY 'git discards pager non-zero exit without SIGPIPE' '
 
 	if test_have_prereq !MINGW
 	then
-		OUT=$( ((test_terminal git log; echo $? 1>&3) | :) 3>&1 ) &&
+		OUT=$( ((test_terminal git log; echo $? 1>&3) >/dev/null) 3>&1 ) &&
 		test "$OUT" -eq 0
 	else
 		test_terminal git log
@@ -724,28 +731,7 @@  test_expect_success TTY 'git discards pager non-zero exit without SIGPIPE' '
 	test_path_is_file pager-used
 '
 
-test_expect_success TTY 'git discards nonexisting pager without SIGPIPE' '
-	test_when_finished "rm pager-used trace.normal" &&
-	test_config core.pager "wc >pager-used; does-not-exist" &&
-	GIT_TRACE2="$(pwd)/trace.normal" &&
-	export GIT_TRACE2 &&
-	test_when_finished "unset GIT_TRACE2" &&
-
-	if test_have_prereq !MINGW
-	then
-		OUT=$( ((test_terminal git log; echo $? 1>&3) | :) 3>&1 ) &&
-		test "$OUT" -eq 0
-	else
-		test_terminal git log
-	fi &&
-
-	grep child_exit trace.normal >child-exits &&
-	test_line_count = 1 child-exits &&
-	grep " code:127 " child-exits &&
-	test_path_is_file pager-used
-'
-
-test_expect_success TTY 'git attempts to page to nonexisting pager command, gets SIGPIPE' '
+test_expect_success TTY 'git skips paging nonexisting command' '
 	test_when_finished "rm trace.normal" &&
 	test_config core.pager "does-not-exist" &&
 	GIT_TRACE2="$(pwd)/trace.normal" &&
@@ -754,8 +740,8 @@  test_expect_success TTY 'git attempts to page to nonexisting pager command, gets
 
 	if test_have_prereq !MINGW
 	then
-		OUT=$( ((test_terminal git log; echo $? 1>&3) | :) 3>&1 ) &&
-		test_match_signal 13 "$OUT"
+		OUT=$( ((test_terminal git log; echo $? 1>&3) >/dev/null) 3>&1 ) &&
+		test "$OUT" -eq 0
 	else
 		test_terminal git log
 	fi &&
@@ -774,7 +760,7 @@  test_expect_success TTY 'git returns SIGPIPE on propagated signals from pager' '
 
 	if test_have_prereq !MINGW
 	then
-		OUT=$( ((test_terminal git log; echo $? 1>&3) | :) 3>&1 ) &&
+		OUT=$( ((test_terminal git log; echo $? 1>&3) >/dev/null) 3>&1 ) &&
 		test_match_signal 13 "$OUT"
 	else
 		test_terminal git log