Message ID | 20181124070428.18571-1-max@max630.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | t5562: do not reuse output files | expand |
Max Kirillov <max@max630.net> writes: > diff --git a/t/t5562-http-backend-content-length.sh b/t/t5562-http-backend-content-length.sh > index 90d890d02f..bb53f82c0c 100755 > --- a/t/t5562-http-backend-content-length.sh > +++ b/t/t5562-http-backend-content-length.sh > @@ -25,6 +25,8 @@ test_http_env() { > handler_type="$1" > request_body="$2" > shift > + (rm -f act.out || true) && > + (rm -f act.err || true) && Why "||true"? If the named file doesn't exist, "rm -f" would succeed, and if it does exist but somehow we fail to remove, then these added lines are not preveting the next part from reusing, i.e. they are not doing what they are supposed to be doing, so we should detect such a failure (if happens) as an error, no? IOW, shouldn't it just be more like + rm -f act.out act.err && The same comment applies to the other hunk. > env \ > CONTENT_TYPE="application/x-git-$handler_type-pack-request" \ > QUERY_STRING="/repo.git/git-$handler_type-pack" \ > @@ -155,6 +157,8 @@ test_expect_success 'CONTENT_LENGTH overflow ssite_t' ' > ' > > test_expect_success 'empty CONTENT_LENGTH' ' > + (rm -f act.out || true) && > + (rm -f act.err || true) && > env \ > QUERY_STRING="service=git-receive-pack" \ > PATH_TRANSLATED="$PWD"/.git/info/refs \
Junio C Hamano <gitster@pobox.com> writes: > Max Kirillov <max@max630.net> writes: > >> diff --git a/t/t5562-http-backend-content-length.sh b/t/t5562-http-backend-content-length.sh >> index 90d890d02f..bb53f82c0c 100755 >> --- a/t/t5562-http-backend-content-length.sh >> +++ b/t/t5562-http-backend-content-length.sh >> @@ -25,6 +25,8 @@ test_http_env() { >> handler_type="$1" >> request_body="$2" >> shift >> + (rm -f act.out || true) && >> + (rm -f act.err || true) && > > Why "||true"? If the named file doesn't exist, "rm -f" would > succeed, and if it does exist but somehow we fail to remove, then > these added lines are not preveting the next part from reusing, > i.e. they are not doing what they are supposed to be doing,... Another thing. The analysis in your log message talks about a stray process holding open filehandles to these files. An attempt to remove them in such a stat would fail on some systems, so "||true" would not help, would it? It just hides the failure to remove, and when ||true is useful in hiding th failure _is_ when such a stray process is still there, waiting to corrupt the output of the next request and breaking the test, no? I do agree that forcing the parent to wait, like you described in the comment, would be far more preferrable, but until that happens, a better workaround might be to write into unique output filenames (act1.out, act2.out, etc.); that way, you do not have to worry about the output file for the next request getting clobbered by a stale process handling the previous request. But at the same time, wouldn't this suggest that the test or the previous request may see an incomplete output, as the analysed problem is that such a process is writing to the output file very late, while we are preparing to test the enxt request, which meas we have already checked the output file for the previous request, right? So even without ||true, I am not sure how true a "solution" this change is to the issue.
On Sat, Nov 24, 2018 at 04:47:19PM +0900, Junio C Hamano wrote: > Junio C Hamano <gitster@pobox.com> writes: > a better workaround might be to write into unique output filenames > (act1.out, act2.out, etc.); that way, you do not have to worry about > the output file for the next request getting clobbered by a stale > process handling the previous request. Yes I agree > But at the same time, > wouldn't this suggest that the test or the previous request may see > an incomplete output, Yes it may miss the child's message. But in this case of failed http-backend process, there should be already one "fatal:" message in the act.err from the parent, and missing another one does not change the outcome.
On Sat, Nov 24, 2018 at 04:47:19PM +0900, Junio C Hamano wrote: > I do agree that forcing the parent to wait, like you described in > the comment, would be far more preferrable, [...] Stray processes can sometimes have funny effects on an outer test harness, too. E.g., I think I've seen hangs running t5562 under prove, because some process is holding open a pipe descriptor. This would probably fix that, too. > but until that happens,[...] But if we can't do that immediately for some reason, I do agree with everything else you said here. ;) -Peff
On Sat, Nov 24, 2018 at 04:47:19PM +0900, Junio C Hamano wrote: > I do agree that forcing the parent to wait, like you described in > the comment, would be far more preferrable, It looks like it can be done as simple as: --- a/http-backend.c +++ b/http-backend.c @@ -486,6 +486,8 @@ static void run_service(const char **argv, int buffer_input) if (buffer_input || gzipped_request || req_len >= 0) cld.in = -1; cld.git_cmd = 1; + cld.clean_on_exit = 1; + cld.wait_after_clean = 1; if (start_command(&cld)) exit(1); at least according to strate it does what it should.
Max Kirillov <max@max630.net> writes: > On Sat, Nov 24, 2018 at 04:47:19PM +0900, Junio C Hamano wrote: >> I do agree that forcing the parent to wait, like you described in >> the comment, would be far more preferrable, > > It looks like it can be done as simple as: > > --- a/http-backend.c > +++ b/http-backend.c > @@ -486,6 +486,8 @@ static void run_service(const char **argv, int buffer_input) > if (buffer_input || gzipped_request || req_len >= 0) > cld.in = -1; > cld.git_cmd = 1; > + cld.clean_on_exit = 1; > + cld.wait_after_clean = 1; > if (start_command(&cld)) > exit(1); > > at least according to strate it does what it should. Sounds sane. I am offhand not sure what the right value of wait_after_clean for this codepath be, though. 46df6906 ("execv_dashed_external: wait for child on signal death", 2017-01-06) made this non-default but turned it on for dashed externals (especially to help the case where they spawn a pager), as the parent has nothing other than to wait for the child to exit in that codepath. Does the same reasoning apply here, too? This is a meta point, but I wonder if there is an easy way to "grep" for uses of run-command interface that do *not* set clean_on_exit. The pager that can outlive us long after we exit, kept alive while the user views our output, is an example cited by afe19ff7 ("run-command: optionally kill children on exit", 2012-01-07), and while I am wondering if the default should hae been to kill the children instead, such a "grep" would have been very useful to know what codepaths would be affected if we flipped the default.
On Mon, Nov 26, 2018 at 11:06:40AM +0900, Junio C Hamano wrote: > I am offhand not sure what the right value of wait_after_clean for > this codepath be, though. 46df6906 ("execv_dashed_external: wait > for child on signal death", 2017-01-06) made this non-default but > turned it on for dashed externals (especially to help the case where > they spawn a pager), as the parent has nothing other than to wait > for the child to exit in that codepath. Does the same reasoning > apply here, too? As far as I understand, the reason to _not_ wait was that the child might still be expecting closed stdin even after receiving SIGTERM, so parent would wait forever. But this is not clearly the case here. And otherwise there should be no reason to not wait, as long as we are interested in synchronous exiting of the child. In my Linux experiments the child was exiting because of signal earlier than because of closed stdin, but who knows, maybe with some bad luck the signal would be delivered after the closed stdin, and we would still have the issue. After all, at Linux it was mostly working even without fix.
diff --git a/t/t5562-http-backend-content-length.sh b/t/t5562-http-backend-content-length.sh index 90d890d02f..bb53f82c0c 100755 --- a/t/t5562-http-backend-content-length.sh +++ b/t/t5562-http-backend-content-length.sh @@ -25,6 +25,8 @@ test_http_env() { handler_type="$1" request_body="$2" shift + (rm -f act.out || true) && + (rm -f act.err || true) && env \ CONTENT_TYPE="application/x-git-$handler_type-pack-request" \ QUERY_STRING="/repo.git/git-$handler_type-pack" \ @@ -155,6 +157,8 @@ test_expect_success 'CONTENT_LENGTH overflow ssite_t' ' ' test_expect_success 'empty CONTENT_LENGTH' ' + (rm -f act.out || true) && + (rm -f act.err || true) && env \ QUERY_STRING="service=git-receive-pack" \ PATH_TRANSLATED="$PWD"/.git/info/refs \
Some expected failures of git-http-backend leave running its children (receive-pack or upload-pack) which still hold opened descriptors to act.err and with some probability they live long enough to write their failure messages after next test has already truncated the files. This causes occasional failures of the test script. Avoid the issue by unlinking the older files before writing to them. Reported-by: Carlo Arenas <carenas@gmail.com> Helped-by: Carlo Arenas <carenas@gmail.com> Signed-off-by: Max Kirillov <max@max630.net> --- Thanks for the analysis. I seem to have guessed the reason. This patch should prevent it. I think the tests should somehow make sure there are no such late processes. I can see 2 options: * somehow find out in the tests all children and wait for them. I have no idea how. * make http-backend close handle to its child and wait for it to exit before dying. This would not prevent childrenc in general, because http-backend may be killed, but not in our expected failure cases Actually, don't the children receive some SIGHUP? Maybe thy should. However, it would still take some time for them to handle it, so it does not fully solve the issue t/t5562-http-backend-content-length.sh | 4 ++++ 1 file changed, 4 insertions(+)