diff mbox series

t5570: drop racy test

Message ID 20181220164150.GB25639@hank.intra.tgummerer.com (mailing list archive)
State New, archived
Headers show
Series t5570: drop racy test | expand

Commit Message

Thomas Gummerer Dec. 20, 2018, 4:41 p.m. UTC
On 11/26, Jeff King wrote:
> On Sun, Nov 25, 2018 at 10:01:38PM +0000, Thomas Gummerer wrote:
> 
> > On 11/25, Torsten Bögershausen wrote:
> > > After running the  "Git 2.20-rc1" testsuite here on a raspi,
> > > the only TC that failed was t5570.
> > > When the "grep" was run on daemon.log, the file was empty (?).
> > > When inspecting it later, it was filled, and grep would have found
> > > the "extended.attribute" it was looking for.
> > 
> > I believe this has been reported before in
> > https://public-inbox.org/git/1522783990.964448.1325338528.0D49CC15@webmail.messagingengine.com/,
> > but it seems like the thread never ended with actually fixing it.
> > Reading the first reply Peff seemed to be fine with just removing the
> > test completely, which would be the easiest solution ;)  Adding him to
> > Cc: here.
> 
> Yes, I don't think there is a way to make this race-proof without
> somehow convincing "cat" to flush (and let us know when it has). Which
> really implies killing the daemon, and wait()ing on cat to process the
> EOF and exit.  And that makes the tests a lot more expensive if we have
> to start the daemon for each snippet.
> 
> So I'm still fine with just dropping this test.

Alright since this has come up twice on the mailing list now (and I've
seen this racyness as well), and nobody found the time to write a
patch yet, below is a patch to just remove the test.

> > > diff --git a/t/t5570-git-daemon.sh b/t/t5570-git-daemon.sh
> > > index 7466aad111..e259fee0ed 100755
> > > --- a/t/t5570-git-daemon.sh
> > > +++ b/t/t5570-git-daemon.sh
> > > @@ -192,6 +192,7 @@ test_expect_success 'daemon log records all attributes' '
> > >  	GIT_OVERRIDE_VIRTUAL_HOST=localhost \
> > >  		git -c protocol.version=1 \
> > >  			ls-remote "$GIT_DAEMON_URL/interp.git" &&
> > > +	sleep 1 &&
> > >  	grep -i extended.attribute daemon.log | cut -d" " -f2- >actual &&
> > >  	test_cmp expect actual
> > >  '
> > > ----------------
> > > A slightly better approach may be to use a "sleep on demand":
> > > 
> > > +	( grep -i -q extended.attribute daemon.log || sleep 1 ) &&
> 
> That doesn't really fix it, but just broadens the race window. I dunno.
> Maybe that is enough in practice. We could do something like:
> 
>   repeat_with_timeout () {
> 	local i=0
> 	while test $i -lt 10
> 	do
> 		"$@" && return 0
> 		sleep 1
> 	done
> 	# no success even after 10 seconds
> 	return 1
>   }
> 
>   repeat_with_timeout grep -i extended.attribute daemon.log
> 
> to make the pattern a bit more obvious (and make it easy to extend the
> window arbitrarily; surely 10s is enough?).

I gave this a try, with below patch to lib-git-daemon.sh that you
proposed in the previous thread about this racyness.  That shows
another problem though, namely when truncating 'daemon.log' before
running 'git ls-remote' in this test, we're not sure all 'git deamon'
has flushed everything from previous invocations.  That may be an even
rarer problem in practice, but still something to keep in mind.

Comments

Jeff King Dec. 20, 2018, 5:14 p.m. UTC | #1
On Thu, Dec 20, 2018 at 04:41:50PM +0000, Thomas Gummerer wrote:

> > That doesn't really fix it, but just broadens the race window. I dunno.
> > Maybe that is enough in practice. We could do something like:
> > 
> >   repeat_with_timeout () {
> > 	local i=0
> > 	while test $i -lt 10
> > 	do
> > 		"$@" && return 0
> > 		sleep 1
> > 	done
> > 	# no success even after 10 seconds
> > 	return 1
> >   }
> > 
> >   repeat_with_timeout grep -i extended.attribute daemon.log
> > 
> > to make the pattern a bit more obvious (and make it easy to extend the
> > window arbitrarily; surely 10s is enough?).
> 
> I gave this a try, with below patch to lib-git-daemon.sh that you
> proposed in the previous thread about this racyness.  That shows
> another problem though, namely when truncating 'daemon.log' before
> running 'git ls-remote' in this test, we're not sure all 'git deamon'
> has flushed everything from previous invocations.  That may be an even
> rarer problem in practice, but still something to keep in mind.

Right, that makes sense. Making this race-proof really does require a
separate log stream for each test. I guess we'd need to be able to send
git-daemon a signal to re-open the log (which actually is not as
unreasonable as it may seem; lots of daemons have this for log
rotation).

I think getting rid of the "cat" would also help a lot here.
Unfortunately I think we use it not just for its "tee" effect, but also
to avoid startup races by checking the "Ready to rumble" line. So again,
we'd need some cooperation from git-daemon to tell us out-of-band that
it has completed its startup (e.g., by touching another file).

> Dscho also mentioned on #git-devel a while ago that he may have a look
> at actually making this test race-proof, but I guess he's been busy
> with the 2.20 release.  I'm also not sure it's worth spending a lot of
> time trying to fix this test, but I'd definitely be happy if someone
> proposes a different solution.

Yeah. I'm sure it's fixable with enough effort, but I just think there
are more interesting and important things to work on.

> --- >8 ---
> Subject: [PATCH] t5570: drop racy test

So yeah, I'm still fine with this. But...

> ---
>  t/t5570-git-daemon.sh | 13 -------------
>  1 file changed, 13 deletions(-)

This is the only user of daemon.log, so we could drop those bits from
lib-git-daemon.sh, too. That would also prevent people from adding new
tests, thinking that this was somehow not horribly racy). I.e.,
reverting 314a73d658 (t/lib-git-daemon: record daemon log, 2018-01-25).

-Peff
Johannes Schindelin Dec. 21, 2018, 3:48 p.m. UTC | #2
Hi Thomas & Peff,

On Thu, 20 Dec 2018, Jeff King wrote:

> On Thu, Dec 20, 2018 at 04:41:50PM +0000, Thomas Gummerer wrote:
> 
> > > That doesn't really fix it, but just broadens the race window. I dunno.
> > > Maybe that is enough in practice. We could do something like:
> > > 
> > >   repeat_with_timeout () {
> > > 	local i=0
> > > 	while test $i -lt 10
> > > 	do
> > > 		"$@" && return 0
> > > 		sleep 1
> > > 	done
> > > 	# no success even after 10 seconds
> > > 	return 1
> > >   }
> > > 
> > >   repeat_with_timeout grep -i extended.attribute daemon.log
> > > 
> > > to make the pattern a bit more obvious (and make it easy to extend the
> > > window arbitrarily; surely 10s is enough?).
> > 
> > I gave this a try, with below patch to lib-git-daemon.sh that you
> > proposed in the previous thread about this racyness.  That shows
> > another problem though, namely when truncating 'daemon.log' before
> > running 'git ls-remote' in this test, we're not sure all 'git deamon'
> > has flushed everything from previous invocations.  That may be an even
> > rarer problem in practice, but still something to keep in mind.
> 
> Right, that makes sense. Making this race-proof really does require a
> separate log stream for each test. I guess we'd need to be able to send
> git-daemon a signal to re-open the log (which actually is not as
> unreasonable as it may seem; lots of daemons have this for log
> rotation).
> 
> I think getting rid of the "cat" would also help a lot here.
> Unfortunately I think we use it not just for its "tee" effect, but also
> to avoid startup races by checking the "Ready to rumble" line. So again,
> we'd need some cooperation from git-daemon to tell us out-of-band that
> it has completed its startup (e.g., by touching another file).
> 
> > Dscho also mentioned on #git-devel a while ago that he may have a look
> > at actually making this test race-proof, but I guess he's been busy
> > with the 2.20 release.

And GitGitGadget. And working on the Azure Pipelines support. And
mentoring two interns.

This is what I still have in my internal ticket:

	Try to work around occasional t5570 failures in Git's test suite

	Seems that there is a race condition in
	https://github.com/git/git/blob/master/t/lib-git-daemon.sh#L48-L69
	that could possibly be solved by writing to the daemon.log
	directly, and showing the output only via `tail -f` (and only when
	running in verbose mode, as it simply won't make sense otherwise).

However, if the preferred route is to go ahead and just remove that test
altogether, I'm fine with that, too.

The only reason, in my mind, why we still have `git-daemon` is that it
allows for easy standing up your own Git server, e.g. as an ad-hoc way to
collaborate in a small ad-hoc team. If we ever get to the point where we
can stand up a minimal HTTP/HTTPS server with an internal Git command (not
requiring sysadmin privileges), from my point of view `git-daemon` can
even go the way of the Kale Island (but for much better reasons [*1*]).

> > I'm also not sure it's worth spending a lot of time trying to fix this
> > test, but I'd definitely be happy if someone proposes a different
> > solution.
> 
> Yeah. I'm sure it's fixable with enough effort, but I just think there
> are more interesting and important things to work on.
> 
> > --- >8 ---
> > Subject: [PATCH] t5570: drop racy test
> 
> So yeah, I'm still fine with this. But...
> 
> > ---
> >  t/t5570-git-daemon.sh | 13 -------------
> >  1 file changed, 13 deletions(-)
> 
> This is the only user of daemon.log, so we could drop those bits from
> lib-git-daemon.sh, too. That would also prevent people from adding new
> tests, thinking that this was somehow not horribly racy). I.e.,
> reverting 314a73d658 (t/lib-git-daemon: record daemon log, 2018-01-25).

Indeed, that would be good.

The only reason to keep daemon.log that I can think of is to make
debugging easier, but then, if it should become necessary, it is probably
easier to freopen() stdout or stderr into a file in `git daemon`, anyway.

Ciao,
Dscho

Footnote *1*: Kale Island, along with Rapita, Rehana, Kakatina and Zollies
is prominently featured in a scientific article at
http://iopscience.iop.org/article/10.1088/1748-9326/11/5/054011 that is on
my "important papers I read in 2018" list.
Thomas Gummerer Jan. 6, 2019, 5:59 p.m. UTC | #3
On 12/21, Johannes Schindelin wrote:
> Hi Thomas & Peff,
> 
> On Thu, 20 Dec 2018, Jeff King wrote:
> 
> > On Thu, Dec 20, 2018 at 04:41:50PM +0000, Thomas Gummerer wrote:
> > 
> > > Dscho also mentioned on #git-devel a while ago that he may have a look
> > > at actually making this test race-proof, but I guess he's been busy
> > > with the 2.20 release.
> 
> And GitGitGadget. And working on the Azure Pipelines support. And
> mentoring two interns.
> 
> This is what I still have in my internal ticket:
> 
> 	Try to work around occasional t5570 failures in Git's test suite
> 
> 	Seems that there is a race condition in
> 	https://github.com/git/git/blob/master/t/lib-git-daemon.sh#L48-L69
> 	that could possibly be solved by writing to the daemon.log
> 	directly, and showing the output only via `tail -f` (and only when
> 	running in verbose mode, as it simply won't make sense otherwise).
> 
> However, if the preferred route is to go ahead and just remove that test
> altogether, I'm fine with that, too.

Right, I was of course completely unaware of that internal ticket.  If
you still want to go that way there are certainly no objections from
me.  I just want to make sure not more users run into this racyness,
and I guess you also may have more important/interesting things to
work on.

> The only reason, in my mind, why we still have `git-daemon` is that it
> allows for easy standing up your own Git server, e.g. as an ad-hoc way to
> collaborate in a small ad-hoc team. If we ever get to the point where we
> can stand up a minimal HTTP/HTTPS server with an internal Git command (not
> requiring sysadmin privileges), from my point of view `git-daemon` can
> even go the way of the Kale Island (but for much better reasons [*1*]).
> 
> > > I'm also not sure it's worth spending a lot of time trying to fix this
> > > test, but I'd definitely be happy if someone proposes a different
> > > solution.
> > 
> > Yeah. I'm sure it's fixable with enough effort, but I just think there
> > are more interesting and important things to work on.
> > 
> > > --- >8 ---
> > > Subject: [PATCH] t5570: drop racy test
> > 
> > So yeah, I'm still fine with this. But...
> > 
> > > ---
> > >  t/t5570-git-daemon.sh | 13 -------------
> > >  1 file changed, 13 deletions(-)
> > 
> > This is the only user of daemon.log, so we could drop those bits from
> > lib-git-daemon.sh, too. That would also prevent people from adding new
> > tests, thinking that this was somehow not horribly racy). I.e.,
> > reverting 314a73d658 (t/lib-git-daemon: record daemon log, 2018-01-25).

Right that makes sense.  I sent that as patch 2/1, but I'm happy to
squash those into one if that's preferred.

> Indeed, that would be good.
> 
> The only reason to keep daemon.log that I can think of is to make
> debugging easier, but then, if it should become necessary, it is probably
> easier to freopen() stdout or stderr into a file in `git daemon`, anyway.

We do still print the output when tests are run in verbose mode, which
should be just as good as having the log in a separate file in most
cases I suspect.

> Ciao,
> Dscho
> 
> Footnote *1*: Kale Island, along with Rapita, Rehana, Kakatina and Zollies
> is prominently featured in a scientific article at
> http://iopscience.iop.org/article/10.1088/1748-9326/11/5/054011 that is on
> my "important papers I read in 2018" list.
diff mbox series

Patch

diff --git a/t/lib-git-daemon.sh b/t/lib-git-daemon.sh
index edbea2d986..3c7fea169b 100644
--- a/t/lib-git-daemon.sh
+++ b/t/lib-git-daemon.sh
@@ -62,6 +62,7 @@  start_git_daemon() {
 		(
 			while read -r line <&7
 			do
+				sleep 1
 				printf "%s\n" "$line"
 				printf >&4 "%s\n" "$line"
 			done

Dscho also mentioned on #git-devel a while ago that he may have a look
at actually making this test race-proof, but I guess he's been busy
with the 2.20 release.  I'm also not sure it's worth spending a lot of
time trying to fix this test, but I'd definitely be happy if someone
proposes a different solution.

> -Peff

--- >8 ---
Subject: [PATCH] t5570: drop racy test

t5570 being racy has been reported twice separately on the mailing
list [*1*, *2*].

To make the test race proof, we'd either have to introduce another
fifo the test snippet is waiting on, or somehow convincing "cat" to
flush (and let us know when it has).  Which really implies killing the
daemon, and wait()ing on cat to process the EOF and exit.  And that
makes the tests a lot more expensive if we have to start the daemon
for each snippet.

As this is a test for a relatively minor fix (according to the author)
in 19136be3f8 ("daemon: fix off-by-one in logging extended
attributes", 2018-01-24), drop it to avoid this racyness.  It doesn't
seem worth making the test code much more complex, or slowing down all
tests just to keep this one.

*1*: 1522783990.964448.1325338528.0D49CC15@webmail.messagingengine.com/
*2*: 9d4e5224-9ff4-f3f8-519d-7b2a6f1ea7cd@web.de

Reported-by: Jan Palus <jpalus@fastmail.com>
Reported-by: Torsten Bögershausen <tboegi@web.de>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 t/t5570-git-daemon.sh | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/t/t5570-git-daemon.sh b/t/t5570-git-daemon.sh
index 7466aad111..58ee787685 100755
--- a/t/t5570-git-daemon.sh
+++ b/t/t5570-git-daemon.sh
@@ -183,19 +183,6 @@  test_expect_success 'hostname cannot break out of directory' '
 		git ls-remote "$GIT_DAEMON_URL/escape.git"
 '
 
-test_expect_success 'daemon log records all attributes' '
-	cat >expect <<-\EOF &&
-	Extended attribute "host": localhost
-	Extended attribute "protocol": version=1
-	EOF
-	>daemon.log &&
-	GIT_OVERRIDE_VIRTUAL_HOST=localhost \
-		git -c protocol.version=1 \
-			ls-remote "$GIT_DAEMON_URL/interp.git" &&
-	grep -i extended.attribute daemon.log | cut -d" " -f2- >actual &&
-	test_cmp expect actual
-'
-
 test_expect_success FAKENC 'hostname interpolation works after LF-stripping' '
 	{
 		printf "git-upload-pack /interp.git\n\0host=localhost" | packetize