mbox series

[RFC/PATCH,0/5] Fix fetch regression with transport helpers

Message ID 20190604021330.16130-1-felipe.contreras@gmail.com (mailing list archive)
Headers show
Series Fix fetch regression with transport helpers | expand

Message

Felipe Contreras June 4, 2019, 2:13 a.m. UTC
I noticed a regression with running tests for git-remote-hg; can't seem to be
able to fetch tags.

Probably all remote helpers that use the import method are affected, if not
all.

The following patches are meant to test for the issue, fix it, and get some
cleanups.

I'm not exactly sure the solution is the one we want, but hopefull it gives an
idea as to what is needed.


Felipe Contreras (5):
  t5801 (remote-helpers): cleanup refspec stuff
  t5801 (remote-helpers): add test to fetch tags
  fetch: trivial cleanup
  fetch: make the code more understandable
  fetch: fix regression with transport helpers

 builtin/fetch.c            | 28 ++++++++++++++++++----------
 t/t5801-remote-helpers.sh  | 18 ++++++++++++++----
 t/t5801/git-remote-testgit | 22 +++++++++++++---------
 3 files changed, 45 insertions(+), 23 deletions(-)

Comments

Jeff King June 4, 2019, 2:35 p.m. UTC | #1
On Mon, Jun 03, 2019 at 09:13:25PM -0500, Felipe Contreras wrote:

> I noticed a regression with running tests for git-remote-hg; can't seem to be
> able to fetch tags.
> 
> Probably all remote helpers that use the import method are affected, if not
> all.
> 
> The following patches are meant to test for the issue, fix it, and get some
> cleanups.
> 
> I'm not exactly sure the solution is the one we want, but hopefull it gives an
> idea as to what is needed.

It looks good to me. Thanks for fixing this.

The breakage is in v2.20, so I don't think this needs to be rushed into
v2.22 (which is already at -rc3). But it should probably go to 'maint'
sooner rather than later.

-Peff
Felipe Contreras June 4, 2019, 7:15 p.m. UTC | #2
On Tue, Jun 4, 2019 at 9:35 AM Jeff King <peff@peff.net> wrote:
>
> On Mon, Jun 03, 2019 at 09:13:25PM -0500, Felipe Contreras wrote:

> > I'm not exactly sure the solution is the one we want, but hopefull it gives an
> > idea as to what is needed.
>
> It looks good to me. Thanks for fixing this.
>
> The breakage is in v2.20, so I don't think this needs to be rushed into
> v2.22 (which is already at -rc3). But it should probably go to 'maint'
> sooner rather than later.

Indeed, it's probably not so urgent since people have not even been
complaining (that I know of). However, I think it's clear that there's
a regression and the fix is simple, so maybe just apply the obvious
fix, and leave the rest of the patches for later?

I'm attaching only the fix, just to temp the idea of applying that
tiny thing into v2.2.
Johannes Schindelin June 5, 2019, 8:12 a.m. UTC | #3
On Mon, 3 Jun 2019, Felipe Contreras wrote:

> Felipe Contreras (5):
>   t5801 (remote-helpers): cleanup refspec stuff
>   t5801 (remote-helpers): add test to fetch tags
>   fetch: trivial cleanup
>   fetch: make the code more understandable
>   fetch: fix regression with transport helpers
>
>  builtin/fetch.c            | 28 ++++++++++++++++++----------
>  t/t5801-remote-helpers.sh  | 18 ++++++++++++++----
>  t/t5801/git-remote-testgit | 22 +++++++++++++---------
>  3 files changed, 45 insertions(+), 23 deletions(-)

This fails on macOS, in t5601, both in our osx-clang and osx-gcc jobs, as
well as in the StaticAnalysis job. For details, see
https://dev.azure.com/gitgitgadget/git/_build/results?buildId=10206

The t5601 failure looks like this:

-- snip --
checking prerequisite: CASE_INSENSITIVE_FS

mkdir -p "$TRASH_DIRECTORY/prereq-test-dir" &&
(
	cd "$TRASH_DIRECTORY/prereq-test-dir" &&
	echo good >CamelCase &&
	echo bad >camelcase &&
	test "$(cat CamelCase)" != good

)
++ mkdir -p '/Users/vsts/agent/2.150.3/work/1/s/t/trash
directory.t5601-clone/prereq-test-dir'
++ cd '/Users/vsts/agent/2.150.3/work/1/s/t/trash
directory.t5601-clone/prereq-test-dir'
++ echo good
++ echo bad
+++ cat CamelCase
++ test bad '!=' good
prerequisite CASE_INSENSITIVE_FS ok
expecting success:
	grep X icasefs/warning &&
	grep x icasefs/warning &&
	test_i18ngrep "the following paths have collided" icasefs/warning

++ grep X icasefs/warning
error: last command exited with $?=1
not ok 99 - colliding file detection
-- snap --

My guess is that your changes remove something that was expected before,
and is still expected, and that this was only tested on Linux, and only on
a file system with case-sensitive file names.

The StaticAnalysis job (which is admittedly a misnomer) points out another
few valid issues, but that is probably because Junio applied this patch
series on top of a very old commit. I, at least, could not spot any file
in the Coccinelle report that was touched by this here patch series.

Ciao,
Johannes
Jeff King June 5, 2019, 11:27 a.m. UTC | #4
On Wed, Jun 05, 2019 at 10:12:12AM +0200, Johannes Schindelin wrote:

> This fails on macOS, in t5601, both in our osx-clang and osx-gcc jobs, as
> well as in the StaticAnalysis job. For details, see
> https://dev.azure.com/gitgitgadget/git/_build/results?buildId=10206

Hmm. I'm having a hard time seeing why (and I can't seem to reproduce it
locally on a case-insensitive HFS+ filesystem under Linux).

In particular, if the problem is here:

> expecting success:
> 	grep X icasefs/warning &&
> 	grep x icasefs/warning &&
> 	test_i18ngrep "the following paths have collided" icasefs/warning
> 
> ++ grep X icasefs/warning
> error: last command exited with $?=1
> not ok 99 - colliding file detection

then that implies it has to do with the checkout phase, which Felipe's
patch doesn't touch. Later in the log we see the actual file contents
(I'm confused as to how this gets here; it looks like debugging bits
that were added after the main script?):

  2019-06-05T07:58:37.7961890Z Cloning into 'bogus'...
  2019-06-05T07:58:37.7962430Z done.
  2019-06-05T07:58:37.7963360Z warning: the following paths have collided (e.g. case-sensitive paths
  2019-06-05T07:58:37.7964300Z on a case-insensitive filesystem) and only one from the same
  2019-06-05T07:58:37.7964880Z colliding group is in the working tree:
  2019-06-05T07:58:37.7965290Z 
  2019-06-05T07:58:37.7966250Z   'x'

whereas a succeeding test expects us to mention both 'x' and 'X'.

So we _did_ find the collision, but somehow 'X' was not reported.
Looking at the code, I'm not even sure how that could happen. Given that
this process does involve looking at stat data, it makes me wonder if
there could be some raciness involved. But again, I'm scratching my head
as to how exactly, and I couldn't reproduce it under load or with some
carefully inserted sleep() calls.

And it looks like it did reproduce twice on Azure.

Can somebody who has osx locally reproduce this? Or is there a way to
interactively access the Azure environment to dig further?

> My guess is that your changes remove something that was expected before,
> and is still expected, and that this was only tested on Linux, and only on
> a file system with case-sensitive file names.

It sounds like you're suggesting that changes to the test script subtly
affected the later state. Which is indeed a common culprit. But the
changes in Felipe's series were all to t5801, not the failing t5601. Am
I misunderstanding what you mean?

-Peff
Duy Nguyen June 5, 2019, 12:22 p.m. UTC | #5
On Wed, Jun 5, 2019 at 6:27 PM Jeff King <peff@peff.net> wrote:
>
> On Wed, Jun 05, 2019 at 10:12:12AM +0200, Johannes Schindelin wrote:
>
> > This fails on macOS, in t5601, both in our osx-clang and osx-gcc jobs, as
> > well as in the StaticAnalysis job. For details, see
> > https://dev.azure.com/gitgitgadget/git/_build/results?buildId=10206
>
> Hmm. I'm having a hard time seeing why (and I can't seem to reproduce it
> locally on a case-insensitive HFS+ filesystem under Linux).
>
> In particular, if the problem is here:
>
> > expecting success:
> >       grep X icasefs/warning &&
> >       grep x icasefs/warning &&
> >       test_i18ngrep "the following paths have collided" icasefs/warning
> >
> > ++ grep X icasefs/warning
> > error: last command exited with $?=1
> > not ok 99 - colliding file detection
>
> then that implies it has to do with the checkout phase, which Felipe's
> patch doesn't touch. Later in the log we see the actual file contents
> (I'm confused as to how this gets here; it looks like debugging bits
> that were added after the main script?):
>
>   2019-06-05T07:58:37.7961890Z Cloning into 'bogus'...
>   2019-06-05T07:58:37.7962430Z done.
>   2019-06-05T07:58:37.7963360Z warning: the following paths have collided (e.g. case-sensitive paths
>   2019-06-05T07:58:37.7964300Z on a case-insensitive filesystem) and only one from the same
>   2019-06-05T07:58:37.7964880Z colliding group is in the working tree:
>   2019-06-05T07:58:37.7965290Z
>   2019-06-05T07:58:37.7966250Z   'x'
>
> whereas a succeeding test expects us to mention both 'x' and 'X'.
>
> So we _did_ find the collision, but somehow 'X' was not reported.
> Looking at the code, I'm not even sure how that could happen. Given that
> this process does involve looking at stat data, it makes me wonder if

It does use stat data in mark_colliding_entries() if core.checkStat is
false. I think on MacOS it's actually true.

I vaguely recall seeing just one 'x' once. I think last time I had a
problem with truncating st_ino, but that should be fixed in e66ceca94b
(clone: fix colliding file detection on APFS, 2018-11-20). So no idea
how this happens again.

> there could be some raciness involved. But again, I'm scratching my head
> as to how exactly, and I couldn't reproduce it under load or with some
> carefully inserted sleep() calls.
Johannes Schindelin June 6, 2019, 1:07 p.m. UTC | #6
Hi Duy,

On Wed, 5 Jun 2019, Duy Nguyen wrote:

> On Wed, Jun 5, 2019 at 6:27 PM Jeff King <peff@peff.net> wrote:
> >
> > On Wed, Jun 05, 2019 at 10:12:12AM +0200, Johannes Schindelin wrote:
> >
> > > This fails on macOS, in t5601, both in our osx-clang and osx-gcc jobs, as
> > > well as in the StaticAnalysis job. For details, see
> > > https://dev.azure.com/gitgitgadget/git/_build/results?buildId=10206
> >
> > Hmm. I'm having a hard time seeing why (and I can't seem to reproduce it
> > locally on a case-insensitive HFS+ filesystem under Linux).
> >
> > In particular, if the problem is here:
> >
> > > expecting success:
> > >       grep X icasefs/warning &&
> > >       grep x icasefs/warning &&
> > >       test_i18ngrep "the following paths have collided" icasefs/warning
> > >
> > > ++ grep X icasefs/warning
> > > error: last command exited with $?=1
> > > not ok 99 - colliding file detection
> >
> > then that implies it has to do with the checkout phase, which Felipe's
> > patch doesn't touch. Later in the log we see the actual file contents
> > (I'm confused as to how this gets here; it looks like debugging bits
> > that were added after the main script?):
> >
> >   2019-06-05T07:58:37.7961890Z Cloning into 'bogus'...
> >   2019-06-05T07:58:37.7962430Z done.
> >   2019-06-05T07:58:37.7963360Z warning: the following paths have collided (e.g. case-sensitive paths
> >   2019-06-05T07:58:37.7964300Z on a case-insensitive filesystem) and only one from the same
> >   2019-06-05T07:58:37.7964880Z colliding group is in the working tree:
> >   2019-06-05T07:58:37.7965290Z
> >   2019-06-05T07:58:37.7966250Z   'x'
> >
> > whereas a succeeding test expects us to mention both 'x' and 'X'.
> >
> > So we _did_ find the collision, but somehow 'X' was not reported.
> > Looking at the code, I'm not even sure how that could happen. Given that
> > this process does involve looking at stat data, it makes me wonder if
>
> It does use stat data in mark_colliding_entries() if core.checkStat is
> false. I think on MacOS it's actually true.
>
> I vaguely recall seeing just one 'x' once. I think last time I had a
> problem with truncating st_ino, but that should be fixed in e66ceca94b
> (clone: fix colliding file detection on APFS, 2018-11-20). So no idea
> how this happens again.

Good catch. I think the reason it happens again is simply that Junio
picked a base commit that is older than the commit you referenced.

Point in favor: Junio merged these here patches into `pu` and those
test failures (as well as the StaticAnalysis issues) are gone.

Thanks,
Johannes

>
> > there could be some raciness involved. But again, I'm scratching my head
> > as to how exactly, and I couldn't reproduce it under load or with some
> > carefully inserted sleep() calls.
> --
> Duy
>
Junio C Hamano June 6, 2019, 4:13 p.m. UTC | #7
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> I vaguely recall seeing just one 'x' once. I think last time I had a
>> problem with truncating st_ino, but that should be fixed in e66ceca94b
>> (clone: fix colliding file detection on APFS, 2018-11-20). So no idea
>> how this happens again.
>
> Good catch. I think the reason it happens again is simply that Junio
> picked a base commit that is older than the commit you referenced.

Yeah, that is because the patch specifically targetted a single
commit as culprit, so naturally the topic that introduced that
commit was the place to be "fixed" ;-)

I was wondering if the base commit _before_ the fixes, i.e. e198b3a7
("fetch: replace string-list used as a look-up table with a
hashmap", 2018-09-25), failed the same test you saw problems with.
It does predate e66ceca9 ("clone: fix colliding file detection on
APFS", 2018-11-20), so my current theory is that it was broken
already adn these patches that fixed a breakge had nothing to do
with the t5601 tests failing.
Johannes Schindelin June 7, 2019, 5:32 a.m. UTC | #8
Hi Junio,

On Thu, 6 Jun 2019, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> >> I vaguely recall seeing just one 'x' once. I think last time I had a
> >> problem with truncating st_ino, but that should be fixed in e66ceca94b
> >> (clone: fix colliding file detection on APFS, 2018-11-20). So no idea
> >> how this happens again.
> >
> > Good catch. I think the reason it happens again is simply that Junio
> > picked a base commit that is older than the commit you referenced.
>
> Yeah, that is because the patch specifically targetted a single
> commit as culprit, so naturally the topic that introduced that
> commit was the place to be "fixed" ;-)

Yes, that matches my impression.

I should maybe describe a bit better what *is* tested on GitGitGadget when
it runs the build & test suite for the individual branches (as opposed to
the integration branches maint, master, next & pu): the Azure Pipeline
obviously *cannot* be defined in `azure-pipelines.yml`, as many of those
branches do not even have that file.

One of the features I like most about Azure Pipelines (yes, I really like
Pipelines, they save me from so much work by enabling me to automate a
*lot* in Git for Windows' maintenance, such as building and packaging
quite a few of the components such as OpenSSH, cURL, etc... but I digress)
is that it offers *both* to define the builds via a file that is committed
*and also* in a definition that is maintained outside of the source code.

So what I did was to port azure-pipelines.yml from the former to the
latter, and *that* is run on those individual branches.

As we noticed here, this opens the door for running into regressions that
have been long fixed, just not in the tested branch.

Side note: many projects that want to rely on the confidence instilled by
automated testing therefore change their workflow to a more "topic
branches are based on master, or on the release branches' tips" one. I am
not saying that you, Junio, should switch to such a workflow because you
are clearly comfortable with the current one. I mention this mainly for
the benefit of readers who might wonder what options they have in their
own projects to deal with this.

What I usually do is a hack: this "manual" job definition tries to
cherry-pick all kinds of known fixes to known regressions, and this
APFS-name-collision one is just not one of them. When I find the time
(hopefully next week, probably not), I shall try to take care of that.

> I was wondering if the base commit _before_ the fixes, i.e. e198b3a7
> ("fetch: replace string-list used as a look-up table with a
> hashmap", 2018-09-25), failed the same test you saw problems with.
> It does predate e66ceca9 ("clone: fix colliding file detection on
> APFS", 2018-11-20), so my current theory is that it was broken
> already adn these patches that fixed a breakge had nothing to do
> with the t5601 tests failing.

Yes, I agree.

One might ask why I even bother with testing the individual branches. The
reason is that bisecting breakages in `pu` is a *nightmare*. Most of those
breakages are present already in the individual branches, though, so if I
can catch the breakages there, I have a much easier time keeping the CI
builds green.

Ciao,
Dscho