tests: prefer host Git to verify chainlint self-checks

Message ID	4112adbe467c14a8f22a87ea41aa4705f8760cf6.1702380646.git.ps@pks.im (mailing list archive)
State	New, archived
Headers	show Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=pks.im header.i=@pks.im header.b="a6ayMmeQ"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="gBWkG4lc" Feedback-ID: i197146af:Fastmail Date: Tue, 12 Dec 2023 12:32:50 +0100 From: Patrick Steinhardt <ps@pks.im> To: git@vger.kernel.org Cc: Eric Sunshine <sunshine@sunshineco.com> Subject: [PATCH] tests: prefer host Git to verify chainlint self-checks Message-ID: <4112adbe467c14a8f22a87ea41aa4705f8760cf6.1702380646.git.ps@pks.im> Precedence: bulk MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="IY/H6ALst8T78ZDF" Content-Disposition: inline
Series	tests: prefer host Git to verify chainlint self-checks \| expand tests: prefer host Git to verify chainlint self-checks

Patrick Steinhardt Dec. 12, 2023, 11:32 a.m. UTC

The "check-chainlint" target runs automatically when running tests and
performs self-checks to verify that the chainlinter itself produces the
expected output. Originally, the chainlinter was implemented via sed,
but the infrastructure has been rewritten in fb41727b7e (t: retire
unused chainlint.sed, 2022-09-01) to use a Perl script instead.

The rewrite caused some slight whitespace changes in the output that are
ultimately not of much importance. In order to be able to assert that
the actual chainlinter errors match our expectations we thus have to
ignore whitespace characters when diffing them. As the `-w` flag is not
in POSIX we try to use `git diff -w --no-index` before we fall back to
`diff -w -u`.

To accomodate for cases where the host system has no Git installation we
use the locally-compiled version of Git. This can result in problems
though when the Git project's repository is using extensions that the
locally-compiled version of Git doesn't understand. It will refuse to
run and thus cause the checks to fail.

Fix this issue by prefering the host's Git resolved via PATH. If it
doesn't exist, then we fall back to the locally-compiled Git version and

Junio C Hamano Dec. 12, 2023, 7:30 p.m. UTC | #1

Patrick Steinhardt <ps@pks.im> writes:

> To accomodate for cases where the host system has no Git installation we
> use the locally-compiled version of Git. This can result in problems
> though when the Git project's repository is using extensions that the
> locally-compiled version of Git doesn't understand. It will refuse to
> run and thus cause the checks to fail.
>
> Fix this issue by prefering the host's Git resolved via PATH. If it
> doesn't exist, then we fall back to the locally-compiled Git version and
> diff as before.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>
> I've started to dogfood the reftable backend on my local machine and
> have converted many repositories to use the reftable backend. This
> surfaced the described issue because the repository now sets up the
> "extensions.refStorage" extension, and thus "check-chainlint" fails
> depending on which versions of Git I'm trying to compile and test.

I do not think "prefer host Git" is necessarily a good idea; falling
back to use host Git is perfectly fine, of course.

Other than that, I agree with the motivation.

>  t/Makefile | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/t/Makefile b/t/Makefile
> index 225aaf78ed..8b7f7aceaa 100644
> --- a/t/Makefile
> +++ b/t/Makefile
> @@ -111,7 +111,9 @@ check-chainlint:
>  	if test -f ../GIT-BUILD-OPTIONS; then \
>  		. ../GIT-BUILD-OPTIONS; \
>  	fi && \
> -	if test -x ../git$$X; then \
> +	if command -v git >/dev/null 2>&1; then \
> +		DIFFW="git --no-pager diff -w --no-index"; \
> +	elif test -x ../git$$X; then \
>  		DIFFW="../git$$X --no-pager diff -w --no-index"; \
>  	else \
>  		DIFFW="diff -w -u"; \

Patrick Steinhardt Dec. 13, 2023, 7:20 a.m. UTC | #2

On Tue, Dec 12, 2023 at 11:30:22AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > To accomodate for cases where the host system has no Git installation we
> > use the locally-compiled version of Git. This can result in problems
> > though when the Git project's repository is using extensions that the
> > locally-compiled version of Git doesn't understand. It will refuse to
> > run and thus cause the checks to fail.
> >
> > Fix this issue by prefering the host's Git resolved via PATH. If it
> > doesn't exist, then we fall back to the locally-compiled Git version and
> > diff as before.
> >
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> >
> > I've started to dogfood the reftable backend on my local machine and
> > have converted many repositories to use the reftable backend. This
> > surfaced the described issue because the repository now sets up the
> > "extensions.refStorage" extension, and thus "check-chainlint" fails
> > depending on which versions of Git I'm trying to compile and test.
> 
> I do not think "prefer host Git" is necessarily a good idea; falling
> back to use host Git is perfectly fine, of course.

Why is that, though? We already use host Git in other parts of our build
infra, and the options we pass to git-diff(1) have been around for ages:

  - `--no-pager` was introduced in 463a849d00 (Add and document a global
    --no-pager option for git., 2007-08-19).

  - `--no-index` is around since at least fcfa33ec90 (diff: make more
    cases implicit --no-index, 2007-02-25).

  - `-w` is around since 0d21efa51c (Teach diff about -b and -w flags,
    2006-06-14).

So all options have pretty much been there since forever. Which means
that if the host has Git around, and the Git version is at least v1.5.3,
then it also understands all options.

Furthermore, we are accessing the Git repository that the user has set
up with host Git in the first place, so I'd think even conceptually it
is the correct thing to do.

Patrick

Junio C Hamano Dec. 13, 2023, 3:11 p.m. UTC | #3

Patrick Steinhardt <ps@pks.im> writes:

>> I do not think "prefer host Git" is necessarily a good idea; falling
>> back to use host Git is perfectly fine, of course.
>
> Why is that, though?

Mostly because your "differences in features supported by just-built
one and what happens to be on $PATH can cause problems" cuts both
ways, and as a general principle to work around such issues, using
just-built one is a better discipline.  The features the build
infrastructure ("self-checks" being discussed is a part of it) of a
particular version of Git source depends on are more likely to be
found in the binary that will be built from the matching source,
than what happens to be on $PATH that may be from a year-old version.

As you said, you'd need to accomodate need for those who are
initially porting or building Git on a host without an installed
one.  If we were doing a cross build where just-built on would not
be executable on the host, whatever version on $PATH (or in
/usr/bin) may have to be used, but then in such a case you would not
be testing on host anyway.  For these two reasons, it is a given
that one of the choices has to be to use just-built one.  We can
safely give lower precedence to the host tool.

The only one-and-half practical reasons we may want to fall back to
whatever happens to be on $PATH are:

 - just-built one is so broken that even the simple use by the build
   infrastructure (like "self-checks") does not work (but then it
   becomes "if it is so broken, fix it before even thinking about
   running tests", and that is why I count it as half a reason), or

 - we are bisecting down to an ancient version, and just-built one
   from such a version may not understand the current repository.

so I do think it is an excellent workaround to fall back to a
version of Git with unknown vintage that happens to be on $PATH,
than failing and stopping by relying only on just-built one.

> We already use host Git in other parts of our build
> infra, and the options we pass to git-diff(1) have been around for ages:

It only argues for "trying host one, if available, before just-built
one does not hurt for this particular case".  But I was interested
in laying out a more general principle we can follow in similar
situations in the future.

Eric Sunshine Dec. 14, 2023, 3:33 a.m. UTC | #4

On Wed, Dec 13, 2023 at 10:11 AM Junio C Hamano <gitster@pobox.com> wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> >> I do not think "prefer host Git" is necessarily a good idea; falling
> >> back to use host Git is perfectly fine, of course.
> >
> > Why is that, though?
>
> Mostly because your "differences in features supported by just-built
> one and what happens to be on $PATH can cause problems" cuts both
> ways [...]

I sent an alternative solution[1] which should sidestep this objection.

As usual, I forgot to use --in-reply-to=<this-thread> when sending the
patch, even though I reminded myself to use it only a minute or so
earlier. Sorry.

[1]: https://lore.kernel.org/git/20231214032248.1615-1-ericsunshine@charter.net/

Patrick Steinhardt Dec. 14, 2023, 8:13 a.m. UTC | #5

On Wed, Dec 13, 2023 at 10:33:00PM -0500, Eric Sunshine wrote:
> On Wed, Dec 13, 2023 at 10:11 AM Junio C Hamano <gitster@pobox.com> wrote:
> > Patrick Steinhardt <ps@pks.im> writes:
> > >> I do not think "prefer host Git" is necessarily a good idea; falling
> > >> back to use host Git is perfectly fine, of course.
> > >
> > > Why is that, though?
> >
> > Mostly because your "differences in features supported by just-built
> > one and what happens to be on $PATH can cause problems" cuts both
> > ways [...]
> 
> I sent an alternative solution[1] which should sidestep this objection.
> 
> As usual, I forgot to use --in-reply-to=<this-thread> when sending the
> patch, even though I reminded myself to use it only a minute or so
> earlier. Sorry.
> 
> [1]: https://lore.kernel.org/git/20231214032248.1615-1-ericsunshine@charter.net/

Thanks, I've replied to the thread. I think by now there are three
different ideas:

  - Improve the logic to pick some kind of diff implementation, which is
    my patch series. It would need to be improved so that we also probe
    whether the respective Git executables actually understand the repo
    format so that we can fall back from the just-built Git to system's
    Git.

  - Munge the whitespace of the expected results with some regexes.
    I like that idea better because we can avoid the git-diff(1)
    problem, but find that the result is somewhat hard to read.

  - Fix the ".expect" files so that we can avoid all of these games.

I actually like the last option most. I'll have a go at it and send this
third version out in a bit.

Patrick

Eric Sunshine Dec. 14, 2023, 8:39 a.m. UTC | #6

On Thu, Dec 14, 2023 at 3:13 AM Patrick Steinhardt <ps@pks.im> wrote:
> On Wed, Dec 13, 2023 at 10:33:00PM -0500, Eric Sunshine wrote:
> > On Wed, Dec 13, 2023 at 10:11 AM Junio C Hamano <gitster@pobox.com> wrote:
> > > Mostly because your "differences in features supported by just-built
> > > one and what happens to be on $PATH can cause problems" cuts both
> > > ways [...]
> >
> > I sent an alternative solution[1] which should sidestep this objection.
>
> Thanks, I've replied to the thread. I think by now there are three
> different ideas:
>
>   - Improve the logic to pick some kind of diff implementation, which is
>     my patch series. It would need to be improved so that we also probe
>     whether the respective Git executables actually understand the repo
>     format so that we can fall back from the just-built Git to system's
>     Git.
>
>   - Munge the whitespace of the expected results with some regexes.
>     I like that idea better because we can avoid the git-diff(1)
>     problem, but find that the result is somewhat hard to read.
>
>   - Fix the ".expect" files so that we can avoid all of these games.
>
> I actually like the last option most. I'll have a go at it and send this
> third version out in a bit.

I sent a reply[1] in the other thread explaining why I'm still leaning
toward `sed` to smooth over these minor differences rather than
churning the "expect" files, especially since the minor differences
are not significant to what is actually being tested. That said, I
won't stand in the way of such a patch to "fix" the "expect" files,
but it feels unnecessary.

[1]: https://lore.kernel.org/git/CAPig+cSkuRfkR2D3JqYcbaJqj485nfD9Nq6pM=vXWB5DJenWpA@mail.gmail.com/

Patrick Steinhardt Dec. 14, 2023, 8:40 a.m. UTC | #7

On Thu, Dec 14, 2023 at 03:39:17AM -0500, Eric Sunshine wrote:
> On Thu, Dec 14, 2023 at 3:13 AM Patrick Steinhardt <ps@pks.im> wrote:
> > On Wed, Dec 13, 2023 at 10:33:00PM -0500, Eric Sunshine wrote:
> > > On Wed, Dec 13, 2023 at 10:11 AM Junio C Hamano <gitster@pobox.com> wrote:
> > > > Mostly because your "differences in features supported by just-built
> > > > one and what happens to be on $PATH can cause problems" cuts both
> > > > ways [...]
> > >
> > > I sent an alternative solution[1] which should sidestep this objection.
> >
> > Thanks, I've replied to the thread. I think by now there are three
> > different ideas:
> >
> >   - Improve the logic to pick some kind of diff implementation, which is
> >     my patch series. It would need to be improved so that we also probe
> >     whether the respective Git executables actually understand the repo
> >     format so that we can fall back from the just-built Git to system's
> >     Git.
> >
> >   - Munge the whitespace of the expected results with some regexes.
> >     I like that idea better because we can avoid the git-diff(1)
> >     problem, but find that the result is somewhat hard to read.
> >
> >   - Fix the ".expect" files so that we can avoid all of these games.
> >
> > I actually like the last option most. I'll have a go at it and send this
> > third version out in a bit.
> 
> I sent a reply[1] in the other thread explaining why I'm still leaning
> toward `sed` to smooth over these minor differences rather than
> churning the "expect" files, especially since the minor differences
> are not significant to what is actually being tested. That said, I
> won't stand in the way of such a patch to "fix" the "expect" files,
> but it feels unnecessary.
> 
> [1]: https://lore.kernel.org/git/CAPig+cSkuRfkR2D3JqYcbaJqj485nfD9Nq6pM=vXWB5DJenWpA@mail.gmail.com/

Yeah, our mails indeed crossed. I personally do not mind much which of
our patches land upstream and would be happy with either.

Thanks!

Patrick

Junio C Hamano Dec. 14, 2023, 4:16 p.m. UTC | #8

Eric Sunshine <sunshine@sunshineco.com> writes:

>>   - Fix the ".expect" files so that we can avoid all of these games.
>>
>> I actually like the last option most. I'll have a go at it and send this
>> third version out in a bit.
>
> I sent a reply[1] in the other thread explaining why I'm still leaning
> toward `sed` to smooth over these minor differences rather than
> churning the "expect" files, especially since the minor differences
> are not significant to what is actually being tested.

If it is just one time bulk conversion under t/chainlint/ to match
what the chainlint.pl script produces, with the possibility of
similar bulk updates in the future when the script gets updated, I
tend to agree with Patrick that getting rid of the fuzzy comparison
will be the best way forward.

Especially if the fuzzy comparison is done only to hide differences
between what the old chainlint.sed used to produce and what the
current version produces, that is.  If for some reason the script
started to create subtly different output for other reasons (e.g.,
it may produce different whitespaces on a particular platform, or
with a specific version of perl interpreter), we'd better be aware
of it, instead of blindly ignoring the differences without
inspecting them and verifying that they are benign.

By going through the single conversion pain, it will force us to
think twice before breaking its output stability while updating
chainlint.pl, which would also be a good thing.

THanks.

Eric Sunshine Dec. 14, 2023, 6:10 p.m. UTC | #9

On Thu, Dec 14, 2023 at 11:16 AM Junio C Hamano <gitster@pobox.com> wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
> > I sent a reply[1] in the other thread explaining why I'm still leaning
> > toward `sed` to smooth over these minor differences rather than
> > churning the "expect" files, especially since the minor differences
> > are not significant to what is actually being tested.
>
> If it is just one time bulk conversion under t/chainlint/ to match
> what the chainlint.pl script produces, with the possibility of
> similar bulk updates in the future when the script gets updated, I
> tend to agree with Patrick that getting rid of the fuzzy comparison
> will be the best way forward.

Okay, that's fine. If we take this approach, though, then it would
make sense to eliminate _all_ gratuitous postprocessing of the
"expect" files[1] so that we really are comparing the direct output of
chainlint.pl with the "expect" files, rather than merely munging the
inline whitespace of the "expect" files slightly as Patrick's proposed
patch does[2].

(The only postprocessing of "expect" files which needs to stay is the
bit which removes the "# LINT:" comments which litter the "expect"
files explaining to human readers why the linter should insert a
"???FOO???" annotation at that particular point.)

> Especially if the fuzzy comparison is done only to hide differences
> between what the old chainlint.sed used to produce and what the
> current version produces, that is.  If for some reason the script
> started to create subtly different output for other reasons (e.g.,
> it may produce different whitespaces on a particular platform, or
> with a specific version of perl interpreter), we'd better be aware
> of it, instead of blindly ignoring the differences without
> inspecting them and verifying that they are benign.
>
> By going through the single conversion pain, it will force us to
> think twice before breaking its output stability while updating
> chainlint.pl, which would also be a good thing.

The chainlint self-tests were never meant to be about its general
output stability. They were intended to ensure that the "???FOO???"
annotations are: (1) indeed inserted for the set of linting problems
the tool detects, and (2) inserted at the correct spot in the emitted
output relative to the shell tokens to which the annotation applies.
Minor differences in the tool's output (whether over time or between
platforms) should be immaterial in respect to those correctness goals.

[1]: https://lore.kernel.org/git/CAPig+cTZmiXdPZEVO-F2UzV9YaP6c7r2MfPTC3QWksJa+rM7VA@mail.gmail.com/
[2]: https://lore.kernel.org/git/aec86a15c69aa276eee4875fad208ee2fc57635a.1702542564.git.ps@pks.im/

Junio C Hamano Dec. 14, 2023, 7:13 p.m. UTC | #10

Eric Sunshine <sunshine@sunshineco.com> writes:

> (The only postprocessing of "expect" files which needs to stay is the
> bit which removes the "# LINT:" comments which litter the "expect"
> files explaining to human readers why the linter should insert a
> "???FOO???" annotation at that particular point.)

Yup, that matches my understanding.

> The chainlint self-tests were never meant to be about its general
> output stability. They were intended to ensure that the "???FOO???"
> annotations are: (1) indeed inserted for the set of linting problems
> the tool detects, and (2) inserted at the correct spot in the emitted
> output relative to the shell tokens to which the annotation applies.

Yup.

> Minor differences in the tool's output (whether over time or between
> platforms) should be immaterial in respect to those correctness goals.

But there is no reason to make such immaterial changes to the output
gratuitously when we are updating the tool to improve it, no?

Patrick Steinhardt Dec. 15, 2023, 5:33 a.m. UTC | #11

On Thu, Dec 14, 2023 at 01:10:48PM -0500, Eric Sunshine wrote:
> On Thu, Dec 14, 2023 at 11:16 AM Junio C Hamano <gitster@pobox.com> wrote:
> > Eric Sunshine <sunshine@sunshineco.com> writes:
> > > I sent a reply[1] in the other thread explaining why I'm still leaning
> > > toward `sed` to smooth over these minor differences rather than
> > > churning the "expect" files, especially since the minor differences
> > > are not significant to what is actually being tested.
> >
> > If it is just one time bulk conversion under t/chainlint/ to match
> > what the chainlint.pl script produces, with the possibility of
> > similar bulk updates in the future when the script gets updated, I
> > tend to agree with Patrick that getting rid of the fuzzy comparison
> > will be the best way forward.
> 
> Okay, that's fine. If we take this approach, though, then it would
> make sense to eliminate _all_ gratuitous postprocessing of the
> "expect" files[1] so that we really are comparing the direct output of
> chainlint.pl with the "expect" files, rather than merely munging the
> inline whitespace of the "expect" files slightly as Patrick's proposed
> patch does[2].
> 
> (The only postprocessing of "expect" files which needs to stay is the
> bit which removes the "# LINT:" comments which litter the "expect"
> files explaining to human readers why the linter should insert a
> "???FOO???" annotation at that particular point.)

Okay. I'll send a v3 to also drop the other post-processing steps then.

Patrick

tests: prefer host Git to verify chainlint self-checks

Commit Message

Comments

Patch