[2/2] chainlint: reduce annotation noise-factor

Message ID	20240829091625.41297-3-ericsunshine@charter.net (mailing list archive)
State	New
Headers	show Received: from impout006.msg.chrl.nc.charter.net (impout006aa.msg.chrl.nc.charter.net [47.43.20.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 08C8D18DF75 for <git@vger.kernel.org>; Thu, 29 Aug 2024 09:18:25 +0000 (UTC) From: Eric Sunshine <ericsunshine@charter.net> To: git@vger.kernel.org Cc: Jeff King <peff@peff.net>, Eric Sunshine <sunshine@sunshineco.com> Subject: [PATCH 2/2] chainlint: reduce annotation noise-factor Date: Thu, 29 Aug 2024 05:16:25 -0400 Message-ID: <20240829091625.41297-3-ericsunshine@charter.net> In-Reply-To: <20240829091625.41297-1-ericsunshine@charter.net> References: <20240829091625.41297-1-ericsunshine@charter.net> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	make chainlint output more newcomer-friendly \| expand [0/2] make chainlint output more newcomer-friendly [1/2] chainlint: make error messages self-explanatory [2/2] chainlint: reduce annotation noise-factor

Eric Sunshine Aug. 29, 2024, 9:16 a.m. UTC

From: Eric Sunshine <sunshine@sunshineco.com>

When chainlint detects a problem in a test definition, it highlights the
offending code with an "?!ERR ...?!" annotation. The rather curious "?!"
delimiter was chosen to draw the reader's attention to the problem area.

Later, chainlint learned to color its output when sent to a terminal.
Problem annotations are colored with a red background which stands out
well from surrounding text, thus easily draws the reader's attention. As
such, the additional "?!" decoration became superfluous (when output is
colored), however the decoration was retained since it serves as a good
needle when using the terminal's search feature to "jump" to the next
problem.

Nevertheless, the "?!" decoration is noisy and ugly and makes it
unnecessarily difficult for the reader to pluck the problem description
from the annotation. For instance, it is easier to see at a glance what
the problem is in:

    ERR missing '&&'

than in the noisier:

    ?!ERR missing '&&'?!

Therefore drop the "!?" decoration when output is colored (but retain it
otherwise).

Note that the preceding change gave all problem annotations a uniform
"ERR" prefix which serves as a reasonably suitable replacement needle
when searching in a terminal, so loss of "?!" in the output should not
be overly problematic.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
---
 t/chainlint.pl | 7 +++++--
 t/test-lib.sh  | 2 +-
 2 files changed, 6 insertions(+), 3 deletions(-)

Patrick Steinhardt Aug. 29, 2024, 10:03 a.m. UTC | #1

On Thu, Aug 29, 2024 at 05:16:25AM -0400, Eric Sunshine wrote:
> From: Eric Sunshine <sunshine@sunshineco.com>
> 
> When chainlint detects a problem in a test definition, it highlights the
> offending code with an "?!ERR ...?!" annotation. The rather curious "?!"
> delimiter was chosen to draw the reader's attention to the problem area.
> 
> Later, chainlint learned to color its output when sent to a terminal.
> Problem annotations are colored with a red background which stands out
> well from surrounding text, thus easily draws the reader's attention. As
> such, the additional "?!" decoration became superfluous (when output is
> colored), however the decoration was retained since it serves as a good
> needle when using the terminal's search feature to "jump" to the next
> problem.
> 
> Nevertheless, the "?!" decoration is noisy and ugly and makes it
> unnecessarily difficult for the reader to pluck the problem description
> from the annotation. For instance, it is easier to see at a glance what
> the problem is in:
> 
>     ERR missing '&&'
> 
> than in the noisier:
> 
>     ?!ERR missing '&&'?!
> 
> Therefore drop the "!?" decoration when output is colored (but retain it
> otherwise).
> 
> Note that the preceding change gave all problem annotations a uniform
> "ERR" prefix which serves as a reasonably suitable replacement needle
> when searching in a terminal, so loss of "?!" in the output should not
> be overly problematic.

Okay, now the "ERR" prefix becomes a bit more important because we drop
the other punctuation. I'm still not much of a fan of it, though. Makes
me wonder whether we want to take a clue from how compilers nowadays
format this, e.g. by using "pointers".

So this:

    2 (
    3 	foo |
    4 	bar |
    5 	baz &&
    6 
    7 	fish |
    8 	cow ?!AMP?!
    9 
    10 	sunder
    11 )

Would become this:

    t/chainlint/pipe.actual:8: error: expected ampersands (&&)
    7 	fish |
    8 	cow 
            ^
    9

While this would be neat, I guess it would also be way more work than
the current series you have posted. And whether that work is ultimately
really worth it may be another question. Probably not.

So overall, I'm fine with the direction that your patch series takes.

Patrick

Junio C Hamano Aug. 29, 2024, 3:55 p.m. UTC | #2

Eric Sunshine <ericsunshine@charter.net> writes:

> From: Eric Sunshine <sunshine@sunshineco.com>
>
> When chainlint detects a problem in a test definition, it highlights the
> offending code with an "?!ERR ...?!" annotation. The rather curious "?!"
> delimiter was chosen to draw the reader's attention to the problem area.
>
> Later, chainlint learned to color its output when sent to a terminal.
> Problem annotations are colored with a red background which stands out
> well from surrounding text, thus easily draws the reader's attention. As
> such, the additional "?!" decoration became superfluous (when output is
> colored), however the decoration was retained since it serves as a good
> needle when using the terminal's search feature to "jump" to the next
> problem.
>
> Nevertheless, the "?!" decoration is noisy and ugly and makes it
> unnecessarily difficult for the reader to pluck the problem description
> from the annotation. For instance, it is easier to see at a glance what
> the problem is in:
>
>     ERR missing '&&'
>
> than in the noisier:
>
>     ?!ERR missing '&&'?!
>
> Therefore drop the "!?" decoration when output is colored (but retain it
> otherwise).

Wait.  That does not qualify "Therefore".

We talked about a "good needle" and then complained how ugly the
string that was happened to be chosen as good needle is.  That is
not enough to explain why it is justified to "lose" the needle.  The
only thing you justified is to move away from the ugly pattern, as a
typical "terminal's search feature" does not give us an easy way to
"jump to the next text painted yellow".

> Note that the preceding change gave all problem annotations a uniform
> "ERR" prefix which serves as a reasonably suitable replacement needle
> when searching in a terminal, so loss of "?!" in the output should not
> be overly problematic.

Drop this separate paragraph, promote its contents up from "Note"
status and as a proper part of the previous sentence in its rewrite,
something like:

    Since the errors are all uniformly prefixed with "ERR", which
    can be used as the "good needle" instead, lose the "!?"
    decoration when output is colored.

to replace "Therefore" and everything that follow.

> @@ -663,7 +666,7 @@ sub check_test {
>  	$checked =~ s/^\d+ \n//;
>  	$checked =~ s/(\s) \?!/$1?!/mg;
>  	$checked =~ s/\?! (\s)/?!$1/mg;
> -	$checked =~ s/(\?![^?]+\?!)/$c->{rev}$c->{red}$1$c->{reset}/mg;
> +	$checked =~ s/\?!([^?]+)\?!/$erropen$1$errclose/mg;

Hmph.  With $erropen and $errclose, I was hoping that we can shed
the reliance on the "?!" mark even internally.  This is especially
true that in the early part of this sub, the problem description was
very much structured piece of data, not something the consuming code
need to pick out of an already formatted text like this, risking to
get confused by the payload (i.e. the text that came from the
problematic test script inside "substr($body, $start, $pos-$start)"
may contain anything, including "?!", right?).

>  	$checked =~ s/^\d+/$c->{dim}$&$c->{reset}/mg;
>  	$checked .= "\n" unless $checked =~ /\n$/;
>  	push(@{$self->{output}}, "$c->{blue}# chainlint: $title$c->{reset}\n$checked");
> @@ -805,9 +808,9 @@ sub check_script {
>  			my $c = fd_colors(1);
>  			my $s = join('', @{$parser->{output}});
>  			$emit->("$c->{bold}$c->{blue}# chainlint: $path$c->{reset}\n" . $s);
> -			$nerrs += () = $s =~ /\?![^?]+\?!/g;
>  		}
>  		$ntests += $parser->{ntests};
> +		$nerrs += $parser->{nerrs};
>  	}
>  	return [$id, $nscripts, $ntests, $nerrs];
>  }
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 54247604cb..b652cb98cd 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -1606,7 +1606,7 @@ if test "${GIT_TEST_CHAIN_LINT:-1}" != 0 &&
>     test "${GIT_TEST_EXT_CHAIN_LINT:-1}" != 0
>  then
>  	"$PERL_PATH" "$TEST_DIRECTORY/chainlint.pl" "$0" ||
> -		BUG "lint error (see '?!...!? annotations above)"
> +		BUG "lint error (see 'ERR' annotations above)"
>  fi
>  
>  # Last-minute variable setup

Overall the two patches looked great.
Thanks.

Jeff King Aug. 29, 2024, 5:10 p.m. UTC | #3

On Thu, Aug 29, 2024 at 12:03:37PM +0200, Patrick Steinhardt wrote:

> Okay, now the "ERR" prefix becomes a bit more important because we drop
> the other punctuation. I'm still not much of a fan of it, though. Makes
> me wonder whether we want to take a clue from how compilers nowadays
> format this, e.g. by using "pointers".
> 
> So this:
> 
>     2 (
>     3 	foo |
>     4 	bar |
>     5 	baz &&
>     6 
>     7 	fish |
>     8 	cow ?!AMP?!
>     9 
>     10 	sunder
>     11 )
> 
> Would become this:
> 
>     t/chainlint/pipe.actual:8: error: expected ampersands (&&)
>     7 	fish |
>     8 	cow 
>             ^
>     9
> 
> While this would be neat, I guess it would also be way more work than
> the current series you have posted. And whether that work is ultimately
> really worth it may be another question. Probably not.

I think that output is quite readable. One bonus is that it follows the
usual "quickfix" format, so there's editor support for jumping to the
problematic spot.

It probably is more verbose if you have multiple errors right next to
each other (since now we just show the annotated source text). But that
is going to be relatively rare compared to single mistakes, I'd think.

-Peff

Eric Sunshine Aug. 29, 2024, 6:28 p.m. UTC | #4

On Thu, Aug 29, 2024 at 6:03 AM Patrick Steinhardt <ps@pks.im> wrote:
> On Thu, Aug 29, 2024 at 05:16:25AM -0400, Eric Sunshine wrote:
> > Note that the preceding change gave all problem annotations a uniform
> > "ERR" prefix which serves as a reasonably suitable replacement needle
> > when searching in a terminal, so loss of "?!" in the output should not
> > be overly problematic.
>
> Okay, now the "ERR" prefix becomes a bit more important because we drop
> the other punctuation. I'm still not much of a fan of it, though. Makes
> me wonder whether we want to take a clue from how compilers nowadays
> format this, e.g. by using "pointers".
>
> So this:
>     7   fish |
>     8   cow ?!AMP?!
>
> Would become this:
>     t/chainlint/pipe.actual:8: error: expected ampersands (&&)
>     7   fish |
>     8   cow
>             ^
>
> While this would be neat, I guess it would also be way more work than
> the current series you have posted. And whether that work is ultimately
> really worth it may be another question. Probably not.

Interestingly, I'm not always a fan of the sort of compiler output you
suggest since I often have more difficulty interpreting the output and
locating the actual problem[*] than if the annotation was merely
inline, sitting immediately next to the problem itself.

Also, the vast majority of the time, chainlint will be flagging a
missing "&&" at the end of line, so with the inline annotation, it's
very easy to see (especially when colored) exactly where the problem
is at a glance.

Hence, the cost of implementing "^" doesn't feel particularly
worthwhile (and, with my limited Git time these days, I'm unlikely to
do so).

[*] This is especially so when dealing with foreign code which is
wider than my 80-column terminal or 80-column editor window, in which
the source text and the "^" may wrap over multiple lines.

Eric Sunshine Aug. 29, 2024, 6:37 p.m. UTC | #5

On Thu, Aug 29, 2024 at 1:10 PM Jeff King <peff@peff.net> wrote:
> On Thu, Aug 29, 2024 at 12:03:37PM +0200, Patrick Steinhardt wrote:
> > Would become this:
> >
> >     t/chainlint/pipe.actual:8: error: expected ampersands (&&)
> >     7         fish |
> >     8         cow
> >             ^
> >     9
> >
> > While this would be neat, I guess it would also be way more work than
> > the current series you have posted. And whether that work is ultimately
> > really worth it may be another question. Probably not.
>
> I think that output is quite readable. One bonus is that it follows the
> usual "quickfix" format, so there's editor support for jumping to the
> problematic spot.

The "quickfix" notation has more appeal (to me) than the "^"
annotation since it is immediately useful in an editor and doesn't
require extra cogitation. A couple concerns which come to mind are
that it makes the output even more noisy (which could be distracting),
and that the path, if relative, might not correspond to the editor's
"cwd" or to a path in the editor's search list, so the promised "jump
to next problem" automation may not materialize.

> It probably is more verbose if you have multiple errors right next to
> each other (since now we just show the annotated source text). But that
> is going to be relatively rare compared to single mistakes, I'd think.

Maybe. Maybe not. A first-draft test by a newcomer might very well
break the &&-chain in many spots.

Nevertheless, I'm not likely to implement this any time soon (or at all).

Eric Sunshine Aug. 30, 2024, 11:30 p.m. UTC | #6

On Thu, Aug 29, 2024 at 11:55 AM Junio C Hamano <gitster@pobox.com> wrote:
> Eric Sunshine <ericsunshine@charter.net> writes:
> > When chainlint detects a problem in a test definition, it highlights the
> > offending code with an "?!ERR ...?!" annotation. The rather curious "?!"
> > delimiter was chosen to draw the reader's attention to the problem area.
> >
> > Later, chainlint learned to color its output when sent to a terminal.
> > Problem annotations are colored with a red background which stands out
> > well from surrounding text, thus easily draws the reader's attention. As
> > such, the additional "?!" decoration became superfluous (when output is
> > colored), however the decoration was retained since it serves as a good
> > needle when using the terminal's search feature to "jump" to the next
> > problem.
> >
> > Nevertheless, the "?!" decoration is noisy and ugly and makes it
> > unnecessarily difficult for the reader to pluck the problem description
> > from the annotation. For instance, it is easier to see at a glance what
> > the problem is in:
> >     ERR missing '&&'
> > than in the noisier:
> >     ?!ERR missing '&&'?!
> > Therefore drop the "!?" decoration when output is colored (but retain it
> > otherwise).
>
> Wait.  That does not qualify "Therefore".
>
> We talked about a "good needle" and then complained how ugly the
> string that was happened to be chosen as good needle is.  That is
> not enough to explain why it is justified to "lose" the needle.  The
> only thing you justified is to move away from the ugly pattern, as a
> typical "terminal's search feature" does not give us an easy way to
> "jump to the next text painted yellow".
>
> > Note that the preceding change gave all problem annotations a uniform
> > "ERR" prefix which serves as a reasonably suitable replacement needle
> > when searching in a terminal, so loss of "?!" in the output should not
> > be overly problematic.
>
> Drop this separate paragraph, promote its contents up from "Note"
> status and as a proper part of the previous sentence in its rewrite,
> something like:
>
>     Since the errors are all uniformly prefixed with "ERR", which
>     can be used as the "good needle" instead, lose the "!?"
>     decoration when output is colored.
>
> to replace "Therefore" and everything that follow.

Perhaps the following would make for a more palatable commit message?

    When chainlint detects a problem in a test definition, it
    highlights the offending code with a "?!...?!" annotation. The
    rather curious "?!" decoration was chosen to draw the reader's
    attention to the problem area and to act as a good "needle" when
    using the terminal's search feature to "jump" to the next problem.

    Later, chainlint learned to color its output when sent to a
    terminal. Problem annotations are colored with a red background
    which stands out well from surrounding text, thus easily draws the
    reader's attention. Together with the preceding change which gave
    all problem annotations a uniform "ERR" prefix, the noisy "?!"
    decoration has become superfluous as a search "needle" so omit it
    when output is colored.

> > @@ -663,7 +666,7 @@ sub check_test {
> >       $checked =~ s/(\s) \?!/$1?!/mg;
> >       $checked =~ s/\?! (\s)/?!$1/mg;
> > -     $checked =~ s/(\?![^?]+\?!)/$c->{rev}$c->{red}$1$c->{reset}/mg;
> > +     $checked =~ s/\?!([^?]+)\?!/$erropen$1$errclose/mg;
>
> Hmph.  With $erropen and $errclose, I was hoping that we can shed
> the reliance on the "?!" mark even internally.

Good point. Just above the shown context:

    $checked .= substr($body, $start, $pos - $start) . " ?!ERR $err?! ";

unconditionally adds the "?!" decorations even when the output is
colored, and then the:

    $checked =~ s/\?!([^?]+)\?!/$erropen$1$errclose/mg;

removes them. It would be nice to add the "?!" decoration only when
not coloring output, however, together with the explicit space added
before and after "?!...?!" by:

    $checked .= substr($body, $start, $pos - $start) . " ?!ERR $err?! ";

the related:

    $checked =~ s/(\s) \?!/$1?!/mg;
    $checked =~ s/\?! (\s)/?!$1/mg;

ensure, for aesthetic reasons, that there is one, and only one, space
before and after the annotation.

It may be possible to do something like this instead (untested), but
I'm not sure it's worth the complexity:

    $checked .= substr($body, $start, $pos - $start);
    $checked .= ' ' unless $checked =~ /\s$/;
    $checked .= "$erropenERR $err$errclose";
    $checked .= ' ' unless $pos + 1 >= length($body) ||
        substr($body, $pos + 1, 1) =~ /\s/;

> This is especially
> true that in the early part of this sub, the problem description was
> very much structured piece of data, not something the consuming code
> need to pick out of an already formatted text like this, risking to
> get confused by the payload (i.e. the text that came from the
> problematic test script inside "substr($body, $start, $pos-$start)"
> may contain anything, including "?!", right?).

As first implemented, there was no structured "problem description".
chainlint originally just output a stream of raw parse tokens (not the
original test text), and when a problem was discovered the "?!...?!"
annotations were embedded directly in the output stream. This was
still the case even when colored output was implemented[1]; in fact,
the annotations were colored after-the-fact by searching for "?!...?!"
in the output stream. It was only when chainlint was taught to output
the original test text verbatim[2] that problem descriptions became
structured data.

I was never overly concerned about "?!" appearing as part of the
actual payload, partly because it is an unusual character sequence to
be present in shell code anyhow (indeed, it only appears in t5510),
partly because it requires "?!" to be doubled up (i.e. "?!...?!"), and
partly because this processing kicks in only when a linting problem is
actually discovered,

[1]: 7c04aa7390 (chainlint: colorize problem annotations and test
delimiters, 2022-09-13)
[2]: 73c768dae9 (chainlint: annotate original test definition rather
than token stream, 2022-11-08)

Junio C Hamano Aug. 30, 2024, 11:51 p.m. UTC | #7

Eric Sunshine <sunshine@sunshineco.com> writes:

> It may be possible to do something like this instead (untested), but
> I'm not sure it's worth the complexity:
>
>     $checked .= substr($body, $start, $pos - $start);
>     $checked .= ' ' unless $checked =~ /\s$/;
>     $checked .= "$erropenERR $err$errclose";
>     $checked .= ' ' unless $pos + 1 >= length($body) ||
>         substr($body, $pos + 1, 1) =~ /\s/;

I think the complexity you mention is the updates to existing code
to get to the above end state?  Using some setup like ...

	($erropen, errclose) = 
		$colored_output ? ("?!", "?!") : ("<RED>", "<RESET>");

... and then using a code like the above would be quite
straightforward and the end result cannot become simpler than that
;-)

> As first implemented, there was no structured "problem description".
> chainlint originally just output a stream of raw parse tokens (not the
> original test text), and when a problem was discovered the "?!...?!"
> annotations were embedded directly in the output stream. This was
> still the case even when colored output was implemented[1]; in fact,
> the annotations were colored after-the-fact by searching for "?!...?!"
> in the output stream. It was only when chainlint was taught to output
> the original test text verbatim[2] that problem descriptions became
> structured data.

Exactly.

Thanks.

[2/2] chainlint: reduce annotation noise-factor

Commit Message

Comments

Patch