diff mbox series

[01/18] t: add skeleton chainlint.pl

Message ID 3423df94bd6035640828a2508968cf8e1f5b4dda.1661992197.git.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit b4f25b07c74fc294cab6c12d09faa2021c67f25a
Headers show
Series make test "linting" more comprehensive | expand

Commit Message

Eric Sunshine Sept. 1, 2022, 12:29 a.m. UTC
From: Eric Sunshine <sunshine@sunshineco.com>

Although chainlint.sed usefully identifies broken &&-chains in tests, it
has several shortcomings which include:

  * only detects &&-chain breakage in subshells (one-level deep)

  * does not check for broken top-level &&-chains; that task is left to
    the "magic exit code 117" checker built into test-lib.sh, however,
    that detection does not extend to `{...}` blocks, `$(...)`
    expressions, or compound statements such as `if...fi`,
    `while...done`, `case...esac`

  * uses heuristics, which makes it (potentially) fallible and difficult
    to tweak to handle additional real-world cases

  * written in `sed` and employs advanced `sed` operators which are
    probably not well-known to many programmers, thus the pool of people
    who can maintain it is likely small

  * manually simulates recursion into subshells which makes it much more
    difficult to reason about than, say, a traditional top-down parser

  * checks each test as the test is run, which can get expensive for
    tests which are run repeatedly by functions or loops since their
    bodies will be checked over and over (tens or hundreds of times)
    unnecessarily

To address these shortcomings, begin implementing a more functional and
precise test linter which understands shell syntax and semantics rather
than employing heuristics, thus is able to recognize structural problems
with tests beyond broken &&-chains.

The new linter is written in Perl, thus should be more accessible to a
wider audience, and is structured as a traditional top-down parser which
makes it much easier to reason about, and allows it to inspect compound
statements within test bodies to any depth.

Furthermore, it can check all test definitions in the entire project in
a single invocation rather than having to be invoked once per test, and
each test definition is checked only once no matter how many times the
test is actually run.

At this stage, the new linter is just a skeleton containing boilerplate
which handles command-line options, collects and reports statistics, and
feeds its arguments -- paths of test scripts -- to a (presently)
do-nothing script parser for validation. Subsequent changes will flesh
out the functionality.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
---
 t/chainlint.pl | 115 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 115 insertions(+)
 create mode 100755 t/chainlint.pl

Comments

Ævar Arnfjörð Bjarmason Sept. 1, 2022, 12:27 p.m. UTC | #1
On Thu, Sep 01 2022, Eric Sunshine via GitGitGadget wrote:

> From: Eric Sunshine <sunshine@sunshineco.com>
> [...]
> diff --git a/t/chainlint.pl b/t/chainlint.pl

I really like this overall direction...

> +use warnings;
> +use strict;

I think that in general we're way overdue for at least a :

	use v5.10.1;

Or even something more aggresive, I think we can definitely depend on a
newer version for this bit of dev tooling.

That makes a lot of things in this series more pleasing to look
at. E.g. you could use named $+{} variables for regexes.

> +package ScriptParser;

I really wish this could be changed to just put this in
t/chainlint/ScriptParser.pm early on, we could set @INC appropriately
and "use" these, which...

> +my $getnow = sub { return time(); };
> +my $interval = sub { return time() - shift; };

Would eliminate any scoping concerns about this sort of thing.

> +if (eval {require Time::HiRes; Time::HiRes->import(); 1;}) {
> +	$getnow = sub { return [Time::HiRes::gettimeofday()]; };
> +	$interval = sub { return Time::HiRes::tv_interval(shift); };
> +}

Is this "require" even needed, Time::HiRes is there since 5.7.* says
"corelist -l Time::HIRes".

> [...]
> +sub check_script {
> +	my ($id, $next_script, $emit) = @_;
> +	my ($nscripts, $ntests, $nerrs) = (0, 0, 0);
> +	while (my $path = $next_script->()) {
> +		$nscripts++;
> +		my $fh;
> +		unless (open($fh, "<", $path)) {
> +			$emit->("?!ERR?! $path: $!\n");

If we can depend on v5.10.1 this can surely become:

	use autodie qw(open close);

No?

> +			$nerrs += () = $s =~ /\?![^?]+\?!/g;

y'know if we add some whitespace there we can conform to
https://metacpan.org/dist/perlsecret/view/lib/perlsecret.pod >:) (not
serious...)
Eric Sunshine Sept. 2, 2022, 6:53 p.m. UTC | #2
On Thu, Sep 1, 2022 at 8:32 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> On Thu, Sep 01 2022, Eric Sunshine via GitGitGadget wrote:
> > From: Eric Sunshine <sunshine@sunshineco.com>
> > [...]
> > diff --git a/t/chainlint.pl b/t/chainlint.pl
>
> I really like this overall direction...

Thanks for running an eye over the patches.

> > +use warnings;
> > +use strict;
>
> I think that in general we're way overdue for at least a :
>
>         use v5.10.1;
>
> Or even something more aggresive, I think we can definitely depend on a
> newer version for this bit of dev tooling.

Being stuck with an 11+ year-old primary development machine which
can't be upgraded to a newer OS due to vendor end-of-life declaration,
and with old tools installed, I have little or no interest in bumping
the minimum version, especially since older Perl versions are
perfectly adequate for this task. Undertaking such a version bump
would also be outside the scope of this patch series (and I simply
don't have the free time or desire to pursue it).

> That makes a lot of things in this series more pleasing to look
> at. E.g. you could use named $+{} variables for regexes.

Perhaps, but (1) that would not be very relevant for this script which
typically only extracts "$1", and (2) I've rarely found cases when
named variables help significantly with clarity, but then most of my
real-life regexes generally only extract one or two bits of
information, periodically three, and those bits ("$1", "$2", etc.) are
immediately assigned to variables with meaningful names.

> > +package ScriptParser;
>
> I really wish this could be changed to just put this in
> t/chainlint/ScriptParser.pm early on, we could set @INC appropriately
> and "use" these, which...

I intentionally avoided splitting this into multiple modules because I
wanted it to be easy drop into or adapt to other projects (i.e.
sharness[1]). Of course, it is effectively a shell parser written in
Perl, and it's conceivable that the parser part of it could have uses
outside of Git, so modularizing it might be a good idea, but that's a
task for some future date if such a need arises.

[1]: https://github.com/chriscool/sharness

> > +my $getnow = sub { return time(); };
> > +my $interval = sub { return time() - shift; };
>
> Would eliminate any scoping concerns about this sort of thing.

As above, this is easily addressed if/when someone ever wants to reuse
the code outside of Git for some other purpose. I doubt it's worth
worrying about now.

> > +if (eval {require Time::HiRes; Time::HiRes->import(); 1;}) {
> > +     $getnow = sub { return [Time::HiRes::gettimeofday()]; };
> > +     $interval = sub { return Time::HiRes::tv_interval(shift); };
> > +}
>
> Is this "require" even needed, Time::HiRes is there since 5.7.* says
> "corelist -l Time::HIRes".

Unfortunately, this is needed. The Windows CI instances the Git
project uses don't have Time::HiRes installed (and it's outside the
scope of this series to address shortcomings in the CI
infrastructure).

> > +sub check_script {
> > +     my ($id, $next_script, $emit) = @_;
> > +     my ($nscripts, $ntests, $nerrs) = (0, 0, 0);
> > +     while (my $path = $next_script->()) {
> > +             $nscripts++;
> > +             my $fh;
> > +             unless (open($fh, "<", $path)) {
> > +                     $emit->("?!ERR?! $path: $!\n");
>
> If we can depend on v5.10.1 this can surely become:
>
>         use autodie qw(open close);
>
> No?

No. It's clipped in your response, but the full snippet looks like this:

    unless (open($fh, "<", $path)) {
        $emit->("?!ERR?! $path: $!\n");
        next;
    }

The important point is that I _don't_ want the program to "die" if it
can't open an input file; instead, it should continue processing all
the other input files, and the open-failure should be reported as just
another error/problem it encountered along the way.
diff mbox series

Patch

diff --git a/t/chainlint.pl b/t/chainlint.pl
new file mode 100755
index 00000000000..e8ab95c7858
--- /dev/null
+++ b/t/chainlint.pl
@@ -0,0 +1,115 @@ 
+#!/usr/bin/env perl
+#
+# Copyright (c) 2021-2022 Eric Sunshine <sunshine@sunshineco.com>
+#
+# This tool scans shell scripts for test definitions and checks those tests for
+# problems, such as broken &&-chains, which might hide bugs in the tests
+# themselves or in behaviors being exercised by the tests.
+#
+# Input arguments are pathnames of shell scripts containing test definitions,
+# or globs referencing a collection of scripts. For each problem discovered,
+# the pathname of the script containing the test is printed along with the test
+# name and the test body with a `?!FOO?!` annotation at the location of each
+# detected problem, where "FOO" is a tag such as "AMP" which indicates a broken
+# &&-chain. Returns zero if no problems are discovered, otherwise non-zero.
+
+use warnings;
+use strict;
+use File::Glob;
+use Getopt::Long;
+
+my $show_stats;
+my $emit_all;
+
+package ScriptParser;
+
+sub new {
+	my $class = shift @_;
+	my $self = bless {} => $class;
+	$self->{output} = [];
+	$self->{ntests} = 0;
+	return $self;
+}
+
+sub parse_cmd {
+	return undef;
+}
+
+# main contains high-level functionality for processing command-line switches,
+# feeding input test scripts to ScriptParser, and reporting results.
+package main;
+
+my $getnow = sub { return time(); };
+my $interval = sub { return time() - shift; };
+if (eval {require Time::HiRes; Time::HiRes->import(); 1;}) {
+	$getnow = sub { return [Time::HiRes::gettimeofday()]; };
+	$interval = sub { return Time::HiRes::tv_interval(shift); };
+}
+
+sub show_stats {
+	my ($start_time, $stats) = @_;
+	my $walltime = $interval->($start_time);
+	my ($usertime) = times();
+	my ($total_workers, $total_scripts, $total_tests, $total_errs) = (0, 0, 0, 0);
+	for (@$stats) {
+		my ($worker, $nscripts, $ntests, $nerrs) = @$_;
+		print(STDERR "worker $worker: $nscripts scripts, $ntests tests, $nerrs errors\n");
+		$total_workers++;
+		$total_scripts += $nscripts;
+		$total_tests += $ntests;
+		$total_errs += $nerrs;
+	}
+	printf(STDERR "total: %d workers, %d scripts, %d tests, %d errors, %.2fs/%.2fs (wall/user)\n", $total_workers, $total_scripts, $total_tests, $total_errs, $walltime, $usertime);
+}
+
+sub check_script {
+	my ($id, $next_script, $emit) = @_;
+	my ($nscripts, $ntests, $nerrs) = (0, 0, 0);
+	while (my $path = $next_script->()) {
+		$nscripts++;
+		my $fh;
+		unless (open($fh, "<", $path)) {
+			$emit->("?!ERR?! $path: $!\n");
+			next;
+		}
+		my $s = do { local $/; <$fh> };
+		close($fh);
+		my $parser = ScriptParser->new(\$s);
+		1 while $parser->parse_cmd();
+		if (@{$parser->{output}}) {
+			my $s = join('', @{$parser->{output}});
+			$emit->("# chainlint: $path\n" . $s);
+			$nerrs += () = $s =~ /\?![^?]+\?!/g;
+		}
+		$ntests += $parser->{ntests};
+	}
+	return [$id, $nscripts, $ntests, $nerrs];
+}
+
+sub exit_code {
+	my $stats = shift @_;
+	for (@$stats) {
+		my ($worker, $nscripts, $ntests, $nerrs) = @$_;
+		return 1 if $nerrs;
+	}
+	return 0;
+}
+
+Getopt::Long::Configure(qw{bundling});
+GetOptions(
+	"emit-all!" => \$emit_all,
+	"stats|show-stats!" => \$show_stats) or die("option error\n");
+
+my $start_time = $getnow->();
+my @stats;
+
+my @scripts;
+push(@scripts, File::Glob::bsd_glob($_)) for (@ARGV);
+unless (@scripts) {
+	show_stats($start_time, \@stats) if $show_stats;
+	exit;
+}
+
+push(@stats, check_script(1, sub { shift(@scripts); }, sub { print(@_); }));
+show_stats($start_time, \@stats) if $show_stats;
+exit(exit_code(\@stats));