diff mbox series

[5/7] Makefile: add 'check-sort' target

Message ID 5088e93d76e44de9d079b7b2296b8c810828a2f5.1615856156.git.liu.denton@gmail.com (mailing list archive)
State New, archived
Headers show
Series Sort lists and add static-analysis | expand

Commit Message

Denton Liu March 16, 2021, 12:56 a.m. UTC
In the previous few commits, we sorted many lists into ASCII-order. In
order to ensure that they remain that way, add the 'check-sort' target.

The check-sort.perl program ensures that consecutive lines that match
the same regex are sorted in ASCII-order. The 'check-sort' target runs
the check-sort.perl program on some files which are known to contain
sorted lists.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
---

Notes:
    Full disclaimer: this is the first time I've written anything in Perl.
    Please let me know if I'm doing anything unconventional :)

 Makefile        | 25 +++++++++++++++++++++++++
 check-sort.perl | 31 +++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+)
 create mode 100755 check-sort.perl

Comments

Eric Sunshine March 16, 2021, 6:37 a.m. UTC | #1
On Mon, Mar 15, 2021 at 8:57 PM Denton Liu <liu.denton@gmail.com> wrote:
> In the previous few commits, we sorted many lists into ASCII-order. In
> order to ensure that they remain that way, add the 'check-sort' target.
> [...]
> Signed-off-by: Denton Liu <liu.denton@gmail.com>
> ---
> +my @regexes = map { qr/^$_/ } @ARGV;
> +my $last_regex = 0;
> +my $last_line = '';
> +while (<STDIN>) {
> +       my $matched = 0;
> +       chomp;
> +       for my $regex (@regexes) {
> +               next unless $_ =~ $regex;
> +               if ($last_regex == $regex) {
> +                       die "duplicate lines: '$_'\n" unless $last_line ne $_;
> +                       die "unsorted lines: '$last_line' before '$_'\n" unless $last_line lt $_;
> +               }
> +               $matched = 1;
> +               $last_regex = $regex;
> +               $last_line = $_;
> +       }
> +       unless ($matched) {
> +               $last_regex = 0;
> +               $last_line = '';
> +       }
> +}

This is, of course, endlessly bikesheddable. Here is a shorter -- and,
at least for me, easier to understand -- way to do it:

    my $rc = 0;
    chomp(my @all = <STDIN>);
    foreach my $needle (@ARGV) {
        my @lines = grep(/^$needle/, @all);
        if (join("\n", @lines) ne join("\n", sort @lines)) {
            print "'$needle' lines not sorted\n";
            $rc = 1;
        }
    }
    exit $rc;

By the way, it might be a good idea to also print the filename in
which the problem occurred. Such context can be important for the
person trying to track down the complaint. To do so, you'd probably
want to pass the filename as an argument, and open and read the file
rather than sending it only as standard-input.
Denton Liu March 17, 2021, 9:50 a.m. UTC | #2
Hi Eric,

On Tue, Mar 16, 2021 at 02:37:16AM -0400, Eric Sunshine wrote:
> On Mon, Mar 15, 2021 at 8:57 PM Denton Liu <liu.denton@gmail.com> wrote:
> > In the previous few commits, we sorted many lists into ASCII-order. In
> > order to ensure that they remain that way, add the 'check-sort' target.
> > [...]
> > Signed-off-by: Denton Liu <liu.denton@gmail.com>
> > ---
> > +my @regexes = map { qr/^$_/ } @ARGV;
> > +my $last_regex = 0;
> > +my $last_line = '';
> > +while (<STDIN>) {
> > +       my $matched = 0;
> > +       chomp;
> > +       for my $regex (@regexes) {
> > +               next unless $_ =~ $regex;
> > +               if ($last_regex == $regex) {
> > +                       die "duplicate lines: '$_'\n" unless $last_line ne $_;
> > +                       die "unsorted lines: '$last_line' before '$_'\n" unless $last_line lt $_;
> > +               }
> > +               $matched = 1;
> > +               $last_regex = $regex;
> > +               $last_line = $_;
> > +       }
> > +       unless ($matched) {
> > +               $last_regex = 0;
> > +               $last_line = '';
> > +       }
> > +}
> 
> This is, of course, endlessly bikesheddable. Here is a shorter -- and,
> at least for me, easier to understand -- way to do it:
> 
>     my $rc = 0;
>     chomp(my @all = <STDIN>);
>     foreach my $needle (@ARGV) {
>         my @lines = grep(/^$needle/, @all);
>         if (join("\n", @lines) ne join("\n", sort @lines)) {
>             print "'$needle' lines not sorted\n";
>             $rc = 1;
>         }
>     }
>     exit $rc;

That's pretty clever, thanks for showing me how it's done :)

However, the reason I wrote it out the way that I did is because my code
ensures that consecutive lines matching the regex are sorted but if
there are any breaks between matching regex lines, it will consider them
separate blocks. Just taking all the lines fails in the case of
`LIB_OBJS \+=` in Makefile since we have

	LIB_OBJS += zlib.o

	[... many intervening lines ...]

	LIB_OBJS += $(COMPAT_OBJS)

and that is technically unsorted. That being said, I don't really like
my current approach that much.

I think I have two better options:

	1. Tighten up the regexes so that it excludes the
	   $(COMPAT_OBJS). I don't want to be too strict, though,
	   because if we end up not matching a line it might end up
	   unsorted.

	2. Consider blank lines to be block separators and only consider
	   it to be sorted if the text matching regexes within a block
	   are sorted.

Now that I've written that all out, I think I like option 1 more,
although I could definitely be convinced to go either way.

> By the way, it might be a good idea to also print the filename in
> which the problem occurred. Such context can be important for the
> person trying to track down the complaint. To do so, you'd probably
> want to pass the filename as an argument, and open and read the file
> rather than sending it only as standard-input.

Agreed.

Thanks,
Denton
Ævar Arnfjörð Bjarmason March 17, 2021, 12:47 p.m. UTC | #3
On Tue, Mar 16 2021, Denton Liu wrote:

> In the previous few commits, we sorted many lists into ASCII-order. In
> order to ensure that they remain that way, add the 'check-sort' target.
>
> The check-sort.perl program ensures that consecutive lines that match
> the same regex are sorted in ASCII-order. The 'check-sort' target runs
> the check-sort.perl program on some files which are known to contain
> sorted lists.
>
> Signed-off-by: Denton Liu <liu.denton@gmail.com>
> ---
>
> Notes:
>     Full disclaimer: this is the first time I've written anything in Perl.
>     Please let me know if I'm doing anything unconventional :)
>
>  Makefile        | 25 +++++++++++++++++++++++++
>  check-sort.perl | 31 +++++++++++++++++++++++++++++++
>  2 files changed, 56 insertions(+)
>  create mode 100755 check-sort.perl
>
> diff --git a/Makefile b/Makefile
> index 5832aa33da..b23dff384d 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -3240,6 +3240,31 @@ check-docs::
>  check-builtins::
>  	./check-builtins.sh
>  
> +.PHONY: check-sort
> +check-sort::
> +	./check-sort.perl \
> +		'ALL_COMMANDS \+=' \
> +		'ALL_COMMANDS_TO_INSTALL \+=' \
> +		'BINDIR_PROGRAMS_NEED_X \+=' \
> +		'BINDIR_PROGRAMS_NO_X \+=' \
> +		'BUILTIN_OBJS \+=' \
> +		'BUILT_INS \+=' \
> +		'FUZZ_OBJS \+=' \
> +		'GENERATED_H \+=' \
> +		'LIB_OBJS \+=' \
> +		'SCRIPT_LIB \+=' \
> +		'SCRIPT_PERL \+=' \
> +		'SCRIPT_PYTHON \+=' \
> +		'SCRIPT_SH \+=' \
> +		'TEST_BUILTINS_OBJS \+=' \
> +		'TEST_PROGRAMS_NEED_X \+=' \
> +		'THIRD_PARTY_SOURCES \+=' \
> +		'XDIFF_OBJS \+=' \
> +		<Makefile

Why does this part need to be a Perl script at all? We can check this in
the makefile itself. Make has a sort function and string comparisons,
e.g.:

LIB_OBJS_SORTED =
LIB_OBJS_SORTED += $(sort $(LIB_OBJS))
ifneq ("$(LIB_OBJS)", "$(LIB_OBJS_SORTED)")
$(error "please sort and de-duplicate LIB_OBJS!")
endif

This will fail/pass before/after your patches. Note that make's sort
isn't just a sort, it also de-deplicates (not that we're likely to have
that issue).

> [...]
> +	./check-sort.perl '\t\{ "[^"]*",' <git.c

This last one you can IMO be done better as (or if we want to be more
anal, we could make git die on startup if it's not true):
    
    diff --git a/t/t0012-help.sh b/t/t0012-help.sh
    index 5679e29c62..5bd2ebceca 100755
    --- a/t/t0012-help.sh
    +++ b/t/t0012-help.sh
    @@ -77,6 +77,11 @@ test_expect_success 'generate builtin list' '
            git --list-cmds=builtins >builtins
     '
     
    +test_expect_success 'list of builtins in git.c should be sorted' '
    +       sort builtins >sorted &&
    +       test_cmp sorted builtins
    +'
    +
     while read builtin
     do
            test_expect_success "$builtin can handle -h" '

Which just leaves:

> +	./check-sort.perl 'int cmd_[^(]*\(' <builtin.h
> +	./check-sort.perl 'int cmd__[^(]*\(' <t/helper/test-tool.h

As something that can't be done in the Makefile itself or that we're
already extracting from the tests.

Both of those IMO would be better handled down the line by making the
relevant part of these files generated from the data in the Makefile, at
which point we'd have no need for the external perl script.
Jeff King March 17, 2021, 5:32 p.m. UTC | #4
On Wed, Mar 17, 2021 at 01:47:15PM +0100, Ævar Arnfjörð Bjarmason wrote:

> Why does this part need to be a Perl script at all? We can check this in
> the makefile itself. Make has a sort function and string comparisons,
> e.g.:
> 
> LIB_OBJS_SORTED =
> LIB_OBJS_SORTED += $(sort $(LIB_OBJS))
> ifneq ("$(LIB_OBJS)", "$(LIB_OBJS_SORTED)")
> $(error "please sort and de-duplicate LIB_OBJS!")
> endif
> 
> This will fail/pass before/after your patches. Note that make's sort
> isn't just a sort, it also de-deplicates (not that we're likely to have
> that issue).

I like that much better, if only because it runs quickly and
automatically (as opposed to much later as part of some CI process).
Reducing the length of the feedback loop makes automated checks much
less annoying.

It doesn't indicate which lines are out of order or duplicated. We could
probably add some $(shell) magic to dump both sides to diff inside the
ifneq. That's getting unwieldy to use in each site, but I wonder if
something like this solves that:

diff --git a/Makefile b/Makefile
index dfb0f1000f..3fa7893c8b 100644
--- a/Makefile
+++ b/Makefile
@@ -596,6 +596,16 @@ THIRD_PARTY_SOURCES =
 # interactive shell sessions without exporting it.
 unexport CDPATH
 
+# usage: $(call check-sort,VAR)
+# If this complains, then make sure the contents of VAR are ASCII-sorted.
+define check-sort-template
+SORTED_$1 = $$(sort $$($1))
+ifneq ($$($1),$$(SORTED_$1))
+$$(error "please sort and de-duplicate $1!")
+endif
+endef
+check-sort = $(eval $(call check-sort-template,$1))
+
 SCRIPT_SH += git-bisect.sh
 SCRIPT_SH += git-difftool--helper.sh
 SCRIPT_SH += git-filter-branch.sh
@@ -1037,6 +1047,7 @@ LIB_OBJS += ws.o
 LIB_OBJS += wt-status.o
 LIB_OBJS += xdiff-interface.o
 LIB_OBJS += zlib.o
+$(call check-sort,LIB_OBJS)
 
 BUILTIN_OBJS += builtin/add.o
 BUILTIN_OBJS += builtin/am.o

And then it's just a single-liner for each block that should be checked.
We haven't used $(call) or $(eval) yet in our Makefile, but past
discussions have reached the conclusion that they should be safe
(they're both in GNU make 3.80, which is the oldest version worth caring
about).

TBH, I'm a little on the fence on whether automatically checking this is
even worth the hassle. Doing the make function above was a fun
diversion, but already I think this discussion has taken more time than
people actually spend resolving conflicts on unsorted Makefile lists.

> > [...]
> > +	./check-sort.perl '\t\{ "[^"]*",' <git.c
> 
> This last one you can IMO be done better as (or if we want to be more
> anal, we could make git die on startup if it's not true):
>     
>     diff --git a/t/t0012-help.sh b/t/t0012-help.sh
>     index 5679e29c62..5bd2ebceca 100755
>     --- a/t/t0012-help.sh
>     +++ b/t/t0012-help.sh
>     @@ -77,6 +77,11 @@ test_expect_success 'generate builtin list' '
>             git --list-cmds=builtins >builtins
>      '
>      
>     +test_expect_success 'list of builtins in git.c should be sorted' '
>     +       sort builtins >sorted &&
>     +       test_cmp sorted builtins
>     +'

Again, much nicer, because it just happens as part of the test suite
(which I hope everyone is running after making changes). If we care
about the order in the source code, then this check implies that we are
not sorting the list internally before outputting it, but that is
probably reasonable.

> > +	./check-sort.perl 'int cmd_[^(]*\(' <builtin.h
> > +	./check-sort.perl 'int cmd__[^(]*\(' <t/helper/test-tool.h
> 
> As something that can't be done in the Makefile itself or that we're
> already extracting from the tests.
> 
> Both of those IMO would be better handled down the line by making the
> relevant part of these files generated from the data in the Makefile, at
> which point we'd have no need for the external perl script.

Agreed. Automating away a tedious task is always better than
automatically checking that somebody did it right. :)

-Peff
Ævar Arnfjörð Bjarmason March 17, 2021, 5:42 p.m. UTC | #5
On Wed, Mar 17 2021, Jeff King wrote:

>  SCRIPT_SH += git-bisect.sh
>  SCRIPT_SH += git-difftool--helper.sh
>  SCRIPT_SH += git-filter-branch.sh
> @@ -1037,6 +1047,7 @@ LIB_OBJS += ws.o
>  LIB_OBJS += wt-status.o
>  LIB_OBJS += xdiff-interface.o
>  LIB_OBJS += zlib.o
> +$(call check-sort,LIB_OBJS)
>  
>  BUILTIN_OBJS += builtin/add.o
>  BUILTIN_OBJS += builtin/am.o
>
> And then it's just a single-liner for each block that should be checked.
> We haven't used $(call) or $(eval) yet in our Makefile, but past
> discussions have reached the conclusion that they should be safe
> (they're both in GNU make 3.80, which is the oldest version worth caring
> about).

...also this sort of thing can be guarded by "ifdef DEVELOPER" or
something, which AFAICT (from trying to introduce syntax errors etc.)
will get parsed first, before "make" even tries to parse what's within
the ifdef.

So if we have bells & whistles in that sort of setup it can be less
portable than the Makefile in general..
Junio C Hamano March 17, 2021, 5:59 p.m. UTC | #6
Denton Liu <liu.denton@gmail.com> writes:

> +	./check-sort.perl 'int cmd_[^(]*\(' <builtin.h
> +	./check-sort.perl 'int cmd__[^(]*\(' <t/helper/test-tool.h

These two are trivial to see.

> +	./check-sort.perl '\t\{ "[^"]*",' <git.c

This is too brittle to be acceptable.  It FORBIDS us from
introducing initialization for another table to the file.

I won't participate in the bikeshedding of how the Perl script would
be best written ;-)
Junio C Hamano March 17, 2021, 6:01 p.m. UTC | #7
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> +	./check-sort.perl '\t\{ "[^"]*",' <git.c
>
> This last one you can IMO be done better as (or if we want to be more
> anal, we could make git die on startup if it's not true):
>     
>     diff --git a/t/t0012-help.sh b/t/t0012-help.sh
>     index 5679e29c62..5bd2ebceca 100755
>     --- a/t/t0012-help.sh
>     +++ b/t/t0012-help.sh
>     @@ -77,6 +77,11 @@ test_expect_success 'generate builtin list' '
>             git --list-cmds=builtins >builtins
>      '
>      
>     +test_expect_success 'list of builtins in git.c should be sorted' '
>     +       sort builtins >sorted &&
>     +       test_cmp sorted builtins
>     +'

"LANG=C LC_ALL=C sort ..."

I like this 100% better than the original ;-)
Ævar Arnfjörð Bjarmason March 17, 2021, 6:16 p.m. UTC | #8
On Wed, Mar 17 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>>> +	./check-sort.perl '\t\{ "[^"]*",' <git.c
>>
>> This last one you can IMO be done better as (or if we want to be more
>> anal, we could make git die on startup if it's not true):
>>     
>>     diff --git a/t/t0012-help.sh b/t/t0012-help.sh
>>     index 5679e29c62..5bd2ebceca 100755
>>     --- a/t/t0012-help.sh
>>     +++ b/t/t0012-help.sh
>>     @@ -77,6 +77,11 @@ test_expect_success 'generate builtin list' '
>>             git --list-cmds=builtins >builtins
>>      '
>>      
>>     +test_expect_success 'list of builtins in git.c should be sorted' '
>>     +       sort builtins >sorted &&
>>     +       test_cmp sorted builtins
>>     +'
>
> "LANG=C LC_ALL=C sort ..."
>
> I like this 100% better than the original ;-)

We don't need to use "LANG=C LC_ALL=C sort", the test-lib.sh sets that
already, so just "sort" works consistently.

It's also why with GETTEXT_POISON gone we can just "grep" output,
instead of worrying that it may be in the user's locale.
Eric Sunshine March 17, 2021, 9:48 p.m. UTC | #9
On Wed, Mar 17, 2021 at 1:34 PM Jeff King <peff@peff.net> wrote:
> TBH, I'm a little on the fence on whether automatically checking this is
> even worth the hassle. Doing the make function above was a fun
> diversion, but already I think this discussion has taken more time than
> people actually spend resolving conflicts on unsorted Makefile lists.

I had the same reaction. Like you, I jumped in for the fun diversion.
It allowed me to flex my Perl muscle a bit which has atrophied, but an
out-of-order item here and there is such a minor concern, especially
since they don't impact correctness, that I worry that such a CI job
would be more hassle than it's worth. Making the feedback loop
tighter, as discussed elsewhere in this thread, makes the idea of the
automated check a bit more palatable.
Jeff King March 17, 2021, 10:01 p.m. UTC | #10
On Wed, Mar 17, 2021 at 05:48:18PM -0400, Eric Sunshine wrote:

> On Wed, Mar 17, 2021 at 1:34 PM Jeff King <peff@peff.net> wrote:
> > TBH, I'm a little on the fence on whether automatically checking this is
> > even worth the hassle. Doing the make function above was a fun
> > diversion, but already I think this discussion has taken more time than
> > people actually spend resolving conflicts on unsorted Makefile lists.
> 
> I had the same reaction. Like you, I jumped in for the fun diversion.
> It allowed me to flex my Perl muscle a bit which has atrophied, but an
> out-of-order item here and there is such a minor concern, especially
> since they don't impact correctness, that I worry that such a CI job
> would be more hassle than it's worth. Making the feedback loop
> tighter, as discussed elsewhere in this thread, makes the idea of the
> automated check a bit more palatable.

There's an implication in what both of us said that I think is worth
calling out explicitly: I would not feel the same about a problem that
impacts the correctness of the resulting code. E.g., if the list of
builtins were used with a binary search, then an unsorted list would
produce the wrong result. And that would be worth testing.

It just seems that the stakes here are much lower.

-Peff
diff mbox series

Patch

diff --git a/Makefile b/Makefile
index 5832aa33da..b23dff384d 100644
--- a/Makefile
+++ b/Makefile
@@ -3240,6 +3240,31 @@  check-docs::
 check-builtins::
 	./check-builtins.sh
 
+.PHONY: check-sort
+check-sort::
+	./check-sort.perl \
+		'ALL_COMMANDS \+=' \
+		'ALL_COMMANDS_TO_INSTALL \+=' \
+		'BINDIR_PROGRAMS_NEED_X \+=' \
+		'BINDIR_PROGRAMS_NO_X \+=' \
+		'BUILTIN_OBJS \+=' \
+		'BUILT_INS \+=' \
+		'FUZZ_OBJS \+=' \
+		'GENERATED_H \+=' \
+		'LIB_OBJS \+=' \
+		'SCRIPT_LIB \+=' \
+		'SCRIPT_PERL \+=' \
+		'SCRIPT_PYTHON \+=' \
+		'SCRIPT_SH \+=' \
+		'TEST_BUILTINS_OBJS \+=' \
+		'TEST_PROGRAMS_NEED_X \+=' \
+		'THIRD_PARTY_SOURCES \+=' \
+		'XDIFF_OBJS \+=' \
+		<Makefile
+	./check-sort.perl 'int cmd_[^(]*\(' <builtin.h
+	./check-sort.perl 'int cmd__[^(]*\(' <t/helper/test-tool.h
+	./check-sort.perl '\t\{ "[^"]*",' <git.c
+
 ### Test suite coverage testing
 #
 .PHONY: coverage coverage-clean coverage-compile coverage-test coverage-report
diff --git a/check-sort.perl b/check-sort.perl
new file mode 100755
index 0000000000..cd723db14d
--- /dev/null
+++ b/check-sort.perl
@@ -0,0 +1,31 @@ 
+#!/usr/bin/perl
+
+use strict;
+use warnings;
+
+my @regexes = map { qr/^$_/ } @ARGV;
+my $last_regex = 0;
+my $last_line = '';
+
+while (<STDIN>) {
+	my $matched = 0;
+	chomp;
+
+	for my $regex (@regexes) {
+		next unless $_ =~ $regex;
+
+		if ($last_regex == $regex) {
+			die "duplicate lines: '$_'\n" unless $last_line ne $_;
+			die "unsorted lines: '$last_line' before '$_'\n" unless $last_line lt $_;
+		}
+
+		$matched = 1;
+		$last_regex = $regex;
+		$last_line = $_;
+	}
+
+	unless ($matched) {
+		$last_regex = 0;
+		$last_line = '';
+	}
+}