[v2] test_cmp: diagnose incorrect arguments
diff mbox series

Message ID 20200809174209.15466-1-sunshine@sunshineco.com
State New
Headers show
Series
  • [v2] test_cmp: diagnose incorrect arguments
Related show

Commit Message

Eric Sunshine Aug. 9, 2020, 5:42 p.m. UTC
Under normal circumstances, if a test author misspells a filename passed
to test_cmp(), the error is quickly discovered when the test fails
unexpectedly due to test_cmp() being unable to find the file. However,
if the test is expected to fail, as with test_expect_failure(), a
misspelled filename as argument to test_cmp() will go unnoticed since
the test will indeed fail, but for the wrong reason. Make it easier for
test authors to discover such problems early by sanity-checking the
arguments to test_cmp(). To avoid penalizing all clients of test_cmp()
in the general case, only check for missing files if the comparison
fails.

While at it, make test_cmp_bin() sanity-check its arguments, as well.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
---

Notes:
    This is a re-roll of [1] which was motivated by seeing Elijah fix[2]
    several cases of bogus test_cmp() calls which perhaps could have
    been detected earlier.
    
    Changes since v1:
    
    * take into account that some callers pass "-" (meaning standard
      input) as an argument to test_cmp() (pointed out privately by
      Junio)
    
    * show the name of the missing file rather than a placeholder
      (Shourya[3])
    
    [1]: https://lore.kernel.org/git/20200809060810.31370-1-sunshine@sunshineco.com/
    [2]: https://lore.kernel.org/git/7f408b7d4069403b969d334f4940ebf87f1dc797.1596906081.git.gitgitgadget@gmail.com/
    [3]: https://lore.kernel.org/git/20200809083227.GA11219@konoha/

Interdiff against v1:
  diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
  index 8d77deebd2..a12d6a3fc9 100644
  --- a/t/test-lib-functions.sh
  +++ b/t/test-lib-functions.sh
  @@ -955,8 +955,8 @@ test_cmp() {
   	test $# -eq 2 || BUG "test_cmp requires two arguments"
   	if ! eval "$GIT_TEST_CMP" '"$@"'
   	then
  -		test -e "$1" || BUG "test_cmp 'expect' file missing"
  -		test -e "$2" || BUG "test_cmp 'actual' file missing"
  +		test "x$1" = x- || test -e "$1" || BUG "test_cmp '$1' missing"
  +		test "x$2" = x- || test -e "$2" || BUG "test_cmp '$2' missing"
   		return 1
   	fi
   }
  @@ -990,8 +990,8 @@ test_cmp_bin() {
   	test $# -eq 2 || BUG "test_cmp_bin requires two arguments"
   	if ! cmp "$@"
   	then
  -		test -e "$1" || BUG "test_cmp_bin 'expect' file missing"
  -		test -e "$2" || BUG "test_cmp_bin 'actual' file missing"
  +		test "x$1" = x- || test -e "$1" || BUG "test_cmp_bin '$1' missing"
  +		test "x$2" = x- || test -e "$2" || BUG "test_cmp_bin '$2' missing"
   		return 1
   	fi
   }

 t/test-lib-functions.sh | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

Comments

Junio C Hamano Aug. 9, 2020, 7:12 p.m. UTC | #1
Eric Sunshine <sunshine@sunshineco.com> writes:

> Under normal circumstances, if a test author misspells a filename passed
> to test_cmp(), the error is quickly discovered when the test fails
> unexpectedly due to test_cmp() being unable to find the file. However,
> if the test is expected to fail, as with test_expect_failure(), a
> misspelled filename as argument to test_cmp() will go unnoticed since
> the test will indeed fail, but for the wrong reason. Make it easier for
> test authors to discover such problems early by sanity-checking the
> arguments to test_cmp(). To avoid penalizing all clients of test_cmp()
> in the general case, only check for missing files if the comparison
> fails.
>
> While at it, make test_cmp_bin() sanity-check its arguments, as well.
>
> Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
> ---

The motivation makes quite a lot of sense.

I suspect that it is a bug to use "-", i.e. "read from the standard
input", for "$2", because

 (1) if it is used for the expected output, we got the comparison in
     the wrong direction;

 (2) if it is used for the actual output, we have a command whose
     output we are interested in on the upstream side of a pipe to
     discard its exit status.

but I'll let it pass.

> diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
> index b791933ffd..a12d6a3fc9 100644
> --- a/t/test-lib-functions.sh
> +++ b/t/test-lib-functions.sh
> @@ -952,7 +952,13 @@ test_expect_code () {
>  # - not all diff versions understand "-u"
>  
>  test_cmp() {
> -	eval "$GIT_TEST_CMP" '"$@"'
> +	test $# -eq 2 || BUG "test_cmp requires two arguments"
> +	if ! eval "$GIT_TEST_CMP" '"$@"'
> +	then
> +		test "x$1" = x- || test -e "$1" || BUG "test_cmp '$1' missing"
> +		test "x$2" = x- || test -e "$2" || BUG "test_cmp '$2' missing"
> +		return 1
> +	fi
>  }
>  
>  # Check that the given config key has the expected value.
> @@ -981,7 +987,13 @@ test_cmp_config() {
>  # test_cmp_bin - helper to compare binary files
>  
>  test_cmp_bin() {
> -	cmp "$@"
> +	test $# -eq 2 || BUG "test_cmp_bin requires two arguments"
> +	if ! cmp "$@"
> +	then
> +		test "x$1" = x- || test -e "$1" || BUG "test_cmp_bin '$1' missing"
> +		test "x$2" = x- || test -e "$2" || BUG "test_cmp_bin '$2' missing"
> +		return 1
> +	fi
>  }
>  
>  # Use this instead of test_cmp to compare files that contain expected and
Eric Sunshine Aug. 9, 2020, 7:34 p.m. UTC | #2
On Sun, Aug 9, 2020 at 3:12 PM Junio C Hamano <gitster@pobox.com> wrote:
> The motivation makes quite a lot of sense.
>
> I suspect that it is a bug to use "-", i.e. "read from the standard
> input", for "$2", because
>
>  (1) if it is used for the expected output, we got the comparison in
>      the wrong direction;

I had the same concern that it didn't necessarily make sense to allow
"-" for $2, but eventually thought of a case such as:

    sed '...' actual | test_cmp expect - &&

which massages 'actual' before test_cmp() sees it. True, we don't
actually have such callers, but it theoretically is legitimate.

>  (2) if it is used for the actual output, we have a command whose
>      output we are interested in on the upstream side of a pipe to
>      discard its exit status.

If the command upstream is a Git command, that could indeed be a
reason for concern, but, as illustrated above, it could just be a
system command.
Junio C Hamano Aug. 10, 2020, 3:22 p.m. UTC | #3
Eric Sunshine <sunshine@sunshineco.com> writes:

> I had the same concern that it didn't necessarily make sense to allow
> "-" for $2, but eventually thought of a case such as:
>
>     sed '...' actual | test_cmp expect - &&
>
> which massages 'actual' before test_cmp() sees it. True, we don't
> actually have such callers, but it theoretically is legitimate.

Yup, that looks a bit too stretching [*] to me, but that was what I
had in mind when I said "I'll let it pass".

    Side note.  Presumably that 'actual' was written by Git, to
    avoid losing its exit code, e.g.

	git frotz >actual &&
	sed '...' actual | test_cmp expect -

    but then it becomes more natural to write the second one like
    so:

	git frotz >raw &&
	sed '...' raw >actual &&
	test_cmp expect actual
Taylor Blau Aug. 11, 2020, 6:32 p.m. UTC | #4
On Sun, Aug 09, 2020 at 01:42:09PM -0400, Eric Sunshine wrote:
> Under normal circumstances, if a test author misspells a filename passed
> to test_cmp(), the error is quickly discovered when the test fails
> unexpectedly due to test_cmp() being unable to find the file. However,
> if the test is expected to fail, as with test_expect_failure(), a
> misspelled filename as argument to test_cmp() will go unnoticed since
> the test will indeed fail, but for the wrong reason. Make it easier for
> test authors to discover such problems early by sanity-checking the
> arguments to test_cmp(). To avoid penalizing all clients of test_cmp()
> in the general case, only check for missing files if the comparison
> fails.
>
> While at it, make test_cmp_bin() sanity-check its arguments, as well.
>
> Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
> ---
>
> Notes:
>     This is a re-roll of [1] which was motivated by seeing Elijah fix[2]
>     several cases of bogus test_cmp() calls which perhaps could have
>     been detected earlier.
>
>     Changes since v1:
>
>     * take into account that some callers pass "-" (meaning standard
>       input) as an argument to test_cmp() (pointed out privately by
>       Junio)
>
>     * show the name of the missing file rather than a placeholder
>       (Shourya[3])
>
>     [1]: https://lore.kernel.org/git/20200809060810.31370-1-sunshine@sunshineco.com/
>     [2]: https://lore.kernel.org/git/7f408b7d4069403b969d334f4940ebf87f1dc797.1596906081.git.gitgitgadget@gmail.com/
>     [3]: https://lore.kernel.org/git/20200809083227.GA11219@konoha/
>
> Interdiff against v1:
>   diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
>   index 8d77deebd2..a12d6a3fc9 100644
>   --- a/t/test-lib-functions.sh
>   +++ b/t/test-lib-functions.sh
>   @@ -955,8 +955,8 @@ test_cmp() {
>    	test $# -eq 2 || BUG "test_cmp requires two arguments"
>    	if ! eval "$GIT_TEST_CMP" '"$@"'
>    	then
>   -		test -e "$1" || BUG "test_cmp 'expect' file missing"
>   -		test -e "$2" || BUG "test_cmp 'actual' file missing"
>   +		test "x$1" = x- || test -e "$1" || BUG "test_cmp '$1' missing"
>   +		test "x$2" = x- || test -e "$2" || BUG "test_cmp '$2' missing"
>    		return 1
>    	fi
>    }
>   @@ -990,8 +990,8 @@ test_cmp_bin() {
>    	test $# -eq 2 || BUG "test_cmp_bin requires two arguments"
>    	if ! cmp "$@"
>    	then
>   -		test -e "$1" || BUG "test_cmp_bin 'expect' file missing"
>   -		test -e "$2" || BUG "test_cmp_bin 'actual' file missing"
>   +		test "x$1" = x- || test -e "$1" || BUG "test_cmp_bin '$1' missing"
>   +		test "x$2" = x- || test -e "$2" || BUG "test_cmp_bin '$2' missing"
>    		return 1
>    	fi
>    }
>
>  t/test-lib-functions.sh | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
> index b791933ffd..a12d6a3fc9 100644
> --- a/t/test-lib-functions.sh
> +++ b/t/test-lib-functions.sh
> @@ -952,7 +952,13 @@ test_expect_code () {
>  # - not all diff versions understand "-u"
>
>  test_cmp() {
> -	eval "$GIT_TEST_CMP" '"$@"'
> +	test $# -eq 2 || BUG "test_cmp requires two arguments"
> +	if ! eval "$GIT_TEST_CMP" '"$@"'
> +	then
> +		test "x$1" = x- || test -e "$1" || BUG "test_cmp '$1' missing"
> +		test "x$2" = x- || test -e "$2" || BUG "test_cmp '$2' missing"

Not related to your patch, but I've seen this style of "x$1" in a few
places in test-lib-functions.sh. Why can't this be written as 'test "$1"
= -'?

> +		return 1
> +	fi
>  }
>
>  # Check that the given config key has the expected value.
> @@ -981,7 +987,13 @@ test_cmp_config() {
>  # test_cmp_bin - helper to compare binary files
>
>  test_cmp_bin() {
> -	cmp "$@"
> +	test $# -eq 2 || BUG "test_cmp_bin requires two arguments"
> +	if ! cmp "$@"
> +	then
> +		test "x$1" = x- || test -e "$1" || BUG "test_cmp_bin '$1' missing"
> +		test "x$2" = x- || test -e "$2" || BUG "test_cmp_bin '$2' missing"
> +		return 1
> +	fi
>  }
>
>  # Use this instead of test_cmp to compare files that contain expected and
> --
> 2.28.0.220.ged08abb693
>
Thanks,
Taylor
Eric Sunshine Aug. 11, 2020, 7:25 p.m. UTC | #5
On Tue, Aug 11, 2020 at 2:33 PM Taylor Blau <me@ttaylorr.com> wrote:
> On Sun, Aug 09, 2020 at 01:42:09PM -0400, Eric Sunshine wrote:
> > +             test "x$2" = x- || test -e "$2" || BUG "test_cmp '$2' missing"
>
> Not related to your patch, but I've seen this style of "x$1" in a few
> places in test-lib-functions.sh. Why can't this be written as 'test "$1"
> = -'?

Short answer: To prevent 'test' from thinking that the argument is a switch.

Longer answer:

'test' can accept both switches (i.e. "-e") and non-switch arguments.
Keep in mind, too, that all the quoting is stripped by the shell
_before_ 'test' ever sees its arguments. Let's say that the caller has
a filename whose name actually is "-e" and passes that in as $1. So,
what does 'test' see?

    test -e = -

Rather than comparing literal string "-e" to literal string "-", it's
instead (almost) asking if the file named "=" exists; I say "almost"
because it's actually an error since switch -e only accepts one
argument, but it's being given two arguments, "=" and "-".

You might say that having a file named "-e" (or similar) is unlikely,
however, what is not unlikely is a caller passing "-" for
standard-input as $1. In this case, 'test' sees:

    test - = -

which may or may not be an error in a particular implementation of
'test'. Some implementations may understand that "-" is not a valid
switch, thus infer that you're actually asking for an equality
comparison between arguments, but other implementations may complain
either that there is no switch named "-" or that those arguments
simply make no sense.

This is why it's a very common idiom in shell programming with 'test'
to see "x" prepended, thus ensuring that the argument can't be
confused with a switch.
Taylor Blau Aug. 11, 2020, 9:03 p.m. UTC | #6
On Tue, Aug 11, 2020 at 03:25:03PM -0400, Eric Sunshine wrote:
> On Tue, Aug 11, 2020 at 2:33 PM Taylor Blau <me@ttaylorr.com> wrote:
> > On Sun, Aug 09, 2020 at 01:42:09PM -0400, Eric Sunshine wrote:
> > > +             test "x$2" = x- || test -e "$2" || BUG "test_cmp '$2' missing"
> >
> > Not related to your patch, but I've seen this style of "x$1" in a few
> > places in test-lib-functions.sh. Why can't this be written as 'test "$1"
> > = -'?
>
> Short answer: To prevent 'test' from thinking that the argument is a switch.

Makes sense. Now I feel silly for asking :).

> Longer answer:
>
> 'test' can accept both switches (i.e. "-e") and non-switch arguments.
> Keep in mind, too, that all the quoting is stripped by the shell
> _before_ 'test' ever sees its arguments. Let's say that the caller has
> a filename whose name actually is "-e" and passes that in as $1. So,
> what does 'test' see?
>
>     test -e = -
>
> Rather than comparing literal string "-e" to literal string "-", it's
> instead (almost) asking if the file named "=" exists; I say "almost"
> because it's actually an error since switch -e only accepts one
> argument, but it's being given two arguments, "=" and "-".
>
> You might say that having a file named "-e" (or similar) is unlikely,
> however, what is not unlikely is a caller passing "-" for
> standard-input as $1. In this case, 'test' sees:
>
>     test - = -
>
> which may or may not be an error in a particular implementation of
> 'test'. Some implementations may understand that "-" is not a valid
> switch, thus infer that you're actually asking for an equality
> comparison between arguments, but other implementations may complain
> either that there is no switch named "-" or that those arguments
> simply make no sense.
>
> This is why it's a very common idiom in shell programming with 'test'
> to see "x" prepended, thus ensuring that the argument can't be
> confused with a switch.

Thanks for a careful and helpful explanation, as always :-). Makes sense
to me.

Thanks,
Taylor
Jeff King Aug. 12, 2020, 3:37 p.m. UTC | #7
On Tue, Aug 11, 2020 at 03:25:03PM -0400, Eric Sunshine wrote:

> 'test' can accept both switches (i.e. "-e") and non-switch arguments.
> Keep in mind, too, that all the quoting is stripped by the shell
> _before_ 'test' ever sees its arguments. Let's say that the caller has
> a filename whose name actually is "-e" and passes that in as $1. So,
> what does 'test' see?
> 
>     test -e = -
> 
> Rather than comparing literal string "-e" to literal string "-", it's
> instead (almost) asking if the file named "=" exists; I say "almost"
> because it's actually an error since switch -e only accepts one
> argument, but it's being given two arguments, "=" and "-".

I don't think this is an error. The program can tell which you meant by
the number of arguments. POSIX lays out some rules here (from "man
1posix test" on my system, but I'm sure you can find it online):

  3 arguments:
    *  If $2 is a binary primary, perform the binary test of $1 and $3.

    *  If $1 is '!', negate the two-argument test of $2 and $3.

    *  If $1 is '(' and $3 is ')', perform the unary test of $2.  On
       systems that do not support the XSI option, the results are
       unspecified if $1 is '(' and $3 is ')'.

    *  Otherwise, produce unspecified results.

So we'd see that "=" is a binary primary (the complete set is defined
earlier). Likewise "! -e -" would hit the second rule. We wouldn't get
fooled by trying to compare the string "!" because it knows that "=" is
a binary operator and "-e" is a unary operator.

It gets weird with "-a" joining expressions. There's some discussion in
the Rationale section of the posix page, and it even warns explicitly
against "-a" (in favor of "test expr1 && test expr1").

> which may or may not be an error in a particular implementation of
> 'test'. Some implementations may understand that "-" is not a valid
> switch, thus infer that you're actually asking for an equality
> comparison between arguments, but other implementations may complain
> either that there is no switch named "-" or that those arguments
> simply make no sense.

Yeah, historically I think there were shells that were not quite so
clever. I have no idea which ones, or whether any are still around. I
don't think we've generally been that concerned with this case in Git,
and nobody has complained. I'd be totally unsurprised to hear that SunOS
/bin/sh doesn't get this right, but we've already written it off as
unusable (it doesn't even support $() expansion).

-Peff
Eric Sunshine Aug. 12, 2020, 4:15 p.m. UTC | #8
On Wed, Aug 12, 2020 at 11:37 AM Jeff King <peff@peff.net> wrote:
> On Tue, Aug 11, 2020 at 03:25:03PM -0400, Eric Sunshine wrote:
> >     test -e = -
> > Rather than comparing literal string "-e" to literal string "-", it's
> > instead (almost) asking if the file named "=" exists; I say "almost"
> > because it's actually an error since switch -e only accepts one
> > argument, but it's being given two arguments, "=" and "-".
>
> I don't think this is an error. The program can tell which you meant by
> the number of arguments. POSIX lays out some rules here (from "man
> 1posix test" on my system, but I'm sure you can find it online):

I intentionally didn't focus on or mention POSIX in my response
because I wanted to represent the Real World reason why "x$var" is
such a common idiom. As a person who regularly uses ancient operating
systems and old computers (it's common for me to be stuck on computers
10-25 years old; my most up-to-date computer is 9.5 years old), Real
World is something I focus on more than POSIX. Okay, I may be an
outlier, but still...

> > which may or may not be an error in a particular implementation of
> > 'test'. Some implementations may understand that "-" is not a valid
> > switch, thus infer that you're actually asking for an equality
> > comparison between arguments, but other implementations may complain
> > either that there is no switch named "-" or that those arguments
> > simply make no sense.
>
> Yeah, historically I think there were shells that were not quite so
> clever. I have no idea which ones, or whether any are still around. I
> don't think we've generally been that concerned with this case in Git,
> and nobody has complained. I'd be totally unsurprised to hear that SunOS
> /bin/sh doesn't get this right, but we've already written it off as
> unusable (it doesn't even support $() expansion).

I'm probably showing my age, but having had to deal with the many
inconsistencies of implementation over the years, it's hard to _not_
worry about such things even now. Adding the "x" to "$1" or to making
other minor portability concessions is reflex at this point. These
portability concerns are also always on mind when dealing with Mac OS
since so many of its utilities are ancient and retain the old
behaviors.
Eric Sunshine Aug. 12, 2020, 4:39 p.m. UTC | #9
On Wed, Aug 12, 2020 at 12:15 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> On Wed, Aug 12, 2020 at 11:37 AM Jeff King <peff@peff.net> wrote:
> > I don't think this is an error. The program can tell which you meant by
> > the number of arguments. POSIX lays out some rules here (from "man
> > 1posix test" on my system, but I'm sure you can find it online):
>
> I intentionally didn't focus on or mention POSIX in my response
> because I wanted to represent the Real World reason why "x$var" is
> such a common idiom. [...]

I probably should have done a better job in my original response to
Taylor to make it clear that I was talking about Real World (even if
old) rather than POSIX.
Jeff King Aug. 12, 2020, 5:10 p.m. UTC | #10
On Wed, Aug 12, 2020 at 12:39:14PM -0400, Eric Sunshine wrote:

> On Wed, Aug 12, 2020 at 12:15 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> > On Wed, Aug 12, 2020 at 11:37 AM Jeff King <peff@peff.net> wrote:
> > > I don't think this is an error. The program can tell which you meant by
> > > the number of arguments. POSIX lays out some rules here (from "man
> > > 1posix test" on my system, but I'm sure you can find it online):
> >
> > I intentionally didn't focus on or mention POSIX in my response
> > because I wanted to represent the Real World reason why "x$var" is
> > such a common idiom. [...]
> 
> I probably should have done a better job in my original response to
> Taylor to make it clear that I was talking about Real World (even if
> old) rather than POSIX.

Yeah. I guess I'm questioning how current that Real World view is. It
hasn't bitten us yet, though we do seem to do the "x" thing in some
places. And most of our shell code is in the test suite, which sees
pretty tame filenames.

-Peff

Patch
diff mbox series

diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index b791933ffd..a12d6a3fc9 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -952,7 +952,13 @@  test_expect_code () {
 # - not all diff versions understand "-u"
 
 test_cmp() {
-	eval "$GIT_TEST_CMP" '"$@"'
+	test $# -eq 2 || BUG "test_cmp requires two arguments"
+	if ! eval "$GIT_TEST_CMP" '"$@"'
+	then
+		test "x$1" = x- || test -e "$1" || BUG "test_cmp '$1' missing"
+		test "x$2" = x- || test -e "$2" || BUG "test_cmp '$2' missing"
+		return 1
+	fi
 }
 
 # Check that the given config key has the expected value.
@@ -981,7 +987,13 @@  test_cmp_config() {
 # test_cmp_bin - helper to compare binary files
 
 test_cmp_bin() {
-	cmp "$@"
+	test $# -eq 2 || BUG "test_cmp_bin requires two arguments"
+	if ! cmp "$@"
+	then
+		test "x$1" = x- || test -e "$1" || BUG "test_cmp_bin '$1' missing"
+		test "x$2" = x- || test -e "$2" || BUG "test_cmp_bin '$2' missing"
+		return 1
+	fi
 }
 
 # Use this instead of test_cmp to compare files that contain expected and