diff mbox series

diff: tighten interaction between -w and --exit-code

Message ID xmqqv8ded018.fsf@gitster.g (mailing list archive)
State New, archived
Headers show
Series diff: tighten interaction between -w and --exit-code | expand

Commit Message

Junio C Hamano Aug. 16, 2023, 11:45 p.m. UTC
The "diff" family of commands have "--exit-code" option to make them
report if they "saw" changes with their exit code, but it did not work
when options in the "ignore space" family is used *and* the output mode
did not involve the "--patch" format output, unless the output is
totally suppressed with "-s".

Internally, the diff machinery grabs sets of paths from the two
sides being compared that record different blob objects, optionally
matches these paths up for rename processing, and compares the blob
objects pair-wise.  When options like "-w" are *not* involved, the
machinery can say that it see a difference immediately after
noticing that the object names of the blobs at paths being compared
are different.  In essence, this means that a non-empty diff_queue[]
means the command should exit with 1 under "--exit-code" mode.

But when options that may make two different blobs compare as
equivalents, like "-w" that ignores whitespace differences, are in
effect, the blobs need to be compared for their contents, as if we
were generating a patch, even when the user is only interested in
"--exit-code" and not an actual differences.

There are two special case tweaks in the existing code.  One is that
the codepath to compute "--patch" output sets .found_changes bit
only when it sees a "real" change (whose definition is loosened when
"-w" is in effect to ignore whitespace-only changes), and uses that
bit, instead of whether diff_queue[] has any paths, to decide the
exit status.  The other is that when "-s" (no output) is combined
with "-w", the machinery calls the same "--patch" codepath but
redirecting the output to void, only for the .found_changes bit.

That is fine, until somebody comes and tries to combine options like
"--stat", "--name-only", "--name-status", etc. with "-w".  Because
the second special case above to run a fallback "--patch" computation
does not kick in for these other output modes, .found_changes bit is
not updated correctly.

To fix this, do two things:

 * The codepath to generate "--stat" output already calls the
   underlying xdiff machinery with appropriate options like "-w",
   but it did not update .found_changes bit.  Fixing "--stat -w"
   combined with "--exit-code" thus becomes just the matter of
   adding these missing .found_changes assignment.

 * For generating "--name-only", "--name-status", etc., the code
   does not look into the contents of the blob objects at all.  For
   now, extend the special case used for "-s -w --exit-code" to run
   a silent "--patch" computation to set the .found_changes bit
   correctly.

Not that the latter may still not be correct in that a path whose
contents have no differences other than whitespace changes would
still show up in the "diff -w --name-only --exit-code" output, even
though the exit status may say there is no differences.  Arguably
this is better than status quo, even though it still is wrong.

Reported-by: Paul Watson <pwatson2@wellmed.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
Junio C Hamano <gitster@pobox.com> writes:

> Also, this is not limited to the no-index mode.
>
>     $ echo one >1
>     $ git add 1
>     $ echo two >1
>     $ git diff --exit-code --numstat 1; echo "<<$?>>"
>     1	1	1
>     <<1>>
>     $ git diff --exit-code --numstat -w 1; echo "<<$?>>"
>     1	1	1
>     <<0>>
>
> So the minimum reproduction seems to be
>
>   * the diff machinery is asked to do --exit-code (no-index
>     implicitly does it)
>   * -w is used
>   * -p is *not* used
>   * to compare two different files.
>
> Thanks for a bug report.
>
> Patches welcome ;-)

So I ended up looking into this myself X-<.

 diff.c                     |  22 +++++++++-
 t/t4015-diff-whitespace.sh | 100 ++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 119 insertions(+), 3 deletions(-)

Comments

Jeff King Aug. 17, 2023, 5:10 a.m. UTC | #1
On Wed, Aug 16, 2023 at 04:45:23PM -0700, Junio C Hamano wrote:

> To fix this, do two things:
> 
>  * The codepath to generate "--stat" output already calls the
>    underlying xdiff machinery with appropriate options like "-w",
>    but it did not update .found_changes bit.  Fixing "--stat -w"
>    combined with "--exit-code" thus becomes just the matter of
>    adding these missing .found_changes assignment.
> 
>  * For generating "--name-only", "--name-status", etc., the code
>    does not look into the contents of the blob objects at all.  For
>    now, extend the special case used for "-s -w --exit-code" to run
>    a silent "--patch" computation to set the .found_changes bit
>    correctly.

Nicely explained overall, but one hunk of the patch left me wondering...

> @@ -3828,6 +3830,11 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
>  				  diffstat_consume, diffstat, &xpp, &xecfg))
>  			die("unable to generate diffstat for %s", one->path);
>  
> +		/* Do this before cancelling the no-op diffstat below */
> +		if (diffstat->files[diffstat->nr - 1]->added ||
> +		    diffstat->files[diffstat->nr - 1]->deleted)
> +			o->found_changes = 1;
> +

So this is checking whether any lines were added/deleted to see if the
stat was a noop. But what about non-content bits, like mode changes?
E.g., the tests below all fail (but would pass without "-w"):

diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 1b944d77e4..54e56ad911 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -25,6 +25,14 @@ test_expect_success 'exit status with -w --name-only (different but equivalent)'
 	git diff -w --name-only --exit-code x
 '
 
+test_expect_success 'exit status with -w --name-only (mode change)' '
+	test_when_finished "git reset" &&
+	echo foo >x &&
+	git add x &&
+	git update-index --chmod=+x x &&
+	test_expect_code 1 git diff -w --name-only --exit-code
+'
+
 test_expect_success 'exit status with -w --raw (different)' '
 	echo foo >x &&
 	git add x &&
@@ -39,6 +47,14 @@ test_expect_success 'exit status with -w --raw (different but equivalent)' '
 	git diff -w --raw --exit-code x
 '
 
+test_expect_success 'exit status with -w --raw (mode change)' '
+	test_when_finished "git reset" &&
+	echo foo >x &&
+	git add x &&
+	git update-index --chmod=+x x &&
+	test_expect_code 1 git diff -w --raw --exit-code
+'
+
 test_expect_success 'exit status with -w --stat (different)' '
 	echo foo >x &&
 	git add x &&
@@ -53,6 +69,14 @@ test_expect_success 'exit status with -w --stat (different but equivalent)' '
 	git diff -w --stat --exit-code x
 '
 
+test_expect_success 'exit status with -w --stat (mode change)' '
+	test_when_finished "git reset" &&
+	echo foo >x &&
+	git add x &&
+	git update-index --chmod=+x x &&
+	test_expect_code 1 git diff -w --stat --exit-code
+'
+
 test_expect_success 'exit status with -w --shortstat (different)' '
 	echo foo >x &&
 	git add x &&
@@ -67,6 +91,14 @@ test_expect_success 'exit status with -w --shortstat (different but equivalent)'
 	git diff -w --shortstat --exit-code x
 '
 
+test_expect_success 'exit status with -w --shortstat (mode change)' '
+	test_when_finished "git reset" &&
+	echo foo >x &&
+	git add x &&
+	git update-index --chmod=+x x &&
+	test_expect_code 1 git diff -w --shortstat --exit-code
+'
+
 test_expect_success 'exit status with -w --quiet (different)' '
 	echo foo >x &&
 	git add x &&
@@ -81,6 +113,14 @@ test_expect_success 'exit status with -w --quiet (different but equivalent)' '
 	git diff -w --quiet --exit-code x
 '
 
+test_expect_success 'exit status with -w --quiet (mode change)' '
+	test_when_finished "git reset" &&
+	echo foo >x &&
+	git add x &&
+	git update-index --chmod=+x x &&
+	test_expect_code 1 git diff -w --quiet --exit-code
+'
+
 test_expect_success 'exit status with -w --summary (different)' '
 	echo foo >x &&
 	git add x &&
@@ -95,6 +135,14 @@ test_expect_success 'exit status with -w --summary (different but equivalent)' '
 	git diff -w --summary --exit-code x
 '
 
+test_expect_success 'exit status with -w --summary (mode change)' '
+	test_when_finished "git reset" &&
+	echo foo >x &&
+	git add x &&
+	git update-index --chmod=+x x &&
+	test_expect_code 1 git diff -w --summary --exit-code
+'
+
 test_expect_success 'exit status with -w -s (different)' '
 	echo foo >x &&
 	git add x &&
@@ -109,6 +157,14 @@ test_expect_success 'exit status with -w -s (different but equivalent)' '
 	git diff -w -s --exit-code x
 '
 
+test_expect_success 'exit status with -w -s (mode change)' '
+	test_when_finished "git reset" &&
+	echo foo >x &&
+	git add x &&
+	git update-index --chmod=+x x &&
+	test_expect_code 1 git diff -w -s --exit-code x
+'
+
 test_expect_success "Ray Lehtiniemi's example" '
 	cat <<-\EOF >x &&
 	do {

For the diffstat case, I think we could check the mode here, but there
are other cases (e.g., adding or deleting an empty file). The code right
below the hunk I quoted seems to try to deal with that (the "cancelling
the no-op" your comment mentions). I'm not sure if we want something
like this:

diff --git a/diff.c b/diff.c
index 38b57b589f..1dbfdaeff0 100644
--- a/diff.c
+++ b/diff.c
@@ -3853,6 +3853,8 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
 				&& one->mode == two->mode) {
 				free_diffstat_file(file);
 				diffstat->nr--;
+			} else {
+				o->found_changes = 1;
 			}
 		}
 	}

but I haven't dug too far (and of course all of the other options also
need something similar to catch this case).

-Peff
Junio C Hamano Aug. 17, 2023, 4:12 p.m. UTC | #2
Jeff King <peff@peff.net> writes:

> For the diffstat case, I think we could check the mode here, but there
> are other cases (e.g., adding or deleting an empty file). The code right
> below the hunk I quoted seems to try to deal with that (the "cancelling
> the no-op" your comment mentions). I'm not sure if we want something
> like this:
>
> diff --git a/diff.c b/diff.c
> index 38b57b589f..1dbfdaeff0 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -3853,6 +3853,8 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
>  				&& one->mode == two->mode) {
>  				free_diffstat_file(file);
>  				diffstat->nr--;
> +			} else {
> +				o->found_changes = 1;
>  			}
>  		}
>  	}

That is much better.  In all cases where the above diffstat->nr-- is
not reached and diffstat is kept is where we found changes, so an
even simpler solution that fundamentally cannot go wrong would be to
see "diffstat->nr" at the end (i.e. "are we going to show diffstat
for *any* filepair?").  If it is non-zero, we did find a difference.

Then we do not have to wonder if that else clause is in the right
place, or we have to do something similar to the above for cases
where DIFF_FILE_VALID() is not true for both sides (i.e. creation or
deletion).

Thanks.
Jeff King Aug. 17, 2023, 7:49 p.m. UTC | #3
On Thu, Aug 17, 2023 at 09:12:09AM -0700, Junio C Hamano wrote:

> > diff --git a/diff.c b/diff.c
> > index 38b57b589f..1dbfdaeff0 100644
> > --- a/diff.c
> > +++ b/diff.c
> > @@ -3853,6 +3853,8 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
> >  				&& one->mode == two->mode) {
> >  				free_diffstat_file(file);
> >  				diffstat->nr--;
> > +			} else {
> > +				o->found_changes = 1;
> >  			}
> >  		}
> >  	}
> 
> That is much better.  In all cases where the above diffstat->nr-- is
> not reached and diffstat is kept is where we found changes, so an
> even simpler solution that fundamentally cannot go wrong would be to
> see "diffstat->nr" at the end (i.e. "are we going to show diffstat
> for *any* filepair?").  If it is non-zero, we did find a difference.

Yeah, without having really dug into the problem too far, that does
sound a lot better. I also wonder to what degree you could apply the
same strategy to other formats (I guess it depends on them removing
whitespace-only changes from a structure). From the test I posted
earlier, it does look like many of them have the same blind spots for
mode-only changes (and I suspect addition/removal of empty files is
another corner case to check).

-Peff
Junio C Hamano Aug. 17, 2023, 7:56 p.m. UTC | #4
Jeff King <peff@peff.net> writes:

> Yeah, without having really dug into the problem too far, that does
> sound a lot better. I also wonder to what degree you could apply the
> same strategy to other formats (I guess it depends on them removing
> whitespace-only changes from a structure). From the test I posted
> earlier, it does look like many of them have the same blind spots for
> mode-only changes (and I suspect addition/removal of empty files is
> another corner case to check).

I have something cooking.  Stay tuned, without getting excited too
much ;-)

Thanks.
diff mbox series

Patch

diff --git c/diff.c w/diff.c
index ee3eb629e3..38b57b589f 100644
--- c/diff.c
+++ w/diff.c
@@ -3795,6 +3795,7 @@  static void builtin_diffstat(const char *name_a, const char *name_b,
 		} else {
 			data->added = diff_filespec_size(o->repo, two);
 			data->deleted = diff_filespec_size(o->repo, one);
+			o->found_changes = 1;
 		}
 	}
 
@@ -3803,6 +3804,7 @@  static void builtin_diffstat(const char *name_a, const char *name_b,
 		diff_populate_filespec(o->repo, two, NULL);
 		data->deleted = count_lines(one->data, one->size);
 		data->added = count_lines(two->data, two->size);
+		o->found_changes = 1;
 	}
 
 	else if (may_differ) {
@@ -3828,6 +3830,11 @@  static void builtin_diffstat(const char *name_a, const char *name_b,
 				  diffstat_consume, diffstat, &xpp, &xecfg))
 			die("unable to generate diffstat for %s", one->path);
 
+		/* Do this before cancelling the no-op diffstat below */
+		if (diffstat->files[diffstat->nr - 1]->added ||
+		    diffstat->files[diffstat->nr - 1]->deleted)
+			o->found_changes = 1;
+
 		if (DIFF_FILE_VALID(one) && DIFF_FILE_VALID(two)) {
 			struct diffstat_file *file =
 				diffstat->files[diffstat->nr - 1];
@@ -4832,6 +4839,11 @@  void diff_setup_done(struct diff_options *options)
 	else
 		options->prefix_length = 0;
 
+	/*
+	 * This is how "diff --name-status -p --stat --raw" becomes
+	 * equivalent to "diff --name-status", which may be
+	 * unintuitive.
+	 */
 	if (options->output_format & (DIFF_FORMAT_NAME |
 				      DIFF_FORMAT_NAME_STATUS |
 				      DIFF_FORMAT_CHECKDIFF |
@@ -6684,13 +6696,19 @@  void diff_flush(struct diff_options *options)
 		separator++;
 	}
 
-	if (output_format & DIFF_FORMAT_NO_OUTPUT &&
+	if (((output_format & DIFF_FORMAT_NO_OUTPUT) ||
+	     /* these compute .found_changes properly */
+	     !(output_format & (DIFF_FORMAT_DIFFSTAT|
+				DIFF_FORMAT_SHORTSTAT|
+				DIFF_FORMAT_NUMSTAT|
+				DIFF_FORMAT_DIRSTAT|
+				DIFF_FORMAT_PATCH))) &&
 	    options->flags.exit_with_status &&
 	    options->flags.diff_from_contents) {
 		/*
 		 * run diff_flush_patch for the exit status. setting
 		 * options->file to /dev/null should be safe, because we
-		 * aren't supposed to produce any output anyway.
+		 * aren't supposed to produce any output from here.
 		 */
 		diff_free_file(options);
 		options->file = xfopen("/dev/null", "w");
diff --git c/t/t4015-diff-whitespace.sh w/t/t4015-diff-whitespace.sh
index b298f220e0..1b944d77e4 100755
--- c/t/t4015-diff-whitespace.sh
+++ w/t/t4015-diff-whitespace.sh
@@ -1,7 +1,7 @@ 
 #!/bin/sh
 #
 # Copyright (c) 2006 Johannes E. Schindelin
-#
+# Copyright (c) 2023 Google LLC
 
 test_description='Test special whitespace in diff engine.
 
@@ -11,6 +11,104 @@  TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-diff.sh
 
+test_expect_success 'exit status with -w --name-only (different)' '
+	echo foo >x &&
+	git add x &&
+	echo bar >x &&
+	test_expect_code 1 git diff -w --name-only --exit-code x
+'
+
+test_expect_success 'exit status with -w --name-only (different but equivalent)' '
+	echo foo >x &&
+	git add x &&
+	echo " foo" >x &&
+	git diff -w --name-only --exit-code x
+'
+
+test_expect_success 'exit status with -w --raw (different)' '
+	echo foo >x &&
+	git add x &&
+	echo bar >x &&
+	test_expect_code 1 git diff -w --raw --exit-code x
+'
+
+test_expect_success 'exit status with -w --raw (different but equivalent)' '
+	echo foo >x &&
+	git add x &&
+	echo " foo" >x &&
+	git diff -w --raw --exit-code x
+'
+
+test_expect_success 'exit status with -w --stat (different)' '
+	echo foo >x &&
+	git add x &&
+	echo bar >x &&
+	test_expect_code 1 git diff -w --stat --exit-code x
+'
+
+test_expect_success 'exit status with -w --stat (different but equivalent)' '
+	echo foo >x &&
+	git add x &&
+	echo " foo" >x &&
+	git diff -w --stat --exit-code x
+'
+
+test_expect_success 'exit status with -w --shortstat (different)' '
+	echo foo >x &&
+	git add x &&
+	echo bar >x &&
+	test_expect_code 1 git diff -w --shortstat --exit-code x
+'
+
+test_expect_success 'exit status with -w --shortstat (different but equivalent)' '
+	echo foo >x &&
+	git add x &&
+	echo " foo" >x &&
+	git diff -w --shortstat --exit-code x
+'
+
+test_expect_success 'exit status with -w --quiet (different)' '
+	echo foo >x &&
+	git add x &&
+	echo bar >x &&
+	test_expect_code 1 git diff -w --quiet --exit-code x
+'
+
+test_expect_success 'exit status with -w --quiet (different but equivalent)' '
+	echo foo >x &&
+	git add x &&
+	echo " foo" >x &&
+	git diff -w --quiet --exit-code x
+'
+
+test_expect_success 'exit status with -w --summary (different)' '
+	echo foo >x &&
+	git add x &&
+	echo bar >x &&
+	test_expect_code 1 git diff -w --summary --exit-code x
+'
+
+test_expect_success 'exit status with -w --summary (different but equivalent)' '
+	echo foo >x &&
+	git add x &&
+	echo " foo" >x &&
+	git diff -w --summary --exit-code x
+'
+
+test_expect_success 'exit status with -w -s (different)' '
+	echo foo >x &&
+	git add x &&
+	echo bar >x &&
+	test_expect_code 1 git diff -w -s --exit-code x
+'
+
+test_expect_success 'exit status with -w -s (different but equivalent)' '
+	echo foo >x &&
+	git add x &&
+	echo " foo" >x &&
+	git diff -w -s --exit-code x
+'
+
 test_expect_success "Ray Lehtiniemi's example" '
 	cat <<-\EOF >x &&
 	do {