diff mbox series

[2/2] diff: fix --exit-code with external diff

Message ID e2e4a4e9-55db-403c-902d-fd8af3aea05c@web.de (mailing list archive)
State New
Headers show
Series [1/2] diff: report unmerged paths as changes in run_diff_cmd() | expand

Commit Message

René Scharfe May 5, 2024, 10:20 a.m. UTC
You can ask the diff machinery to let the exit code indicate whether
there are changes, e.g. with --exit-code.  It as two ways to calculate
that bit: The quick one assumes blobs with different hashes have
different content, and the more elaborate way actually compares the
contents, possibly applying transformations like ignoring whitespace.

Always use the slower path by setting the flag diff_from_contents,
because any of the files could have an external diff driver set via an
attribute, which might consider binary differences irrelevant, like e.g.
--
2.45.0

Comments

Phillip Wood May 5, 2024, 3:25 p.m. UTC | #1
Hi René

Thanks for working on this

On 05/05/2024 11:20, René Scharfe wrote:
> Finally, capture the output of the external diff and only register a
> change by setting found_changes if the program wrote anything.

I worry that this will lead to regression reports like 
https://lore.kernel.org/git/CA+dzEBn108QoMA28f0nC8K21XT+Afua0V2Qv8XkR8rAeqUCCZw@mail.gmail.com/ 
as the external diff will not see a terminal on stdout anymore. I'm not 
really clear why we ignore the exit code of the external diff command. 
Merge strategies are expected to exit 0 on success, 1 when there are 
conflicts and another non-zero value for other errors - it would be nice 
to do something similar here where 1 means "there were differences" but 
it is probably too late to do that without a config value to indicate 
that we should trust the exit code.

Best Wishes

Phillip
Junio C Hamano May 6, 2024, 5:31 p.m. UTC | #2
Phillip Wood <phillip.wood123@gmail.com> writes:

>> Finally, capture the output of the external diff and only register a
>> change by setting found_changes if the program wrote anything.
>
> I worry that this will lead to regression reports like
> https://lore.kernel.org/git/CA+dzEBn108QoMA28f0nC8K21XT+Afua0V2Qv8XkR8rAeqUCCZw@mail.gmail.com/
> as the external diff will not see a terminal on stdout anymore.

Hmph.  If it were Git that the external diff script eventually
invokes that changes the behaviour (e.g. color and use of pager)
based on the tty-ness of the output, we could use tricks like
GIT_PAGER_IN_USE to propagate tty-ness down to the subprocess,
but Git is not the only thing involved here, so it is not a very
good general solution.

> I'm
> not really clear why we ignore the exit code of the external diff
> command. Merge strategies are expected to exit 0 on success, 1 when
> there are conflicts and another non-zero value for other errors - it
> would be nice to do something similar here where 1 means "there were
> differences" but it is probably too late to do that without a config
> value to indicate that we should trust the exit code.

Yeah, that thought crossed my mind.  If we were designing the system
from scratch today, that would certainly be what we would do.  I
suspect that it is because external diff drivers were invented way
before "diff --exit-code" was introduced.

Thanks.
René Scharfe May 6, 2024, 6:23 p.m. UTC | #3
Am 05.05.24 um 17:25 schrieb Phillip Wood:
> On 05/05/2024 11:20, René Scharfe wrote:
>> Finally, capture the output of the external diff and only register a
>> change by setting found_changes if the program wrote anything.
>
> I worry that this will lead to regression reports like
> https://lore.kernel.org/git/CA+dzEBn108QoMA28f0nC8K21XT+Afua0V2Qv8XkR8rAeqUCCZw@mail.gmail.com/
> as the external diff will not see a terminal on stdout anymore.

So external diffs might no longer color their output if invoked using
the option --exit-code.  I assume this can be worked around by adding an
option like --color=always to the diff command.  This warrants a note in
the docs at the very least, though.

But thinking about it, the external diff could be a GUI program that
doesn't write to its stdout at all.  Or it could write something if the
files' contents match and stay silent if there are differences, weird as
that may be.

> I'm not really clear why we ignore the exit code of the external diff
> command.

Historical reasons, I guess.

> Merge strategies are expected to exit 0 on success, 1 when there are
> conflicts and another non-zero value for other errors - it would be
> nice to do something similar here where 1 means "there were
> differences" but it is probably too late to do that without a config
> value to indicate that we should trust the exit code.
Right, such a diff command protocol v2 would not need to pipe the
output through an inspection loop.  Sounds like a good idea.  It's
unfortunate that it would increase the configuration surface, which is
not in an acute need to expand.  We could advertise the new option when
dying due to the unsupported combination of --exit-code and external
diff, but that's in equal parts helpful and obnoxious, I feel.

René
Phillip Wood May 8, 2024, 3:25 p.m. UTC | #4
Hi René

On 06/05/2024 19:23, René Scharfe wrote:
> Am 05.05.24 um 17:25 schrieb Phillip Wood:
>> Merge strategies are expected to exit 0 on success, 1 when there are
>> conflicts and another non-zero value for other errors - it would be
>> nice to do something similar here where 1 means "there were
>> differences" but it is probably too late to do that without a config
>> value to indicate that we should trust the exit code.
> Right, such a diff command protocol v2 would not need to pipe the
> output through an inspection loop.  Sounds like a good idea.  It's
> unfortunate that it would increase the configuration surface, which is
> not in an acute need to expand.  We could advertise the new option when
> dying due to the unsupported combination of --exit-code and external
> diff, but that's in equal parts helpful and obnoxious, I feel.

Yes, diff dying would be annoying but the message would be useful.

Thinking about the external diff and some of the other diff options I 
wonder what we should do when options like "--quiet" and "--name-only" 
are combined with an external diff (I haven't checked the current 
behavior). If we introduced a diff command protocol v2 we could include 
a way to pass through "--quiet" though maybe just redirecting the stdout 
of the external command to /dev/null and using the exit code would be 
sufficient.

Best Wishes

Phillip

P.S. I haven't forgotten about our unit-test discussion but I'm afraid 
it will probably be the middle of next month before I have time to reply.
René Scharfe May 11, 2024, 8:32 p.m. UTC | #5
Am 08.05.24 um 17:25 schrieb phillip.wood123@gmail.com:
> Hi René
>
> On 06/05/2024 19:23, René Scharfe wrote:
>> Am 05.05.24 um 17:25 schrieb Phillip Wood:
>>> Merge strategies are expected to exit 0 on success, 1 when there are
>>> conflicts and another non-zero value for other errors - it would be
>>> nice to do something similar here where 1 means "there were
>>> differences" but it is probably too late to do that without a config
>>> value to indicate that we should trust the exit code.
>> Right, such a diff command protocol v2 would not need to pipe the
>> output through an inspection loop.  Sounds like a good idea.  It's
>> unfortunate that it would increase the configuration surface, which is
>> not in an acute need to expand.  We could advertise the new option when
>> dying due to the unsupported combination of --exit-code and external
>> diff, but that's in equal parts helpful and obnoxious, I feel.
>
> Yes, diff dying would be annoying but the message would be useful.

Having poked at it a bit more, I wonder if we need to add some further
nuance/trick to avoid having to reject certain combinations of options.

Currently external diffs can't signal content equality.  That doesn't
matter for trivial equality (where content and mode of the two files
match), as this case is always handled by the diff machinery already.
Only lossy comparisons (e.g. ignoring whitespace) even have the need to
signal equality.

If we (continue to) assume that external diffs are lossless then we
don't need to change the code, just document that assumption.  And add a
way to specify lossy external diffs that can signal whether they found
interesting differences, to address the originally reported shortcoming.

Is there an official term for comparisons that ignore some details?
"Lossy" is rather used for compression.  Filtered, partial, selective?

> Thinking about the external diff and some of the other diff options I
> wonder what we should do when options like "--quiet" and
> "--name-only" are combined with an external diff (I haven't checked
> the current behavior). If we introduced a diff command protocol v2 we
> could include a way to pass through "--quiet" though maybe just
> redirecting the stdout of the external command to /dev/null and using
> the exit code would be sufficient.

The current code uses shortcuts like that.  For lossy external diffs we
need to turn (some of) them off.  Lots of possible combinations with
special handling -- needs lots of tests for reasonable coverage. :-/

> P.S. I haven't forgotten about our unit-test discussion but I'm
> afraid it will probably be the middle of next month before I have
> time to reply.
No worries; reminds me to polish some unit-test patches, though.  I
get distracted a lot these days..

René
René Scharfe May 12, 2024, 9:38 a.m. UTC | #6
Am 11.05.24 um 22:32 schrieb René Scharfe:
> Am 08.05.24 um 17:25 schrieb phillip.wood123@gmail.com:
>> Hi René
>>
>> On 06/05/2024 19:23, René Scharfe wrote:
>>> Am 05.05.24 um 17:25 schrieb Phillip Wood:
>>>> Merge strategies are expected to exit 0 on success, 1 when there are
>>>> conflicts and another non-zero value for other errors - it would be
>>>> nice to do something similar here where 1 means "there were
>>>> differences" but it is probably too late to do that without a config
>>>> value to indicate that we should trust the exit code.
>>> Right, such a diff command protocol v2 would not need to pipe the
>>> output through an inspection loop.  Sounds like a good idea.  It's
>>> unfortunate that it would increase the configuration surface, which is
>>> not in an acute need to expand.  We could advertise the new option when
>>> dying due to the unsupported combination of --exit-code and external
>>> diff, but that's in equal parts helpful and obnoxious, I feel.
>>
>> Yes, diff dying would be annoying but the message would be useful.
>
> Having poked at it a bit more, I wonder if we need to add some further
> nuance/trick to avoid having to reject certain combinations of options.
>
> Currently external diffs can't signal content equality.  That doesn't
> matter for trivial equality (where content and mode of the two files
> match), as this case is always handled by the diff machinery already.
> Only lossy comparisons (e.g. ignoring whitespace) even have the need to
> signal equality.
>
> If we (continue to) assume that external diffs are lossless then we
> don't need to change the code, just document that assumption.  And add a
> way to specify lossy external diffs that can signal whether they found
> interesting differences, to address the originally reported shortcoming.

One more step in that direction: If we continue to use exit code 0 to
mean "notable changes present" and redefine 1 to mean "non-trivial
equality"  instead of "fatal error" then we don't need to add any config
options.

A downside is that the exit codes of diff(1) have the opposite meaning.
Which is weird in itself: You say "give me a diff" and it answers "true,
I got nothing for you" or "false, I actually had to print that dang
thing".  Inherited from cmp(1), I guess.  Which I further guess got it
from bcmp(3)?

But we can't directly use diff(1) anyway because we pass either one or
six parameters instead of the two that it would expect.  Our external
diff calling protocol is already special enough that we can freely
chose the meaning of exit codes.

Any other downsides?  Am I missing something?

René
diff mbox series

Patch

diff -b.

And don't stop walking that path in diff_flush() just because
output_format is unset.  Instead run diff_flush_patch() with output
redirected to /dev/null to get the exit status, like we already do for
DIFF_FORMAT_NO_OUTPUT.  That's consistent with the comments in diff.h:

   /* Same as output_format = 0 but we know that -s flag was given
    * and we should not give default value to output_format.
    */
   #define DIFF_FORMAT_NO_OUTPUT	0x0800

An unset output_format is given e.g. by repo_index_has_changes().  We
could set it right there, but there may be other places, so simply let
diff_flush() handle it for all of them consistently.

Finally, capture the output of the external diff and only register a
change by setting found_changes if the program wrote anything.

Reported-by: German Lashevich <german.lashevich@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 diff.c                   | 33 ++++++++++++++++++++++++++++++---
 t/t4020-diff-external.sh |  8 ++++++++
 2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/diff.c b/diff.c
index ded9ac70df..cd3c8a3978 100644
--- a/diff.c
+++ b/diff.c
@@ -40,6 +40,7 @@ 
 #include "setup.h"
 #include "strmap.h"
 #include "ws.h"
+#include "write-or-die.h"

 #ifdef NO_FAST_WORKING_DIRECTORY
 #define FAST_WORKING_DIRECTORY 0
@@ -4404,8 +4405,33 @@  static void run_external_diff(const char *pgm,
 	diff_free_filespec_data(one);
 	diff_free_filespec_data(two);
 	cmd.use_shell = 1;
-	if (run_command(&cmd))
-		die(_("external diff died, stopping at %s"), name);
+	if (o->flags.diff_from_contents) {
+		int got_output = 0;
+		cmd.out = -1;
+		if (start_command(&cmd))
+			die(_("external diff died, stopping at %s"), name);
+		for (;;) {
+			char buffer[8192];
+			ssize_t len = xread(cmd.out, buffer, sizeof(buffer));
+			if (!len)
+				break;
+			if (len < 0)
+				die(_("unable to read from external diff,"
+				      " stopping at %s"), name);
+			got_output = 1;
+			if (write_in_full(1, buffer, len) < 0)
+				die(_("unable to write output of external diff,"
+				      " stopping at %s"), name);
+		}
+		close(cmd.out);
+		if (finish_command(&cmd))
+			die(_("external diff died, stopping at %s"), name);
+		if (got_output)
+			o->found_changes = 1;
+	} else {
+		if (run_command(&cmd))
+			die(_("external diff died, stopping at %s"), name);
+	}

 	remove_tempfile();
 }
@@ -4852,6 +4878,7 @@  void diff_setup_done(struct diff_options *options)
 	 */

 	if ((options->xdl_opts & XDF_WHITESPACE_FLAGS) ||
+	    options->flags.exit_with_status ||
 	    options->ignore_regex_nr)
 		options->flags.diff_from_contents = 1;
 	else
@@ -6742,7 +6769,7 @@  void diff_flush(struct diff_options *options)
 	if (output_format & DIFF_FORMAT_CALLBACK)
 		options->format_callback(q, options, options->format_callback_data);

-	if (output_format & DIFF_FORMAT_NO_OUTPUT &&
+	if ((!output_format || output_format & DIFF_FORMAT_NO_OUTPUT) &&
 	    options->flags.exit_with_status &&
 	    options->flags.diff_from_contents) {
 		/*
diff --git a/t/t4020-diff-external.sh b/t/t4020-diff-external.sh
index fdd865f7c3..26430ca66b 100755
--- a/t/t4020-diff-external.sh
+++ b/t/t4020-diff-external.sh
@@ -172,6 +172,14 @@  test_expect_success 'no diff with -diff' '
 	grep Binary out
 '

+test_expect_success 'diff.external and --exit-code with output' '
+	test_expect_code 1 git -c diff.external=echo diff --exit-code
+'
+
+test_expect_success 'diff.external and --exit-code without output' '
+	git -c diff.external=true diff --exit-code
+'
+
 echo NULZbetweenZwords | perl -pe 'y/Z/\000/' > file

 test_expect_success 'force diff with "diff"' '