diff mbox series

diff: handle negative status in diff_result_code()

Message ID 20230821003532.GA1113755@coredump.intra.peff.net (mailing list archive)
State New, archived
Headers show
Series diff: handle negative status in diff_result_code() | expand

Commit Message

Jeff King Aug. 21, 2023, 12:35 a.m. UTC
On Sun, Aug 20, 2023 at 08:52:36PM +0100, Romain Chossart wrote:

> I recently found out (the hard way :-) ) that running the following
> command prints an appropriate error on stderr but returns a zero exit
> code:
> 
> > $ git diff --no-pager --exit-code
> > error: invalid option: --no-pager
> > $ echo $?
> > 0
> 
> I would expect a non-zero exit code to be returned.

Yikes, I would expect that, too.

> Interestingly, running `git diff --no-pager --exit-code HEAD` shows a
> usage instead and does return a non-zero exit code as expected.

The difference has to do with where the parsing occurs. Because "git
diff" has so many sub-modes, without the "HEAD" it passes options down
to a helper function, where we first notice the invalid option and
return an error. So the real bug here is how we handle errors with
exit-code, and extends beyond just invalid options.

Fix is below. AFAICT the bug has been around forever (I didn't check
earlier than v1.6.6, which shows it). So it is not new in the upcoming
v2.42, and we can probably apply it to the 'maint' track after the
release.

Thanks for your report. And I'm impressed you managed to find such an
ancient bug. :)

-- >8 --
Subject: [PATCH] diff: handle negative status in diff_result_code()

Most programs which run a diff (porcelain git-diff, diff-tree, etc) run
their result through diff_result_code() before presenting it to the user
as a program return code. That result generally comes from a library
function like builtin_diff_files(), etc, and may be "-1" if the function
bailed with an error.

There are two problems here:

  - if --exit-code is not in use, then we pass the code along as-is.
    Even though Unix exit codes are unsigned, this usually works OK
    because the integer representation, and "-1" ends up as "255". But
    it's not something we should rely on, and we've tried to avoid it
    elsewhere. E.g. in 5391e94813 (branch: remove negative exit code,
    2022-03-29) and 246f0edec0 (execv_dashed_external: stop exiting with
    negative code, 2017-01-06) and probably others.

  - when --exit-code is in use, we ignore the incoming "status" variable
    entirely, and rely on the "has_changes" field. But if we saw an
    error, this field is almost certainly invalid, and means we'd likely
    just say "no changes", which is completely bogus. Likewise for
    the "--check" format.

So let's intercept the negative value here and return an appropriate
code before even considering --exit-code, etc. And while doing so, we
can swap out the negative value for a more normal exit code.

The test here uses an invalid option to trigger the error condition,
since that's easy and reliable. But this could also happen, for example,
if we failed to load the index (in which case we'd just report "no
changes" with --exit-code, which is very wrong).

Reported-by: Romain Chossart <romainchossart@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
I know we talked recently about not matching "128" explicitly in tests,
but it seems like the least-bad thing here, as explained in the comment.

 diff.c                 | 11 +++++++++++
 t/t4017-diff-retval.sh | 10 ++++++++++
 2 files changed, 21 insertions(+)

Comments

Junio C Hamano Aug. 21, 2023, 3:56 p.m. UTC | #1
Jeff King <peff@peff.net> writes:

> Thanks for your report. And I'm impressed you managed to find such an
> ancient bug. :)

Indeed.  Thanks, both.

> Most programs which run a diff (porcelain git-diff, diff-tree, etc) run
> their result through diff_result_code() before presenting it to the user
> as a program return code. That result generally comes from a library
> function like builtin_diff_files(), etc, and may be "-1" if the function
> bailed with an error.
>
>
> There are two problems here:
>
>   - if --exit-code is not in use, then we pass the code along as-is.
>     Even though Unix exit codes are unsigned, this usually works OK
>     because the integer representation, and "-1" ends up as "255". But
>     it's not something we should rely on, and we've tried to avoid it
>     elsewhere. E.g. in 5391e94813 (branch: remove negative exit code,
>     2022-03-29) and 246f0edec0 (execv_dashed_external: stop exiting with
>     negative code, 2017-01-06) and probably others.
>
>   - when --exit-code is in use, we ignore the incoming "status" variable
>     entirely, and rely on the "has_changes" field. But if we saw an
>     error, this field is almost certainly invalid, and means we'd likely
>     just say "no changes", which is completely bogus. Likewise for
>     the "--check" format.

Inspecting some callers of diff_result_code() further finds
curiosities.  wt-status.c for example feeds results form
run_diff_index() and run_diff_files() to the function, neither of
which returns anything other than 0. They die or exit(128)
themselves, though, so they are OK without this fix.
builtin/describe.c is also in the same boat with its use of
run_diff_index().

But of course the presense of such codepaths does not invalidate the
fix in your patch.

> So let's intercept the negative value here and return an appropriate
> code before even considering --exit-code, etc. And while doing so, we
> can swap out the negative value for a more normal exit code.

Good.  As you said, this is not an ungent regression fix, but I'd
prefer to see us look at all the current callers, as I wonder if
there are positive non-zero numbers coming from some callers, and
possibly design how this code should react to such "result" coming
in.

Again, thanks, both.
Jeff King Aug. 21, 2023, 6:09 p.m. UTC | #2
On Mon, Aug 21, 2023 at 08:56:26AM -0700, Junio C Hamano wrote:

> >   - when --exit-code is in use, we ignore the incoming "status" variable
> >     entirely, and rely on the "has_changes" field. But if we saw an
> >     error, this field is almost certainly invalid, and means we'd likely
> >     just say "no changes", which is completely bogus. Likewise for
> >     the "--check" format.
> 
> Inspecting some callers of diff_result_code() further finds
> curiosities.  wt-status.c for example feeds results form
> run_diff_index() and run_diff_files() to the function, neither of
> which returns anything other than 0. They die or exit(128)
> themselves, though, so they are OK without this fix.
> builtin/describe.c is also in the same boat with its use of
> run_diff_index().
> 
> But of course the presense of such codepaths does not invalidate the
> fix in your patch.

Yeah, I admit I did not dig too much into the callers, as it was obvious
that any negative values were currently being mis-handled, and I knew at
least one caller passed them in (the one in builtin/diff.c).

And in fact I think that is the only problematic caller. Every other
caller either passes the result of run_diff_*(), which always return 0,
or just directly pass zero themselves!

It is only builtin/diff.c that calls its local builtin_diff_blobs(), and
so on, and those may return errors. So one direction is something like:

  - convert those calls to die() more directly; this has the potential
    side effect of producing a better message for the case that started
    this thread by calling usage(builtin_diff_usage) instead of a short
    error

  - drop the now-useless return codes from those functions, along with
    the already-useless ones from run_diff_files(), etc. At this point
    everybody will just be passing a blind "0" to diff_result_code()

  - drop the "status" parameter to diff_result_code()

That would make the code simpler. It does feel a bit like going in the
opposite direction of recent "pass errors up the stack rather than
dying" libification efforts. I think that's OK for the builtin_* helpers
in diff.c, which are just serving the diff porcelain. But things like
run_diff_files(), while pretty big operations, are something we might
call as small part of another operation (like git-describe).

We are not making things worse there, in the sense that their return
codes are currently meaningless. But removing the code entirely feels
like a step in the wrong direction. I dunno. Maybe we should retain
their return codes and have the callers die() if they fail (which would
currently be dead-code, but future-proof us for later cleanups).

-Peff
Junio C Hamano Aug. 21, 2023, 6:39 p.m. UTC | #3
Jeff King <peff@peff.net> writes:

> That would make the code simpler. It does feel a bit like going in the
> opposite direction of recent "pass errors up the stack rather than
> dying" libification efforts. I think that's OK for the builtin_* helpers
> in diff.c, which are just serving the diff porcelain. But things like
> run_diff_files(), while pretty big operations, are something we might
> call as small part of another operation (like git-describe).

True, for things in diff-lib.c we likely would want to go in the
opposite "return an error to be handled by the caller" route.
diff mbox series

Patch

diff --git a/diff.c b/diff.c
index ee3eb629e3..b8cd829b1f 100644
--- a/diff.c
+++ b/diff.c
@@ -6980,6 +6980,17 @@  int diff_result_code(struct diff_options *opt, int status)
 	diff_warn_rename_limit("diff.renameLimit",
 			       opt->needed_rename_limit,
 			       opt->degraded_cc_to_c);
+
+	/*
+	 * A negative status indicates an internal error in the diff routines.
+	 * We want to avoid codes like "1" or "2" here that have a documented
+	 * meaning (and obviously not "0" for success / no diff). But we also
+	 * want to avoid negative values, since the result is passed out via
+	 * exit(). Convert them all to 128, which matches die().
+	 */
+	if (status < 0)
+		return 128;
+
 	if (!opt->flags.exit_with_status &&
 	    !(opt->output_format & DIFF_FORMAT_CHECKDIFF))
 		return status;
diff --git a/t/t4017-diff-retval.sh b/t/t4017-diff-retval.sh
index 5bc28ad9f0..0278364fcd 100755
--- a/t/t4017-diff-retval.sh
+++ b/t/t4017-diff-retval.sh
@@ -138,4 +138,14 @@  test_expect_success 'check honors conflict marker length' '
 	git reset --hard
 '
 
+test_expect_success 'errors are not confused by --exit-code' '
+	# The exact code here is not important and not documented. But
+	# we do want to make sure that it is not a meaningful documented
+	# value like "1", nor the result of whatever platform-specific cruft
+	# might result from a negative value. The easiest way to check
+	# that is just to match what we know the code does now. But it would be
+	# OK to change this to match if the code changes.
+	test_expect_code 128 git diff --nonsense --exit-code
+'
+
 test_done