Message ID | 20230821003532.GA1113755@coredump.intra.peff.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | diff: handle negative status in diff_result_code() | expand |
Jeff King <peff@peff.net> writes: > Thanks for your report. And I'm impressed you managed to find such an > ancient bug. :) Indeed. Thanks, both. > Most programs which run a diff (porcelain git-diff, diff-tree, etc) run > their result through diff_result_code() before presenting it to the user > as a program return code. That result generally comes from a library > function like builtin_diff_files(), etc, and may be "-1" if the function > bailed with an error. > > > There are two problems here: > > - if --exit-code is not in use, then we pass the code along as-is. > Even though Unix exit codes are unsigned, this usually works OK > because the integer representation, and "-1" ends up as "255". But > it's not something we should rely on, and we've tried to avoid it > elsewhere. E.g. in 5391e94813 (branch: remove negative exit code, > 2022-03-29) and 246f0edec0 (execv_dashed_external: stop exiting with > negative code, 2017-01-06) and probably others. > > - when --exit-code is in use, we ignore the incoming "status" variable > entirely, and rely on the "has_changes" field. But if we saw an > error, this field is almost certainly invalid, and means we'd likely > just say "no changes", which is completely bogus. Likewise for > the "--check" format. Inspecting some callers of diff_result_code() further finds curiosities. wt-status.c for example feeds results form run_diff_index() and run_diff_files() to the function, neither of which returns anything other than 0. They die or exit(128) themselves, though, so they are OK without this fix. builtin/describe.c is also in the same boat with its use of run_diff_index(). But of course the presense of such codepaths does not invalidate the fix in your patch. > So let's intercept the negative value here and return an appropriate > code before even considering --exit-code, etc. And while doing so, we > can swap out the negative value for a more normal exit code. Good. As you said, this is not an ungent regression fix, but I'd prefer to see us look at all the current callers, as I wonder if there are positive non-zero numbers coming from some callers, and possibly design how this code should react to such "result" coming in. Again, thanks, both.
On Mon, Aug 21, 2023 at 08:56:26AM -0700, Junio C Hamano wrote: > > - when --exit-code is in use, we ignore the incoming "status" variable > > entirely, and rely on the "has_changes" field. But if we saw an > > error, this field is almost certainly invalid, and means we'd likely > > just say "no changes", which is completely bogus. Likewise for > > the "--check" format. > > Inspecting some callers of diff_result_code() further finds > curiosities. wt-status.c for example feeds results form > run_diff_index() and run_diff_files() to the function, neither of > which returns anything other than 0. They die or exit(128) > themselves, though, so they are OK without this fix. > builtin/describe.c is also in the same boat with its use of > run_diff_index(). > > But of course the presense of such codepaths does not invalidate the > fix in your patch. Yeah, I admit I did not dig too much into the callers, as it was obvious that any negative values were currently being mis-handled, and I knew at least one caller passed them in (the one in builtin/diff.c). And in fact I think that is the only problematic caller. Every other caller either passes the result of run_diff_*(), which always return 0, or just directly pass zero themselves! It is only builtin/diff.c that calls its local builtin_diff_blobs(), and so on, and those may return errors. So one direction is something like: - convert those calls to die() more directly; this has the potential side effect of producing a better message for the case that started this thread by calling usage(builtin_diff_usage) instead of a short error - drop the now-useless return codes from those functions, along with the already-useless ones from run_diff_files(), etc. At this point everybody will just be passing a blind "0" to diff_result_code() - drop the "status" parameter to diff_result_code() That would make the code simpler. It does feel a bit like going in the opposite direction of recent "pass errors up the stack rather than dying" libification efforts. I think that's OK for the builtin_* helpers in diff.c, which are just serving the diff porcelain. But things like run_diff_files(), while pretty big operations, are something we might call as small part of another operation (like git-describe). We are not making things worse there, in the sense that their return codes are currently meaningless. But removing the code entirely feels like a step in the wrong direction. I dunno. Maybe we should retain their return codes and have the callers die() if they fail (which would currently be dead-code, but future-proof us for later cleanups). -Peff
Jeff King <peff@peff.net> writes: > That would make the code simpler. It does feel a bit like going in the > opposite direction of recent "pass errors up the stack rather than > dying" libification efforts. I think that's OK for the builtin_* helpers > in diff.c, which are just serving the diff porcelain. But things like > run_diff_files(), while pretty big operations, are something we might > call as small part of another operation (like git-describe). True, for things in diff-lib.c we likely would want to go in the opposite "return an error to be handled by the caller" route.
diff --git a/diff.c b/diff.c index ee3eb629e3..b8cd829b1f 100644 --- a/diff.c +++ b/diff.c @@ -6980,6 +6980,17 @@ int diff_result_code(struct diff_options *opt, int status) diff_warn_rename_limit("diff.renameLimit", opt->needed_rename_limit, opt->degraded_cc_to_c); + + /* + * A negative status indicates an internal error in the diff routines. + * We want to avoid codes like "1" or "2" here that have a documented + * meaning (and obviously not "0" for success / no diff). But we also + * want to avoid negative values, since the result is passed out via + * exit(). Convert them all to 128, which matches die(). + */ + if (status < 0) + return 128; + if (!opt->flags.exit_with_status && !(opt->output_format & DIFF_FORMAT_CHECKDIFF)) return status; diff --git a/t/t4017-diff-retval.sh b/t/t4017-diff-retval.sh index 5bc28ad9f0..0278364fcd 100755 --- a/t/t4017-diff-retval.sh +++ b/t/t4017-diff-retval.sh @@ -138,4 +138,14 @@ test_expect_success 'check honors conflict marker length' ' git reset --hard ' +test_expect_success 'errors are not confused by --exit-code' ' + # The exact code here is not important and not documented. But + # we do want to make sure that it is not a meaningful documented + # value like "1", nor the result of whatever platform-specific cruft + # might result from a negative value. The easiest way to check + # that is just to match what we know the code does now. But it would be + # OK to change this to match if the code changes. + test_expect_code 128 git diff --nonsense --exit-code +' + test_done