Message ID | 20230914093948.GA2254894@coredump.intra.peff.net (mailing list archive) |
---|---|
State | Accepted |
Commit | 808e83f2667e4b442a8f58f0c7ef55feb6864f65 |
Headers | show |
Series | merge-ort unused parameter cleanups | expand |
On Thu, Sep 14, 2023 at 2:39 AM Jeff King <peff@peff.net> wrote: > > The merge-ort code has an err() function, but it's really just error() > in disguise. It differs in two ways: > > 1. It takes a "struct merge_options" argument. But the function > completely ignores it! We can simply remove it. Oops, when I simplified the err() function copied from merge-recursive.c in one way, I failed to notice that it enabled further simplifications. > 2. It formats the error string into a strbuf, prepending "error: ", > and then feeds the result into error(). But this is wrong! The > error() function already adds the prefix, so we end up with: > > error: error: Failed to execute internal merge ...and the same problem can be found in merge-recursive.c's err() function. Not sure what current opinions on whether we should bother fixing those up. I do intend on nuking merge-recursive.c, but I obviously haven't had much Git time this year. > So let's just drop this function entirely and call error() directly, as > the functions are otherwise identical (note that they both always return > -1). > > Presumably nobody noticed the bogus messages because they are quite hard > to trigger (they are mostly internal errors reading and writing > objects). However, one easy trigger is a custom merge driver which dies > by signal; we have a test already here, but we were not checking the > contents of stderr. Thanks for catching this. > Signed-off-by: Jeff King <peff@peff.net> > --- > A few of these messages starts with capital letters, which is unlike our > usual error message style. I didn't clean that up here. We could do so > on top, There are two of these. In my defense, they were copied verbatim from merge-recursive.c. And I, um, never noticed the problem over there before copying. Or after. > but I actually wonder if some of these ought to be using > path_msg() and continuing instead, to give output closer to other > conflict or error cases (e.g., conflicts caused by missing submodule > objects). But I dunno. I guess these are all more clearly "woah, > something is totally wrong" that we do not expect to happen, so it > probably isn't a big deal to just abort. Yeah, all callers of err()/error() are for things that should never happen regardless of repository contents and should result in an instant abort, whereas anything calling path_msg() is a conflict or informational message that is expected for various kinds of repository data -- these messages are accumulated and later shown. Another distinction is that any call to path_msg() is associated to a very specific path (or a few specific paths in special cases like renames or add/add with conflict modes), whereas none of the calls to err()/error() have a specific path they are about. This serves a few purposes: * We've had reports before that users get confused when there are multiple conflict messages about a path and they do not occur together. The structure of the merge machinery is such that it often has to process conflicts by type and then by path, rather than by path and then by type. If a merge has many conflicts, processing by type and then by path, combined with printing as you go, naturally results in cases where there are multiple conflict type messages for a single path, but the messages are separated by dozens or hundreds of lines of conflict messages about other paths. By accumulating and printing later, at print time we can sort based on path and provide nicer output (though renames and such might still cause some separation of related messages). * Accumulating and printing conflict & informational messages later is also more friendly for use by other tools such as merge-tree or rebase that may want to only conditionally print the messages or even operate on the structured data (the specific paths and conflict types recorded with them) in some special way. Dscho and I talked about that for his webby-merge-ui-for-github tool he was working on. Anyway, long story short is that I think continuing to use error() instead of path_msg() or something else makes sense here. The capital to lowercase cleanups make sense; we could even #leftoverbits for that piece. > merge-ort.c | 28 +++++----------------------- > t/t6406-merge-attr.sh | 3 ++- > 2 files changed, 7 insertions(+), 24 deletions(-) > > diff --git a/merge-ort.c b/merge-ort.c > index 8631c99700..027ecc7f78 100644 > --- a/merge-ort.c > +++ b/merge-ort.c > @@ -721,23 +721,6 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti, > renames->callback_data_nr = renames->callback_data_alloc = 0; > } > > -__attribute__((format (printf, 2, 3))) > -static int err(struct merge_options *opt, const char *err, ...) > -{ > - va_list params; > - struct strbuf sb = STRBUF_INIT; > - > - strbuf_addstr(&sb, "error: "); > - va_start(params, err); > - strbuf_vaddf(&sb, err, params); > - va_end(params); > - > - error("%s", sb.buf); > - strbuf_release(&sb); > - > - return -1; > -} > - > static void format_commit(struct strbuf *sb, > int indent, > struct repository *repo, > @@ -2122,13 +2105,12 @@ static int handle_content_merge(struct merge_options *opt, > &result_buf); > > if ((merge_status < 0) || !result_buf.ptr) > - ret = err(opt, _("Failed to execute internal merge")); > + ret = error(_("Failed to execute internal merge")); > > if (!ret && > write_object_file(result_buf.ptr, result_buf.size, > OBJ_BLOB, &result->oid)) > - ret = err(opt, _("Unable to add %s to database"), > - path); > + ret = error(_("Unable to add %s to database"), path); > > free(result_buf.ptr); > if (ret) > @@ -3518,10 +3500,10 @@ static int read_oid_strbuf(struct merge_options *opt, > unsigned long size; > buf = repo_read_object_file(the_repository, oid, &type, &size); > if (!buf) > - return err(opt, _("cannot read object %s"), oid_to_hex(oid)); > + return error(_("cannot read object %s"), oid_to_hex(oid)); > if (type != OBJ_BLOB) { > free(buf); > - return err(opt, _("object %s is not a blob"), oid_to_hex(oid)); > + return error(_("object %s is not a blob"), oid_to_hex(oid)); > } > strbuf_attach(dst, buf, size, size + 1); > return 0; > @@ -4973,7 +4955,7 @@ static void merge_ort_nonrecursive_internal(struct merge_options *opt, > * TRANSLATORS: The %s arguments are: 1) tree hash of a merge > * base, and 2-3) the trees for the two trees we're merging. > */ > - err(opt, _("collecting merge info failed for trees %s, %s, %s"), > + error(_("collecting merge info failed for trees %s, %s, %s"), > oid_to_hex(&merge_base->object.oid), > oid_to_hex(&side1->object.oid), > oid_to_hex(&side2->object.oid)); > diff --git a/t/t6406-merge-attr.sh b/t/t6406-merge-attr.sh > index 9677180a5b..05ad13b23e 100755 > --- a/t/t6406-merge-attr.sh > +++ b/t/t6406-merge-attr.sh > @@ -179,7 +179,8 @@ test_expect_success !WINDOWS 'custom merge driver that is killed with a signal' > > >./please-abort && > echo "* merge=custom" >.gitattributes && > - test_must_fail git merge main && > + test_must_fail git merge main 2>err && > + grep "^error: Failed to execute internal merge" err && > git ls-files -u >output && > git diff --name-only HEAD >>output && > test_must_be_empty output > -- > 2.42.0.628.g8a27295885 Thanks for fixing this up.
On Fri, Sep 15, 2023 at 07:54:28PM -0700, Elijah Newren wrote: > Oops, when I simplified the err() function copied from > merge-recursive.c in one way, I failed to notice that it enabled > further simplifications. Ah, I didn't even realize that it had been copied from there. That makes a lot more sense now. > > 2. It formats the error string into a strbuf, prepending "error: ", > > and then feeds the result into error(). But this is wrong! The > > error() function already adds the prefix, so we end up with: > > > > error: error: Failed to execute internal merge > > ...and the same problem can be found in merge-recursive.c's err() function. > > Not sure what current opinions on whether we should bother fixing > those up. I do intend on nuking merge-recursive.c, but I obviously > haven't had much Git time this year. Hmm, I'm not sure that it does. It has this code: if (opt->buffer_output < 2) flush_output(opt); else { strbuf_complete(&opt->obuf, '\n'); strbuf_addstr(&opt->obuf, "error: "); } so we only add the extra "error:" tag when we are in a buffering mode that writes into the strbuf (indicated by a high buffer_output field). And then later, after formatting the new string into opt->obuf, we do this: if (opt->buffer_output > 1) strbuf_addch(&opt->obuf, '\n'); else { error("%s", opt->obuf.buf); strbuf_reset(&opt->obuf); } So we call error() iff buffer is low, in which case we would not have added the prefix earlier. And that makes sense. If we are collecting messages in obuf, we cannot use error(), and we have to handle the prefix ourselves. So I think it's correct? If you started with merge-recursive's err() and then stripped it down to remove the extra buffering stuff, I can see that it would be a natural error to accidentally leave in the extra prefix while doing so. I certainly find the code confusing (the split "< 2" and "> 1" conditions to trigger related cases is a nice obfuscation bonus), but I _think_ it's right. And if the end goal is to ditch merge-recursive.c, I don't think it's worth spending any more brain power on it. > > A few of these messages starts with capital letters, which is unlike our > > usual error message style. I didn't clean that up here. We could do so > > on top, > > There are two of these. In my defense, they were copied verbatim from > merge-recursive.c. And I, um, never noticed the problem over there > before copying. Or after. That makes sense. We have a ton of these lurking in the code base. I usually try to clean them up when I touch relevant lines, but I wasn't sure about the path_msg thing here. > Yeah, all callers of err()/error() are for things that should never > happen regardless of repository contents and should result in an > instant abort, whereas anything calling path_msg() is a conflict or > informational message that is expected for various kinds of repository > data -- these messages are accumulated and later shown. That makes sense in general. And I think most of these err() calls are of that form (in fact, I'd expect most corruption to just trigger a die() at a low level anyway, but we've been slowly lib-ifying that away). The one that gave me pause was my "the external merge driver gave us a failure" case that I used for testing. In theory we could say "oops, your merge driver is broken" and keep going. But I think it's not just "broken", but "oops, your merge driver died with a signal", since a non-zero exit just means conflicts. So at that point probably something really has gone terribly wrong (not necessarily with Git, but with your merge driver!). > Another distinction is that any call to path_msg() is associated to a > very specific path (or a few specific paths in special cases like > renames or add/add with conflict modes), whereas none of the calls to > err()/error() have a specific path they are about. This serves a few > purposes: I think some of them are about specific paths. We hit err() if merge_3way() returns -1, but that call can of course also result in a regular conflict. But if they're catastrophic errors anyway (as above), this part is kind of moot. > Anyway, long story short is that I think continuing to use error() > instead of path_msg() or something else makes sense here. The capital > to lowercase cleanups make sense; we could even #leftoverbits for that > piece. Sounds good. I'll post a patch in a second, just to take care of it while we're thinking about it. -Peff
diff --git a/merge-ort.c b/merge-ort.c index 8631c99700..027ecc7f78 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -721,23 +721,6 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti, renames->callback_data_nr = renames->callback_data_alloc = 0; } -__attribute__((format (printf, 2, 3))) -static int err(struct merge_options *opt, const char *err, ...) -{ - va_list params; - struct strbuf sb = STRBUF_INIT; - - strbuf_addstr(&sb, "error: "); - va_start(params, err); - strbuf_vaddf(&sb, err, params); - va_end(params); - - error("%s", sb.buf); - strbuf_release(&sb); - - return -1; -} - static void format_commit(struct strbuf *sb, int indent, struct repository *repo, @@ -2122,13 +2105,12 @@ static int handle_content_merge(struct merge_options *opt, &result_buf); if ((merge_status < 0) || !result_buf.ptr) - ret = err(opt, _("Failed to execute internal merge")); + ret = error(_("Failed to execute internal merge")); if (!ret && write_object_file(result_buf.ptr, result_buf.size, OBJ_BLOB, &result->oid)) - ret = err(opt, _("Unable to add %s to database"), - path); + ret = error(_("Unable to add %s to database"), path); free(result_buf.ptr); if (ret) @@ -3518,10 +3500,10 @@ static int read_oid_strbuf(struct merge_options *opt, unsigned long size; buf = repo_read_object_file(the_repository, oid, &type, &size); if (!buf) - return err(opt, _("cannot read object %s"), oid_to_hex(oid)); + return error(_("cannot read object %s"), oid_to_hex(oid)); if (type != OBJ_BLOB) { free(buf); - return err(opt, _("object %s is not a blob"), oid_to_hex(oid)); + return error(_("object %s is not a blob"), oid_to_hex(oid)); } strbuf_attach(dst, buf, size, size + 1); return 0; @@ -4973,7 +4955,7 @@ static void merge_ort_nonrecursive_internal(struct merge_options *opt, * TRANSLATORS: The %s arguments are: 1) tree hash of a merge * base, and 2-3) the trees for the two trees we're merging. */ - err(opt, _("collecting merge info failed for trees %s, %s, %s"), + error(_("collecting merge info failed for trees %s, %s, %s"), oid_to_hex(&merge_base->object.oid), oid_to_hex(&side1->object.oid), oid_to_hex(&side2->object.oid)); diff --git a/t/t6406-merge-attr.sh b/t/t6406-merge-attr.sh index 9677180a5b..05ad13b23e 100755 --- a/t/t6406-merge-attr.sh +++ b/t/t6406-merge-attr.sh @@ -179,7 +179,8 @@ test_expect_success !WINDOWS 'custom merge driver that is killed with a signal' >./please-abort && echo "* merge=custom" >.gitattributes && - test_must_fail git merge main && + test_must_fail git merge main 2>err && + grep "^error: Failed to execute internal merge" err && git ls-files -u >output && git diff --name-only HEAD >>output && test_must_be_empty output
The merge-ort code has an err() function, but it's really just error() in disguise. It differs in two ways: 1. It takes a "struct merge_options" argument. But the function completely ignores it! We can simply remove it. 2. It formats the error string into a strbuf, prepending "error: ", and then feeds the result into error(). But this is wrong! The error() function already adds the prefix, so we end up with: error: error: Failed to execute internal merge So let's just drop this function entirely and call error() directly, as the functions are otherwise identical (note that they both always return -1). Presumably nobody noticed the bogus messages because they are quite hard to trigger (they are mostly internal errors reading and writing objects). However, one easy trigger is a custom merge driver which dies by signal; we have a test already here, but we were not checking the contents of stderr. Signed-off-by: Jeff King <peff@peff.net> --- A few of these messages starts with capital letters, which is unlike our usual error message style. I didn't clean that up here. We could do so on top, but I actually wonder if some of these ought to be using path_msg() and continuing instead, to give output closer to other conflict or error cases (e.g., conflicts caused by missing submodule objects). But I dunno. I guess these are all more clearly "woah, something is totally wrong" that we do not expect to happen, so it probably isn't a big deal to just abort. merge-ort.c | 28 +++++----------------------- t/t6406-merge-attr.sh | 3 ++- 2 files changed, 7 insertions(+), 24 deletions(-)