diff mbox series

[1/4] merge-ort: drop custom err() function

Message ID 20230914093948.GA2254894@coredump.intra.peff.net (mailing list archive)
State Accepted
Commit 808e83f2667e4b442a8f58f0c7ef55feb6864f65
Headers show
Series merge-ort unused parameter cleanups | expand

Commit Message

Jeff King Sept. 14, 2023, 9:39 a.m. UTC
The merge-ort code has an err() function, but it's really just error()
in disguise. It differs in two ways:

  1. It takes a "struct merge_options" argument. But the function
     completely ignores it! We can simply remove it.

  2. It formats the error string into a strbuf, prepending "error: ",
     and then feeds the result into error(). But this is wrong! The
     error() function already adds the prefix, so we end up with:

        error: error: Failed to execute internal merge

So let's just drop this function entirely and call error() directly, as
the functions are otherwise identical (note that they both always return
-1).

Presumably nobody noticed the bogus messages because they are quite hard
to trigger (they are mostly internal errors reading and writing
objects). However, one easy trigger is a custom merge driver which dies
by signal; we have a test already here, but we were not checking the
contents of stderr.

Signed-off-by: Jeff King <peff@peff.net>
---
A few of these messages starts with capital letters, which is unlike our
usual error message style. I didn't clean that up here. We could do so
on top, but I actually wonder if some of these ought to be using
path_msg() and continuing instead, to give output closer to other
conflict or error cases (e.g., conflicts caused by missing submodule
objects). But I dunno. I guess these are all more clearly "woah,
something is totally wrong" that we do not expect to happen, so it
probably isn't a big deal to just abort.

 merge-ort.c           | 28 +++++-----------------------
 t/t6406-merge-attr.sh |  3 ++-
 2 files changed, 7 insertions(+), 24 deletions(-)

Comments

Elijah Newren Sept. 16, 2023, 2:54 a.m. UTC | #1
On Thu, Sep 14, 2023 at 2:39 AM Jeff King <peff@peff.net> wrote:
>
> The merge-ort code has an err() function, but it's really just error()
> in disguise. It differs in two ways:
>
>   1. It takes a "struct merge_options" argument. But the function
>      completely ignores it! We can simply remove it.

Oops, when I simplified the err() function copied from
merge-recursive.c in one way, I failed to notice that it enabled
further simplifications.

>   2. It formats the error string into a strbuf, prepending "error: ",
>      and then feeds the result into error(). But this is wrong! The
>      error() function already adds the prefix, so we end up with:
>
>         error: error: Failed to execute internal merge

...and the same problem can be found in merge-recursive.c's err() function.

Not sure what current opinions on whether we should bother fixing
those up.  I do intend on nuking merge-recursive.c, but I obviously
haven't had much Git time this year.

> So let's just drop this function entirely and call error() directly, as
> the functions are otherwise identical (note that they both always return
> -1).
>
> Presumably nobody noticed the bogus messages because they are quite hard
> to trigger (they are mostly internal errors reading and writing
> objects). However, one easy trigger is a custom merge driver which dies
> by signal; we have a test already here, but we were not checking the
> contents of stderr.

Thanks for catching this.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
> A few of these messages starts with capital letters, which is unlike our
> usual error message style. I didn't clean that up here. We could do so
> on top,

There are two of these.  In my defense, they were copied verbatim from
merge-recursive.c.  And I, um, never noticed the problem over there
before copying.  Or after.

> but I actually wonder if some of these ought to be using
> path_msg() and continuing instead, to give output closer to other
> conflict or error cases (e.g., conflicts caused by missing submodule
> objects). But I dunno. I guess these are all more clearly "woah,
> something is totally wrong" that we do not expect to happen, so it
> probably isn't a big deal to just abort.

Yeah, all callers of err()/error() are for things that should never
happen regardless of repository contents and should result in an
instant abort, whereas anything calling path_msg() is a conflict or
informational message that is expected for various kinds of repository
data -- these messages are accumulated and later shown.

Another distinction is that any call to path_msg() is associated to a
very specific path (or a few specific paths in special cases like
renames or add/add with conflict modes), whereas none of the calls to
err()/error() have a specific path they are about.  This serves a few
purposes:
  * We've had reports before that users get confused when there are
multiple conflict messages about a path and they do not occur
together.  The structure of the merge machinery is such that it often
has to process conflicts by type and then by path, rather than by path
and then by type.  If a merge has many conflicts, processing by type
and then by path, combined with printing as you go, naturally results
in cases where there are multiple conflict type messages for a single
path, but the messages are separated by dozens or hundreds of lines of
conflict messages about other paths.  By accumulating and printing
later, at print time we can sort based on path and provide nicer
output (though renames and such might still cause some separation of
related messages).
  * Accumulating and printing conflict & informational messages later
is also more friendly for use by other tools such as merge-tree or
rebase that may want to only conditionally print the messages or even
operate on the structured data (the specific paths and conflict types
recorded with them) in some special way.  Dscho and I talked about
that for his webby-merge-ui-for-github tool he was working on.

Anyway, long story short is that I think continuing to use error()
instead of path_msg() or something else makes sense here.  The capital
to lowercase cleanups make sense; we could even #leftoverbits for that
piece.


>  merge-ort.c           | 28 +++++-----------------------
>  t/t6406-merge-attr.sh |  3 ++-
>  2 files changed, 7 insertions(+), 24 deletions(-)
>
> diff --git a/merge-ort.c b/merge-ort.c
> index 8631c99700..027ecc7f78 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -721,23 +721,6 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
>         renames->callback_data_nr = renames->callback_data_alloc = 0;
>  }
>
> -__attribute__((format (printf, 2, 3)))
> -static int err(struct merge_options *opt, const char *err, ...)
> -{
> -       va_list params;
> -       struct strbuf sb = STRBUF_INIT;
> -
> -       strbuf_addstr(&sb, "error: ");
> -       va_start(params, err);
> -       strbuf_vaddf(&sb, err, params);
> -       va_end(params);
> -
> -       error("%s", sb.buf);
> -       strbuf_release(&sb);
> -
> -       return -1;
> -}
> -
>  static void format_commit(struct strbuf *sb,
>                           int indent,
>                           struct repository *repo,
> @@ -2122,13 +2105,12 @@ static int handle_content_merge(struct merge_options *opt,
>                                           &result_buf);
>
>                 if ((merge_status < 0) || !result_buf.ptr)
> -                       ret = err(opt, _("Failed to execute internal merge"));
> +                       ret = error(_("Failed to execute internal merge"));
>
>                 if (!ret &&
>                     write_object_file(result_buf.ptr, result_buf.size,
>                                       OBJ_BLOB, &result->oid))
> -                       ret = err(opt, _("Unable to add %s to database"),
> -                                 path);
> +                       ret = error(_("Unable to add %s to database"), path);
>
>                 free(result_buf.ptr);
>                 if (ret)
> @@ -3518,10 +3500,10 @@ static int read_oid_strbuf(struct merge_options *opt,
>         unsigned long size;
>         buf = repo_read_object_file(the_repository, oid, &type, &size);
>         if (!buf)
> -               return err(opt, _("cannot read object %s"), oid_to_hex(oid));
> +               return error(_("cannot read object %s"), oid_to_hex(oid));
>         if (type != OBJ_BLOB) {
>                 free(buf);
> -               return err(opt, _("object %s is not a blob"), oid_to_hex(oid));
> +               return error(_("object %s is not a blob"), oid_to_hex(oid));
>         }
>         strbuf_attach(dst, buf, size, size + 1);
>         return 0;
> @@ -4973,7 +4955,7 @@ static void merge_ort_nonrecursive_internal(struct merge_options *opt,
>                  * TRANSLATORS: The %s arguments are: 1) tree hash of a merge
>                  * base, and 2-3) the trees for the two trees we're merging.
>                  */
> -               err(opt, _("collecting merge info failed for trees %s, %s, %s"),
> +               error(_("collecting merge info failed for trees %s, %s, %s"),
>                     oid_to_hex(&merge_base->object.oid),
>                     oid_to_hex(&side1->object.oid),
>                     oid_to_hex(&side2->object.oid));
> diff --git a/t/t6406-merge-attr.sh b/t/t6406-merge-attr.sh
> index 9677180a5b..05ad13b23e 100755
> --- a/t/t6406-merge-attr.sh
> +++ b/t/t6406-merge-attr.sh
> @@ -179,7 +179,8 @@ test_expect_success !WINDOWS 'custom merge driver that is killed with a signal'
>
>         >./please-abort &&
>         echo "* merge=custom" >.gitattributes &&
> -       test_must_fail git merge main &&
> +       test_must_fail git merge main 2>err &&
> +       grep "^error: Failed to execute internal merge" err &&
>         git ls-files -u >output &&
>         git diff --name-only HEAD >>output &&
>         test_must_be_empty output
> --
> 2.42.0.628.g8a27295885

Thanks for fixing this up.
Jeff King Sept. 16, 2023, 5:50 a.m. UTC | #2
On Fri, Sep 15, 2023 at 07:54:28PM -0700, Elijah Newren wrote:

> Oops, when I simplified the err() function copied from
> merge-recursive.c in one way, I failed to notice that it enabled
> further simplifications.

Ah, I didn't even realize that it had been copied from there. That makes
a lot more sense now.

> >   2. It formats the error string into a strbuf, prepending "error: ",
> >      and then feeds the result into error(). But this is wrong! The
> >      error() function already adds the prefix, so we end up with:
> >
> >         error: error: Failed to execute internal merge
> 
> ...and the same problem can be found in merge-recursive.c's err() function.
> 
> Not sure what current opinions on whether we should bother fixing
> those up.  I do intend on nuking merge-recursive.c, but I obviously
> haven't had much Git time this year.

Hmm, I'm not sure that it does. It has this code:

          if (opt->buffer_output < 2)
                  flush_output(opt);
          else {
                  strbuf_complete(&opt->obuf, '\n');
                  strbuf_addstr(&opt->obuf, "error: ");
          }

so we only add the extra "error:" tag when we are in a buffering mode
that writes into the strbuf (indicated by a high buffer_output field).

And then later, after formatting the new string into opt->obuf, we do
this:

          if (opt->buffer_output > 1)
                  strbuf_addch(&opt->obuf, '\n');
          else {
                  error("%s", opt->obuf.buf);
                  strbuf_reset(&opt->obuf);
          }

So we call error() iff buffer is low, in which case we would not have
added the prefix earlier. And that makes sense. If we are collecting
messages in obuf, we cannot use error(), and we have to handle the
prefix ourselves.

So I think it's correct? If you started with merge-recursive's err() and
then stripped it down to remove the extra buffering stuff, I can see
that it would be a natural error to accidentally leave in the extra
prefix while doing so.

I certainly find the code confusing (the split "< 2" and "> 1"
conditions to trigger related cases is a nice obfuscation bonus), but I
_think_ it's right. And if the end goal is to ditch merge-recursive.c, I
don't think it's worth spending any more brain power on it.

> > A few of these messages starts with capital letters, which is unlike our
> > usual error message style. I didn't clean that up here. We could do so
> > on top,
> 
> There are two of these.  In my defense, they were copied verbatim from
> merge-recursive.c.  And I, um, never noticed the problem over there
> before copying.  Or after.

That makes sense. We have a ton of these lurking in the code base. I
usually try to clean them up when I touch relevant lines, but I wasn't
sure about the path_msg thing here.

> Yeah, all callers of err()/error() are for things that should never
> happen regardless of repository contents and should result in an
> instant abort, whereas anything calling path_msg() is a conflict or
> informational message that is expected for various kinds of repository
> data -- these messages are accumulated and later shown.

That makes sense in general. And I think most of these err() calls are
of that form (in fact, I'd expect most corruption to just trigger a
die() at a low level anyway, but we've been slowly lib-ifying that
away).

The one that gave me pause was my "the external merge driver gave us a
failure" case that I used for testing. In theory we could say "oops,
your merge driver is broken" and keep going. But I think it's not just
"broken", but "oops, your merge driver died with a signal", since a
non-zero exit just means conflicts. So at that point probably something
really has gone terribly wrong (not necessarily with Git, but with your
merge driver!).

> Another distinction is that any call to path_msg() is associated to a
> very specific path (or a few specific paths in special cases like
> renames or add/add with conflict modes), whereas none of the calls to
> err()/error() have a specific path they are about.  This serves a few
> purposes:

I think some of them are about specific paths. We hit err() if
merge_3way() returns -1, but that call can of course also result in a
regular conflict.

But if they're catastrophic errors anyway (as above), this part is kind
of moot.

> Anyway, long story short is that I think continuing to use error()
> instead of path_msg() or something else makes sense here.  The capital
> to lowercase cleanups make sense; we could even #leftoverbits for that
> piece.

Sounds good. I'll post a patch in a second, just to take care of it
while we're thinking about it.

-Peff
diff mbox series

Patch

diff --git a/merge-ort.c b/merge-ort.c
index 8631c99700..027ecc7f78 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -721,23 +721,6 @@  static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 	renames->callback_data_nr = renames->callback_data_alloc = 0;
 }
 
-__attribute__((format (printf, 2, 3)))
-static int err(struct merge_options *opt, const char *err, ...)
-{
-	va_list params;
-	struct strbuf sb = STRBUF_INIT;
-
-	strbuf_addstr(&sb, "error: ");
-	va_start(params, err);
-	strbuf_vaddf(&sb, err, params);
-	va_end(params);
-
-	error("%s", sb.buf);
-	strbuf_release(&sb);
-
-	return -1;
-}
-
 static void format_commit(struct strbuf *sb,
 			  int indent,
 			  struct repository *repo,
@@ -2122,13 +2105,12 @@  static int handle_content_merge(struct merge_options *opt,
 					  &result_buf);
 
 		if ((merge_status < 0) || !result_buf.ptr)
-			ret = err(opt, _("Failed to execute internal merge"));
+			ret = error(_("Failed to execute internal merge"));
 
 		if (!ret &&
 		    write_object_file(result_buf.ptr, result_buf.size,
 				      OBJ_BLOB, &result->oid))
-			ret = err(opt, _("Unable to add %s to database"),
-				  path);
+			ret = error(_("Unable to add %s to database"), path);
 
 		free(result_buf.ptr);
 		if (ret)
@@ -3518,10 +3500,10 @@  static int read_oid_strbuf(struct merge_options *opt,
 	unsigned long size;
 	buf = repo_read_object_file(the_repository, oid, &type, &size);
 	if (!buf)
-		return err(opt, _("cannot read object %s"), oid_to_hex(oid));
+		return error(_("cannot read object %s"), oid_to_hex(oid));
 	if (type != OBJ_BLOB) {
 		free(buf);
-		return err(opt, _("object %s is not a blob"), oid_to_hex(oid));
+		return error(_("object %s is not a blob"), oid_to_hex(oid));
 	}
 	strbuf_attach(dst, buf, size, size + 1);
 	return 0;
@@ -4973,7 +4955,7 @@  static void merge_ort_nonrecursive_internal(struct merge_options *opt,
 		 * TRANSLATORS: The %s arguments are: 1) tree hash of a merge
 		 * base, and 2-3) the trees for the two trees we're merging.
 		 */
-		err(opt, _("collecting merge info failed for trees %s, %s, %s"),
+		error(_("collecting merge info failed for trees %s, %s, %s"),
 		    oid_to_hex(&merge_base->object.oid),
 		    oid_to_hex(&side1->object.oid),
 		    oid_to_hex(&side2->object.oid));
diff --git a/t/t6406-merge-attr.sh b/t/t6406-merge-attr.sh
index 9677180a5b..05ad13b23e 100755
--- a/t/t6406-merge-attr.sh
+++ b/t/t6406-merge-attr.sh
@@ -179,7 +179,8 @@  test_expect_success !WINDOWS 'custom merge driver that is killed with a signal'
 
 	>./please-abort &&
 	echo "* merge=custom" >.gitattributes &&
-	test_must_fail git merge main &&
+	test_must_fail git merge main 2>err &&
+	grep "^error: Failed to execute internal merge" err &&
 	git ls-files -u >output &&
 	git diff --name-only HEAD >>output &&
 	test_must_be_empty output