[4/5] ref-filter: truncate atom names in error messages

Message ID	Y5n4mb/S/RORb+N7@coredump.intra.peff.net (mailing list archive)
State	Accepted
Commit	1955ef10edf3c888dcb237728c81dd6e81df4960
Headers	show Return-Path: <git-owner@kernel.org> Date: Wed, 14 Dec 2022 11:23:53 -0500 From: Jeff King <peff@peff.net> To: git@vger.kernel.org Subject: [PATCH 4/5] ref-filter: truncate atom names in error messages Message-ID: <Y5n4mb/S/RORb+N7@coredump.intra.peff.net> References: <Y5n3S7cnD7s/AIRL@coredump.intra.peff.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <Y5n3S7cnD7s/AIRL@coredump.intra.peff.net> Precedence: bulk
Series	minor ref-filter error-reporting bug-fixes \| expand [0/5] minor ref-filter error-reporting bug-fixes [1/5] ref-filter: reject arguments to %(HEAD) [2/5] ref-filter: factor out "%(foo) does not take arguments" errors [3/5] ref-filter: factor out "unrecognized %(foo) arg" errors [4/5] ref-filter: truncate atom names in error messages [5/5] ref-filter: convert email atom parser to use err_bad_arg()

Message ID

Y5n4mb/S/RORb+N7@coredump.intra.peff.net (mailing list archive)

State

Accepted

Commit

1955ef10edf3c888dcb237728c81dd6e81df4960

Headers

Date: Wed, 14 Dec 2022 11:23:53 -0500
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: [PATCH 4/5] ref-filter: truncate atom names in error messages
Message-ID: <Y5n4mb/S/RORb+N7@coredump.intra.peff.net>
References: <Y5n3S7cnD7s/AIRL@coredump.intra.peff.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <Y5n3S7cnD7s/AIRL@coredump.intra.peff.net>
Precedence: bulk

Series

minor ref-filter error-reporting bug-fixes | expand

Commit Message

Jeff King Dec. 14, 2022, 4:23 p.m. UTC

If you pass a bogus argument to %(refname), you may end up with a
message like this:

  $ git for-each-ref --format='%(refname:foo)'
  fatal: unrecognized %(refname:foo) argument: foo

which is confusing. It should just say:

  fatal: unrecognized %(refname) argument: foo

which is clearer, and is consistent with most other atom parsers. Those
other parsers do not have the same problem because they pass the atom
name from a string literal in the parser function. But because the
parser for %(refname) also handles %(upstream) and %(push), it instead
uses atom->name, which includes the arguments. The oid atom parser which
handles %(tree), %(parent), etc suffers from the same problem.

It seems like the cleanest fix would be for atom->name to be _just_ the
name, since there's already a separate "args" field. But since that
field is also used for other things, we can't change it easily (e.g.,
it's how we find things in the used_atoms array, and clearly %(refname)
and %(refname:short) are not the same thing).

Instead, we'll teach our error_bad_arg() function to stop at the first
":". This is a little hacky, as we're effectively re-parsing the name,
but the format is simple enough to do this as a one-liner, and this
localizes the change to the error-reporting code.

We'll give the same treatment to err_no_arg(). None of its callers use
this atom->name trick, but it's worth future-proofing it while we're
here.

Signed-off-by: Jeff King <peff@peff.net>
---
 ref-filter.c            | 12 ++++++++----
 t/t6300-for-each-ref.sh |  6 ++++++
 2 files changed, 14 insertions(+), 4 deletions(-)

Comments

Taylor Blau Dec. 14, 2022, 8:05 p.m. UTC | #1

On Wed, Dec 14, 2022 at 11:23:53AM -0500, Jeff King wrote:
> It seems like the cleanest fix would be for atom->name to be _just_ the
> name, since there's already a separate "args" field. But since that
> field is also used for other things, we can't change it easily (e.g.,
> it's how we find things in the used_atoms array, and clearly %(refname)
> and %(refname:short) are not the same thing).
>
> Instead, we'll teach our error_bad_arg() function to stop at the first
> ":". This is a little hacky, as we're effectively re-parsing the name,
> but the format is simple enough to do this as a one-liner, and this
> localizes the change to the error-reporting code.
>
> We'll give the same treatment to err_no_arg(). None of its callers use
> this atom->name trick, but it's worth future-proofing it while we're
> here.

For what it's worth, I think that this balance of a somewhat-hacky
implementation against a more significant and trickier refactoring is
well thought-out and the right decision, IMHO.

Thanks,
Taylor

Jeff King Dec. 14, 2022, 8:39 p.m. UTC | #2

On Wed, Dec 14, 2022 at 03:05:05PM -0500, Taylor Blau wrote:

> On Wed, Dec 14, 2022 at 11:23:53AM -0500, Jeff King wrote:
> > It seems like the cleanest fix would be for atom->name to be _just_ the
> > name, since there's already a separate "args" field. But since that
> > field is also used for other things, we can't change it easily (e.g.,
> > it's how we find things in the used_atoms array, and clearly %(refname)
> > and %(refname:short) are not the same thing).
> >
> > Instead, we'll teach our error_bad_arg() function to stop at the first
> > ":". This is a little hacky, as we're effectively re-parsing the name,
> > but the format is simple enough to do this as a one-liner, and this
> > localizes the change to the error-reporting code.
> >
> > We'll give the same treatment to err_no_arg(). None of its callers use
> > this atom->name trick, but it's worth future-proofing it while we're
> > here.
> 
> For what it's worth, I think that this balance of a somewhat-hacky
> implementation against a more significant and trickier refactoring is
> well thought-out and the right decision, IMHO.

By the way, I did try the other change, to make atom->name just contain
the name with no args. There are a bunch of pitfalls in
parse_ref_filter_atom(), including:

  - don't use atom_len; it's off-by-one when looking at "atom" and not
    "sp" when there's a "*" dereference

  - the "args" pointer in the struct actually points into the name
    string. I don't think anybody relies on that, but I'm not 100% sure
    because...

  - once you deal with that, then it segfaults mysteriously, because
    the numeric index between used_atom and the computed values gets out
    of sync. That's where I gave up.

Which isn't to say it isn't do-able, or it wouldn't even make the
ref-filter code cleaner overall if somebody did that refactoring. But it
seemed like too much for solving this one little problem.

Here's the patch where I stopped, for posterity:

diff --git a/ref-filter.c b/ref-filter.c
index caf10ab23e..3a2a7d0271 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -716,7 +716,7 @@ static int parse_ref_filter_atom(struct ref_format *format,
 	used_atom_cnt++;
 	REALLOC_ARRAY(used_atom, used_atom_cnt);
 	used_atom[at].atom_type = i;
-	used_atom[at].name = xmemdupz(atom, ep - atom);
+	used_atom[at].name = xmemdupz(atom, (arg ? arg : ep) - atom);
 	used_atom[at].type = valid_atom[i].cmp_type;
 	used_atom[at].source = valid_atom[i].source;
 	if (used_atom[at].source == SOURCE_OBJ) {
@@ -726,8 +726,10 @@ static int parse_ref_filter_atom(struct ref_format *format,
 			oi.info.contentp = &oi.content;
 	}
 	if (arg) {
-		arg = used_atom[at].name + (arg - atom) + 1;
-		if (!*arg) {
+		arg++; /* skip ':' */
+		if (arg < ep) {
+			arg = xmemdupz(arg, ep - arg);
+		} else {
 			/*
 			 * Treat empty sub-arguments list as NULL (i.e.,
 			 * "%(atom:)" is equivalent to "%(atom)").

Its relative shortness does not represent the great confusion I had in
producing it.

-Peff

diff --git a/ref-filter.c b/ref-filter.c
index 271d619da9..f40bc4d9c9 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -230,13 +230,17 @@  static int strbuf_addf_ret(struct strbuf *sb, int ret, const char *fmt, ...)
 
 static int err_no_arg(struct strbuf *sb, const char *name)
 {
-	strbuf_addf(sb, _("%%(%s) does not take arguments"), name);
+	size_t namelen = strchrnul(name, ':') - name;
+	strbuf_addf(sb, _("%%(%.*s) does not take arguments"),
+		    (int)namelen, name);
 	return -1;
 }
 
 static int err_bad_arg(struct strbuf *sb, const char *name, const char *arg)
 {
-	strbuf_addf(sb, _("unrecognized %%(%s) argument: %s"), name, arg);
+	size_t namelen = strchrnul(name, ':') - name;
+	strbuf_addf(sb, _("unrecognized %%(%.*s) argument: %s"),
+		    (int)namelen, name, arg);
 	return -1;
 }
 
@@ -274,7 +278,7 @@  static int refname_atom_parser_internal(struct refname_atom *atom, const char *a
 		if (strtol_i(arg, 10, &atom->rstrip))
 			return strbuf_addf_ret(err, -1, _("Integer value expected refname:rstrip=%s"), arg);
 	} else
-		return strbuf_addf_ret(err, -1, _("unrecognized %%(%s) argument: %s"), name, arg);
+		return err_bad_arg(err, name, arg);
 	return 0;
 }
 
@@ -471,7 +475,7 @@  static int oid_atom_parser(struct ref_format *format, struct used_atom *atom,
 		if (atom->u.oid.length < MINIMUM_ABBREV)
 			atom->u.oid.length = MINIMUM_ABBREV;
 	} else
-		return strbuf_addf_ret(err, -1, _("unrecognized %%(%s) argument: %s"), atom->name, arg);
+		return err_bad_arg(err, atom->name, arg);
 	return 0;
 }
 
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 010ba5a2cb..2ae1fc721b 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -1254,6 +1254,12 @@  test_expect_success 'subject atom rejects unknown arguments' '
 	test_cmp expect err
 '
 
+test_expect_success 'refname atom rejects unknown arguments' '
+	test_must_fail git for-each-ref --format="%(refname:foo)" 2>err &&
+	echo "fatal: unrecognized %(refname) argument: foo" >expect &&
+	test_cmp expect err
+'
+
 test_expect_success 'trailer parsing not fooled by --- line' '
 	git commit --allow-empty -F - <<-\EOF &&
 	this is the subject

[4/5] ref-filter: truncate atom names in error messages

Commit Message

Comments

Patch