diff mbox series

[4/8] pretty: fix parsing of half-valid "%<" and "%>" placeholders

Message ID 7d6b62006ecaf7db159e8db0c85455ed58027ce6.1742367347.git.martin.agren@gmail.com (mailing list archive)
State New
Headers show
Series pretty: minor bugfixing, some refactorings | expand

Commit Message

Martin Ågren March 19, 2025, 7:23 a.m. UTC
When we parse a padding directive ("%<" or "%>"), we might populate a
few of the struct's fields before bailing. This can result in such
half-parsed information being used to actually introduce some
padding/truncation.

When parsing a "%<" or "%>", only store the parsed data after parsing
successfully. The added test would have failed before this commit. It
also shows how the existing behavior is hardly something someone can
rely on since the non-consumed modifier ("%<(10,bad)") shows up verbatim
in the pretty output.

We could let the caller use a temporary struct and only copy the data on
success. Let's instead make our parsing function easy to use correctly
by letting it only touch the output struct in the success case.

While setting up a temporary struct for parsing into, we might as well
initialize it to a well-defined state. It's unnecessary for the current
implementation since it always writes to all three fields in a
successful case, but some future-proofing shouldn't hurt.

Note that the test relies on first using a correct placeholder
"%<(4,trunc)" where "trunc" (`trunc_right`) lingers in our struct until
it's then used instead of the invalid "bad". The next commit will teach
us to clean up any remnants of "%<(4,trunc)" after handling it.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 pretty.c                      | 18 ++++++++++++------
 t/t4205-log-pretty-formats.sh |  6 ++++++
 2 files changed, 18 insertions(+), 6 deletions(-)

Comments

Patrick Steinhardt March 20, 2025, 9:18 a.m. UTC | #1
On Wed, Mar 19, 2025 at 08:23:37AM +0100, Martin Ågren wrote:
> When we parse a padding directive ("%<" or "%>"), we might populate a
> few of the struct's fields before bailing. This can result in such
> half-parsed information being used to actually introduce some
> padding/truncation.
> 
> When parsing a "%<" or "%>", only store the parsed data after parsing
> successfully. The added test would have failed before this commit. It
> also shows how the existing behavior is hardly something someone can
> rely on since the non-consumed modifier ("%<(10,bad)") shows up verbatim
> in the pretty output.

Ideally I'd expect us to die when seeing misformatted placeholders like
this. This is way less confusing to the user as otherwise things _look_
like they work, but we silently do the wrong thing.

That being said, I have no idea whether we can do such a change now
without breaking existing usecases. As you rightfully argue the result
already is wrong, but with my proposal we'd completely refuse to do
anything. Which I'd argue is a good thing in the end.

> We could let the caller use a temporary struct and only copy the data on
> success. Let's instead make our parsing function easy to use correctly
> by letting it only touch the output struct in the success case.

s/success/&ful/

> While setting up a temporary struct for parsing into, we might as well
> initialize it to a well-defined state. It's unnecessary for the current
> implementation since it always writes to all three fields in a
> successful case, but some future-proofing shouldn't hurt.
> 
> Note that the test relies on first using a correct placeholder
> "%<(4,trunc)" where "trunc" (`trunc_right`) lingers in our struct until
> it's then used instead of the invalid "bad". The next commit will teach
> us to clean up any remnants of "%<(4,trunc)" after handling it.
> 
> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
>  pretty.c                      | 18 ++++++++++++------
>  t/t4205-log-pretty-formats.sh |  6 ++++++
>  2 files changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/pretty.c b/pretty.c
> index e5e8ef24fa..a4fa052f8b 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -1121,6 +1121,11 @@ static size_t parse_padding_placeholder(const char *placeholder,
>  	const char *ch = placeholder;
>  	enum flush_type flush_type;
>  	int to_column = 0;
> +	struct padding_args ans = {
> +		.flush_type = no_flush,
> +		.truncate = trunc_none,
> +		.padding = 0,
> +	};
>  
>  	switch (*ch++) {
>  	case '<':

I honestly have no idea what `ans` stands for. You could call it
`result` to signify that it's what we'll ultimately bubble up to the
caller in the successful case.

Patrick
Martin Ågren March 20, 2025, 4:11 p.m. UTC | #2
On Thu, 20 Mar 2025 at 10:18, Patrick Steinhardt <ps@pks.im> wrote:
>
> On Wed, Mar 19, 2025 at 08:23:37AM +0100, Martin Ågren wrote:
> > When parsing a "%<" or "%>", only store the parsed data after parsing
> > successfully. The added test would have failed before this commit. It
> > also shows how the existing behavior is hardly something someone can
> > rely on since the non-consumed modifier ("%<(10,bad)") shows up verbatim
> > in the pretty output.
>
> Ideally I'd expect us to die when seeing misformatted placeholders like
> this. This is way less confusing to the user as otherwise things _look_
> like they work, but we silently do the wrong thing.

Right. I can see how it makes some kind of sense to print what we don't
understand when it's something short and simple like "%X". But for more
complex "%X(first,second)" it's kind of obvious that a misspelled
"X(fist,second)" isn't something you want in the output. The whole "if
we can't parse, return zero as the number of consumed characters so that
we can print verbatim while looking for next '%'" is a central piece of
the design here. One could certainly imagine a "strict" mode.

> That being said, I have no idea whether we can do such a change now
> without breaking existing usecases. As you rightfully argue the result
> already is wrong, but with my proposal we'd completely refuse to do
> anything. Which I'd argue is a good thing in the end.

I can see the value of a strict mode, with command line options and
config switches and whatnot, maybe even a changed default behavior at
some point. I'd rather punt on that for now. TBH, I'd be afraid to do a
hard switch from "0 means print it instead" to "0 means die". I don't
disagree that it would be a better end-game though, at some point.

> > We could let the caller use a temporary struct and only copy the data on
> > success. Let's instead make our parsing function easy to use correctly
> > by letting it only touch the output struct in the success case.
>
> s/success/&ful/

Thanks.

> > +     struct padding_args ans = {
> > +             .flush_type = no_flush,
> > +             .truncate = trunc_none,
> > +             .padding = 0,
> > +     };
> >
> >       switch (*ch++) {
> >       case '<':
>
> I honestly have no idea what `ans` stands for. You could call it
> `result` to signify that it's what we'll ultimately bubble up to the
> caller in the successful case.

Fair. :-) It's "answer", but "result" is much better. Thanks.


Martin
Patrick Steinhardt March 24, 2025, 10:10 a.m. UTC | #3
On Thu, Mar 20, 2025 at 05:11:09PM +0100, Martin Ågren wrote:
> On Thu, 20 Mar 2025 at 10:18, Patrick Steinhardt <ps@pks.im> wrote:
> >
> > On Wed, Mar 19, 2025 at 08:23:37AM +0100, Martin Ågren wrote:
> > > When parsing a "%<" or "%>", only store the parsed data after parsing
> > > successfully. The added test would have failed before this commit. It
> > > also shows how the existing behavior is hardly something someone can
> > > rely on since the non-consumed modifier ("%<(10,bad)") shows up verbatim
> > > in the pretty output.
> >
> > Ideally I'd expect us to die when seeing misformatted placeholders like
> > this. This is way less confusing to the user as otherwise things _look_
> > like they work, but we silently do the wrong thing.
> 
> Right. I can see how it makes some kind of sense to print what we don't
> understand when it's something short and simple like "%X". But for more
> complex "%X(first,second)" it's kind of obvious that a misspelled
> "X(fist,second)" isn't something you want in the output. The whole "if
> we can't parse, return zero as the number of consumed characters so that
> we can print verbatim while looking for next '%'" is a central piece of
> the design here. One could certainly imagine a "strict" mode.
> 
> > That being said, I have no idea whether we can do such a change now
> > without breaking existing usecases. As you rightfully argue the result
> > already is wrong, but with my proposal we'd completely refuse to do
> > anything. Which I'd argue is a good thing in the end.
> 
> I can see the value of a strict mode, with command line options and
> config switches and whatnot, maybe even a changed default behavior at
> some point. I'd rather punt on that for now. TBH, I'd be afraid to do a
> hard switch from "0 means print it instead" to "0 means die". I don't
> disagree that it would be a better end-game though, at some point.

Yup, I fully agree that this is a bit more of a risky change and that it
doesn't have to be part of this patch series.

Patrick
diff mbox series

Patch

diff --git a/pretty.c b/pretty.c
index e5e8ef24fa..a4fa052f8b 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1121,6 +1121,11 @@  static size_t parse_padding_placeholder(const char *placeholder,
 	const char *ch = placeholder;
 	enum flush_type flush_type;
 	int to_column = 0;
+	struct padding_args ans = {
+		.flush_type = no_flush,
+		.truncate = trunc_none,
+		.padding = 0,
+	};
 
 	switch (*ch++) {
 	case '<':
@@ -1171,8 +1176,8 @@  static size_t parse_padding_placeholder(const char *placeholder,
 			if (width < 0)
 				return 0;
 		}
-		p->padding = to_column ? -width : width;
-		p->flush_type = flush_type;
+		ans.padding = to_column ? -width : width;
+		ans.flush_type = flush_type;
 
 		if (*end == ',') {
 			start = end + 1;
@@ -1180,16 +1185,17 @@  static size_t parse_padding_placeholder(const char *placeholder,
 			if (!end || end == start)
 				return 0;
 			if (starts_with(start, "trunc)"))
-				p->truncate = trunc_right;
+				ans.truncate = trunc_right;
 			else if (starts_with(start, "ltrunc)"))
-				p->truncate = trunc_left;
+				ans.truncate = trunc_left;
 			else if (starts_with(start, "mtrunc)"))
-				p->truncate = trunc_middle;
+				ans.truncate = trunc_middle;
 			else
 				return 0;
 		} else
-			p->truncate = trunc_none;
+			ans.truncate = trunc_none;
 
+		*p = ans;
 		return end - placeholder + 1;
 	}
 	return 0;
diff --git a/t/t4205-log-pretty-formats.sh b/t/t4205-log-pretty-formats.sh
index f81e42a84d..26987ecd77 100755
--- a/t/t4205-log-pretty-formats.sh
+++ b/t/t4205-log-pretty-formats.sh
@@ -1130,6 +1130,12 @@  test_expect_success 'log --pretty with invalid padding format' '
 	test_cmp expect actual
 '
 
+test_expect_success 'semi-parseable padding format does not get semi-applied' '
+	git log -1 --pretty="format:%<(4,trunc)%H%%<(10,bad)%H" >expect &&
+	git log -1 --pretty="format:%<(4,trunc)%H%<(10,bad)%H" >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'log --pretty with magical wrapping directives' '
 	commit_id=$(git commit-tree HEAD^{tree} -m "describe me") &&
 	git tag describe-me $commit_id &&