Message ID | 7d6b62006ecaf7db159e8db0c85455ed58027ce6.1742367347.git.martin.agren@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | pretty: minor bugfixing, some refactorings | expand |
On Wed, Mar 19, 2025 at 08:23:37AM +0100, Martin Ågren wrote: > When we parse a padding directive ("%<" or "%>"), we might populate a > few of the struct's fields before bailing. This can result in such > half-parsed information being used to actually introduce some > padding/truncation. > > When parsing a "%<" or "%>", only store the parsed data after parsing > successfully. The added test would have failed before this commit. It > also shows how the existing behavior is hardly something someone can > rely on since the non-consumed modifier ("%<(10,bad)") shows up verbatim > in the pretty output. Ideally I'd expect us to die when seeing misformatted placeholders like this. This is way less confusing to the user as otherwise things _look_ like they work, but we silently do the wrong thing. That being said, I have no idea whether we can do such a change now without breaking existing usecases. As you rightfully argue the result already is wrong, but with my proposal we'd completely refuse to do anything. Which I'd argue is a good thing in the end. > We could let the caller use a temporary struct and only copy the data on > success. Let's instead make our parsing function easy to use correctly > by letting it only touch the output struct in the success case. s/success/&ful/ > While setting up a temporary struct for parsing into, we might as well > initialize it to a well-defined state. It's unnecessary for the current > implementation since it always writes to all three fields in a > successful case, but some future-proofing shouldn't hurt. > > Note that the test relies on first using a correct placeholder > "%<(4,trunc)" where "trunc" (`trunc_right`) lingers in our struct until > it's then used instead of the invalid "bad". The next commit will teach > us to clean up any remnants of "%<(4,trunc)" after handling it. > > Signed-off-by: Martin Ågren <martin.agren@gmail.com> > --- > pretty.c | 18 ++++++++++++------ > t/t4205-log-pretty-formats.sh | 6 ++++++ > 2 files changed, 18 insertions(+), 6 deletions(-) > > diff --git a/pretty.c b/pretty.c > index e5e8ef24fa..a4fa052f8b 100644 > --- a/pretty.c > +++ b/pretty.c > @@ -1121,6 +1121,11 @@ static size_t parse_padding_placeholder(const char *placeholder, > const char *ch = placeholder; > enum flush_type flush_type; > int to_column = 0; > + struct padding_args ans = { > + .flush_type = no_flush, > + .truncate = trunc_none, > + .padding = 0, > + }; > > switch (*ch++) { > case '<': I honestly have no idea what `ans` stands for. You could call it `result` to signify that it's what we'll ultimately bubble up to the caller in the successful case. Patrick
On Thu, 20 Mar 2025 at 10:18, Patrick Steinhardt <ps@pks.im> wrote: > > On Wed, Mar 19, 2025 at 08:23:37AM +0100, Martin Ågren wrote: > > When parsing a "%<" or "%>", only store the parsed data after parsing > > successfully. The added test would have failed before this commit. It > > also shows how the existing behavior is hardly something someone can > > rely on since the non-consumed modifier ("%<(10,bad)") shows up verbatim > > in the pretty output. > > Ideally I'd expect us to die when seeing misformatted placeholders like > this. This is way less confusing to the user as otherwise things _look_ > like they work, but we silently do the wrong thing. Right. I can see how it makes some kind of sense to print what we don't understand when it's something short and simple like "%X". But for more complex "%X(first,second)" it's kind of obvious that a misspelled "X(fist,second)" isn't something you want in the output. The whole "if we can't parse, return zero as the number of consumed characters so that we can print verbatim while looking for next '%'" is a central piece of the design here. One could certainly imagine a "strict" mode. > That being said, I have no idea whether we can do such a change now > without breaking existing usecases. As you rightfully argue the result > already is wrong, but with my proposal we'd completely refuse to do > anything. Which I'd argue is a good thing in the end. I can see the value of a strict mode, with command line options and config switches and whatnot, maybe even a changed default behavior at some point. I'd rather punt on that for now. TBH, I'd be afraid to do a hard switch from "0 means print it instead" to "0 means die". I don't disagree that it would be a better end-game though, at some point. > > We could let the caller use a temporary struct and only copy the data on > > success. Let's instead make our parsing function easy to use correctly > > by letting it only touch the output struct in the success case. > > s/success/&ful/ Thanks. > > + struct padding_args ans = { > > + .flush_type = no_flush, > > + .truncate = trunc_none, > > + .padding = 0, > > + }; > > > > switch (*ch++) { > > case '<': > > I honestly have no idea what `ans` stands for. You could call it > `result` to signify that it's what we'll ultimately bubble up to the > caller in the successful case. Fair. :-) It's "answer", but "result" is much better. Thanks. Martin
On Thu, Mar 20, 2025 at 05:11:09PM +0100, Martin Ågren wrote: > On Thu, 20 Mar 2025 at 10:18, Patrick Steinhardt <ps@pks.im> wrote: > > > > On Wed, Mar 19, 2025 at 08:23:37AM +0100, Martin Ågren wrote: > > > When parsing a "%<" or "%>", only store the parsed data after parsing > > > successfully. The added test would have failed before this commit. It > > > also shows how the existing behavior is hardly something someone can > > > rely on since the non-consumed modifier ("%<(10,bad)") shows up verbatim > > > in the pretty output. > > > > Ideally I'd expect us to die when seeing misformatted placeholders like > > this. This is way less confusing to the user as otherwise things _look_ > > like they work, but we silently do the wrong thing. > > Right. I can see how it makes some kind of sense to print what we don't > understand when it's something short and simple like "%X". But for more > complex "%X(first,second)" it's kind of obvious that a misspelled > "X(fist,second)" isn't something you want in the output. The whole "if > we can't parse, return zero as the number of consumed characters so that > we can print verbatim while looking for next '%'" is a central piece of > the design here. One could certainly imagine a "strict" mode. > > > That being said, I have no idea whether we can do such a change now > > without breaking existing usecases. As you rightfully argue the result > > already is wrong, but with my proposal we'd completely refuse to do > > anything. Which I'd argue is a good thing in the end. > > I can see the value of a strict mode, with command line options and > config switches and whatnot, maybe even a changed default behavior at > some point. I'd rather punt on that for now. TBH, I'd be afraid to do a > hard switch from "0 means print it instead" to "0 means die". I don't > disagree that it would be a better end-game though, at some point. Yup, I fully agree that this is a bit more of a risky change and that it doesn't have to be part of this patch series. Patrick
diff --git a/pretty.c b/pretty.c index e5e8ef24fa..a4fa052f8b 100644 --- a/pretty.c +++ b/pretty.c @@ -1121,6 +1121,11 @@ static size_t parse_padding_placeholder(const char *placeholder, const char *ch = placeholder; enum flush_type flush_type; int to_column = 0; + struct padding_args ans = { + .flush_type = no_flush, + .truncate = trunc_none, + .padding = 0, + }; switch (*ch++) { case '<': @@ -1171,8 +1176,8 @@ static size_t parse_padding_placeholder(const char *placeholder, if (width < 0) return 0; } - p->padding = to_column ? -width : width; - p->flush_type = flush_type; + ans.padding = to_column ? -width : width; + ans.flush_type = flush_type; if (*end == ',') { start = end + 1; @@ -1180,16 +1185,17 @@ static size_t parse_padding_placeholder(const char *placeholder, if (!end || end == start) return 0; if (starts_with(start, "trunc)")) - p->truncate = trunc_right; + ans.truncate = trunc_right; else if (starts_with(start, "ltrunc)")) - p->truncate = trunc_left; + ans.truncate = trunc_left; else if (starts_with(start, "mtrunc)")) - p->truncate = trunc_middle; + ans.truncate = trunc_middle; else return 0; } else - p->truncate = trunc_none; + ans.truncate = trunc_none; + *p = ans; return end - placeholder + 1; } return 0; diff --git a/t/t4205-log-pretty-formats.sh b/t/t4205-log-pretty-formats.sh index f81e42a84d..26987ecd77 100755 --- a/t/t4205-log-pretty-formats.sh +++ b/t/t4205-log-pretty-formats.sh @@ -1130,6 +1130,12 @@ test_expect_success 'log --pretty with invalid padding format' ' test_cmp expect actual ' +test_expect_success 'semi-parseable padding format does not get semi-applied' ' + git log -1 --pretty="format:%<(4,trunc)%H%%<(10,bad)%H" >expect && + git log -1 --pretty="format:%<(4,trunc)%H%<(10,bad)%H" >actual && + test_cmp expect actual +' + test_expect_success 'log --pretty with magical wrapping directives' ' commit_id=$(git commit-tree HEAD^{tree} -m "describe me") && git tag describe-me $commit_id &&
When we parse a padding directive ("%<" or "%>"), we might populate a few of the struct's fields before bailing. This can result in such half-parsed information being used to actually introduce some padding/truncation. When parsing a "%<" or "%>", only store the parsed data after parsing successfully. The added test would have failed before this commit. It also shows how the existing behavior is hardly something someone can rely on since the non-consumed modifier ("%<(10,bad)") shows up verbatim in the pretty output. We could let the caller use a temporary struct and only copy the data on success. Let's instead make our parsing function easy to use correctly by letting it only touch the output struct in the success case. While setting up a temporary struct for parsing into, we might as well initialize it to a well-defined state. It's unnecessary for the current implementation since it always writes to all three fields in a successful case, but some future-proofing shouldn't hurt. Note that the test relies on first using a correct placeholder "%<(4,trunc)" where "trunc" (`trunc_right`) lingers in our struct until it's then used instead of the invalid "bad". The next commit will teach us to clean up any remnants of "%<(4,trunc)" after handling it. Signed-off-by: Martin Ågren <martin.agren@gmail.com> --- pretty.c | 18 ++++++++++++------ t/t4205-log-pretty-formats.sh | 6 ++++++ 2 files changed, 18 insertions(+), 6 deletions(-)