Message ID | 20241105190235.13502-1-five231003@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | t6300: values containing ')' are broken in ref formats | expand |
Kousik Sanagavarapu <five231003@gmail.com> writes: > Document that values containing ')' in formats are not parsed correctly > in ref-filter. The problem is probably lack of a way to quote such a closing parenthesis. > However formats having a '(' instead in "value" will parse correctly > because in a general format string we also mark start of the format by > making note of '%(' instead of just '('. So if you wanted to have a two-char sequence '%(' in value, you'd see a similar problem? If so, it is not quite a "bug" or "not parsed correctly"---it is "because there is no way to include closing ')' in the value (e.g., by quoting), you cannot write such a string in the value part". > This raises the question of what can be done to parse ')' in values of > the format correctly. It seems to me like a clean solution would > involve a huge refactoring involving a large portion of ref-filter but I > maybe wrong. Yes, so I wouldn't even call the current behaviour "bug". The language is merely "limited" and the user cannot express certain values with it at all. Having said that, I just tried this $ git for-each-ref --format='%28%(refname)%29' refs/heads/master (refs/heads/master) So, if there is anything that needs "fixing", wouldn't it be documentation? If I knew (or easily find out from "git for-each-ref --help") that hex escapes %XX can be used, I wouldn't have written any of what I said before "Having said that" in this response.
On Tue, Nov 05, 2024 at 05:18:10PM -0800, Junio C Hamano wrote: > > This raises the question of what can be done to parse ')' in values of > > the format correctly. It seems to me like a clean solution would > > involve a huge refactoring involving a large portion of ref-filter but I > > maybe wrong. > > Yes, so I wouldn't even call the current behaviour "bug". The > language is merely "limited" and the user cannot express certain > values with it at all. Agreed. I think we may have discussed this quoting problem before, but it's not usually a big deal because the set of likely values is quite limited. Most of them are just keywords or numeric values. I _think_ that the equals/notequals parameters of %(if) are the only ones. Which isn't to say we shouldn't make things better if we can. Just that I am not too surprised nobody has run into it before. > Having said that, I just tried this > > $ git for-each-ref --format='%28%(refname)%29' refs/heads/master > (refs/heads/master) > > So, if there is anything that needs "fixing", wouldn't it be > documentation? > > If I knew (or easily find out from "git for-each-ref --help") that > hex escapes %XX can be used, I wouldn't have written any of what I > said before "Having said that" in this response. I tried something similar, but I don't think it quite works for the case in question. Within %(if:equals=<foo>) we do not further expand the <foo> value (at least from my limited tests). And so something like: git for-each-ref --format='%(if:equals=ref-with-%29)%(refname:short)...etc' would never match "ref-with-(", but only a literal "ref-with-%29". I am tempted to say the solution is to expand that "equals" value, and possibly add some less-arcane version of the character (maybe "%)"?). But it be a break in backwards compatibility if somebody is trying to match literal %-chars in their "if" block. Another option: in the rest of the "if" design we tried to keep arbitrary text outside of the parentheses. So you could imagine a syntax like: %(if:equals)ref-with-)%(foo)%(refname:short)%(then)...%(end) where %(foo) is some placeholder that separate the two arguments to the "equals". In sane languages that is a space or a comma, but I'm not sure that works here. We have %(end) which would otherwise be a syntax error here, but it feels word. I dunno. The whole language is kind of hideously verbose. I feel sorry for anybody trying to write non-trivial formats. :) -Peff
On Tue, Nov 05, 2024 at 05:18:10PM -0800, Junio C Hamano wrote: > Kousik Sanagavarapu <five231003@gmail.com> writes: > > > Document that values containing ')' in formats are not parsed correctly > > in ref-filter. > > The problem is probably lack of a way to quote such a closing > parenthesis. Yes correct. We currently don't have a way to quote such strings in "value" of "%(atom:someparam=value)" > > However formats having a '(' instead in "value" will parse correctly > > because in a general format string we also mark start of the format by > > making note of '%(' instead of just '('. > > So if you wanted to have a two-char sequence '%(' in value, you'd > see a similar problem? If so, it is not quite a "bug" or "not > parsed correctly"---it is "because there is no way to include > closing ')' in the value (e.g., by quoting), you cannot write such a > string in the value part". > > > This raises the question of what can be done to parse ')' in values of > > the format correctly. It seems to me like a clean solution would > > involve a huge refactoring involving a large portion of ref-filter but I > > maybe wrong. > > Yes, so I wouldn't even call the current behaviour "bug". The > language is merely "limited" and the user cannot express certain > values with it at all. > > > Having said that, I just tried this > > $ git for-each-ref --format='%28%(refname)%29' refs/heads/master > (refs/heads/master) > > So, if there is anything that needs "fixing", wouldn't it be > documentation? > > If I knew (or easily find out from "git for-each-ref --help") that > hex escapes %XX can be used, I wouldn't have written any of what I > said before "Having said that" in this response. Hmm, but hex escapes do work as intended and the problem here is not the ')' outside the atom but within it. To be more clear, let's take $ git for-each-ref --format="%(if:equals=refs/heads/step-1)start)%(refname)%(then)%(objectname:short)%(end)" refs/heads/ (Sorry for not wrapping the line above x<) First let's notice the difference between what these two commands are trying to do. Your command asks to print all the refs matching "refs/heads/master" in the format of (%(refname)) and since we support escaping literals in the form of %xx, where xx is the hexcode of the literal to be escaped during the parsing of the format, this would obviously work as intended. Now let's come to my command. My command asks to print all the abbreviated commit ids of the refs which compare equal to "refs/heads/step-1)start" from all of "refs/heads/". Now here, since ref-filter parses the format string by making note of '%(' and ')', it accidentally thinks that I want to compare equality with "refs/heads/step-1" instead of "refs/heads/step-1)start", which I actually wanted. If my local repo contained both the "refs/heads/step1" and "refs/heads/step1)start", wouldn't this be a bug? So I do agree that it is a lack of quoting when entering the "value" part of "%(atom:someparam=value)", but another part of me also thinks that ref-filter should be intelligent enough while parsing the format string to acknowledge where exactly the atom ends and which is the last closing ')' and hence follows whatever I wrote below the "---" line, till the script. Also, I'm thinking the commit msg was not clear as it lead you (perhaps someone else too when they visit this topic) to think about escape literals while that not exactly is the problem I'm trying to get at. Also if you think a change to the documentation would be more proper than reflecting this with a test breakage, I'll do that. My intention with the test was that - in the future if we parse the "value" correctly then that commit would also include a s/test_expect_failure/test_expect_success change. Thanks!
Jeff King <peff@peff.net> writes: > I am tempted to say the solution is to expand that "equals" value, and > possibly add some less-arcane version of the character (maybe "%)"?). > But it be a break in backwards compatibility if somebody is trying to > match literal %-chars in their "if" block. If they were trying to write a literal %, wouldn't they be writing %% already, not because % followed by a byte without any special meaning happens to be passed intact by the implementation, but because that is _the_ right thing to do, when % is used as an introducer for escape sequences? So I do agree it would be a change that breaks backward compatibility but I do not think we want to stay bug to bug compatible with the current behaviour here. I am not sure with the wisdom of %) though. Wouldn't "%(foo %)" look as if %( opens and %) closes a group in our language? So I am very much in favor of this "if condition should be expanded before comparison" solution.
On Tue, Nov 05, 2024 at 07:05:13PM -0800, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > I am tempted to say the solution is to expand that "equals" value, and > > possibly add some less-arcane version of the character (maybe "%)"?). > > But it be a break in backwards compatibility if somebody is trying to > > match literal %-chars in their "if" block. > > If they were trying to write a literal %, wouldn't they be writing > %% already, not because % followed by a byte without any special > meaning happens to be passed intact by the implementation, but > because that is _the_ right thing to do, when % is used as an > introducer for escape sequences? So I do agree it would be a change > that breaks backward compatibility but I do not think we want to > stay bug to bug compatible with the current behaviour here. I am > not sure with the wisdom of %) though. Wouldn't "%(foo %)" look as > if %( opens and %) closes a group in our language? > > So I am very much in favor of this "if condition should be expanded > before comparison" solution. I had worked on this "if condition should be expanded before comparison" solution but Christian and I agreed that it would be better to open up discussion incase there would be other possible and better solutions which would also be viable in the long term. This solution relies on how we parse out literals in ref-filter, so '%%' would work as intended in the comparision string - ie if we want to compare against "some%ref", we would do "%(if:equals=some%%ref)...". Also it is obscure that someone will use this in practice as I myself had discovered this when working on a corner case of some other implementation related to parsing but way lower in the call-chain of ref-filter ;) but here goes (this applies on top of the current patch) ------------------------ >8 ------------------------ Subject: [PATCH] ref-filter: parse parentheses correctly in %(if) atoms Having a ')' in "<string>" in ":equals=<string>" or ":notequals=<string>" wouldn't parse correctly in ref-filter as documented in the previous commit. One way to fix this is refactoring the way in which we parse our format string. Although this would mean we would have to do a huge refactoring as this step happens very high up in the call chain. Therefore, support including parenthesis characters in "<string>" by instead giving their hexcode equivalents - as a for-now hack. Do this by further abstracting "append_literal()" to "parse_literal()" where the output is no longer stored into a ref formatting stack's strbuf but a standard standalone strbuf. append_literal would then hence wrap appropriately around "parse_literal()". Also introduce "convert_hexcode()" which also wraps around "parse_literal()" and must be used to convert hexcode in a given string and be silent when such a string doesn't contain hexcode or doesn't exist (ie is NULL). Using "convert_hexcode()" would mean that we now have an alloced string - hence free() it once we are done with it to prevent any memory leaks. Signed-off-by: Kousik Sanagavarapu <five231003@gmail.com> --- Documentation/git-for-each-ref.txt | 4 ++ ref-filter.c | 76 +++++++++++++++++++++++------- t/t6300-for-each-ref.sh | 6 ++- 3 files changed, 66 insertions(+), 20 deletions(-) diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt index d3764401a2..ce12400040 100644 --- a/Documentation/git-for-each-ref.txt +++ b/Documentation/git-for-each-ref.txt @@ -221,6 +221,10 @@ if:: the value between the %(if:...) and %(then) atoms with the given string. + Additionally, if `<string>` must contain parenthesis, then these + parentheses are spelled out as hexcode. For e.g., `1)someref` + would need to be `1%29someref`. + symref:: The ref which the given symbolic ref refers to. If not a symbolic ref, nothing is printed. Respects the `:short`, diff --git a/ref-filter.c b/ref-filter.c index 84c6036107..ebdb2daeb7 100644 --- a/ref-filter.c +++ b/ref-filter.c @@ -1234,13 +1234,64 @@ static void if_then_else_handler(struct ref_formatting_stack **stack) *stack = cur; } +/* + * Parse out literals in the string pointed to by "cp" and store them in + * a strbuf - this would go on until we hit NUL or "ep". + * + * While at it, if they're of the form "%xx", where xx represents the + * hexcode of some character, then convert them into their equivalent + * characters. + */ +static void parse_literal(const char *cp, const char *ep, + struct strbuf *s) +{ + while (*cp && (!ep || cp < ep)) { + if (*cp == '%') { + if (cp[1] == '%') + cp++; + else { + int ch = hex2chr(cp + 1); + if (0 <= ch) { + strbuf_addch(s, ch); + cp += 3; + continue; + } + } + } + strbuf_addch(s, *cp); + cp++; + } +} + +/* + * Convert the string, pointed to by "cp", which might or might not + * contain hexcode (in the format of "%xx" where xx is the hexcode) to + * its character-equivalent string and return it. + * + * If the string does not contain any hexcode - then it is returned as + * is. + */ +static const char *convert_hexcode(const char *cp) +{ + struct strbuf s = STRBUF_INIT; + + if (!cp) + return NULL; + /* + * This has the effect of an in-place translation but + * implementation-wise it is not. + */ + parse_literal(cp, NULL, &s); + return strbuf_detach(&s, NULL); +} + static int if_atom_handler(struct atom_value *atomv, struct ref_formatting_state *state, struct strbuf *err UNUSED) { struct ref_formatting_stack *new_stack; struct if_then_else *if_then_else = xcalloc(1, sizeof(*if_then_else)); - if_then_else->str = atomv->atom->u.if_then_else.str; + if_then_else->str = convert_hexcode(atomv->atom->u.if_then_else.str); if_then_else->cmp_status = atomv->atom->u.if_then_else.cmp_status; push_stack_element(&state->stack); @@ -1296,6 +1347,9 @@ static int then_atom_handler(struct atom_value *atomv UNUSED, if_then_else->condition_satisfied = 1; } else if (cur->output.len && !is_empty(&cur->output)) if_then_else->condition_satisfied = 1; + + if (if_then_else->str) + free((char *)if_then_else->str); strbuf_reset(&cur->output); return 0; } @@ -3425,26 +3479,12 @@ void ref_array_sort(struct ref_sorting *sorting, struct ref_array *array) QSORT_S(array->items, array->nr, compare_refs, sorting); } -static void append_literal(const char *cp, const char *ep, struct ref_formatting_state *state) +static void append_literal(const char *cp, const char *ep, + struct ref_formatting_state *state) { struct strbuf *s = &state->stack->output; - while (*cp && (!ep || cp < ep)) { - if (*cp == '%') { - if (cp[1] == '%') - cp++; - else { - int ch = hex2chr(cp + 1); - if (0 <= ch) { - strbuf_addch(s, ch); - cp += 3; - continue; - } - } - } - strbuf_addch(s, *cp); - cp++; - } + parse_literal(cp, ep, s); } int format_ref_array_item(struct ref_array_item *info, diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh index ce5c607193..c9383d23a4 100755 --- a/t/t6300-for-each-ref.sh +++ b/t/t6300-for-each-ref.sh @@ -2141,7 +2141,7 @@ test_expect_success GPG 'show lack of signature with custom format' ' test_cmp expect actual ' -test_expect_failure 'format having values containing ) parse correctly' ' +test_expect_success 'format having values containing ) parse correctly' ' git branch "1)feat" && cat >expect <<-\EOF && refs/heads/1)feat @@ -2151,7 +2151,9 @@ test_expect_failure 'format having values containing ) parse correctly' ' not equals not equals EOF - git for-each-ref --format="%(if:equals=1)feat)%(refname:short)%(then)%(refname)%(else)not equals%(end)" \ + + # 29 is the hexcode of ) + git for-each-ref --format="%(if:equals=1%29feat)%(refname:short)%(then)%(refname)%(else)not equals%(end)" \ refs/heads/ >actual && test_cmp expect actual '
On Tue, Nov 05, 2024 at 07:05:13PM -0800, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > I am tempted to say the solution is to expand that "equals" value, and > > possibly add some less-arcane version of the character (maybe "%)"?). > > But it be a break in backwards compatibility if somebody is trying to > > match literal %-chars in their "if" block. > > If they were trying to write a literal %, wouldn't they be writing > %% already, not because % followed by a byte without any special > meaning happens to be passed intact by the implementation, but > because that is _the_ right thing to do, when % is used as an > introducer for escape sequences? So I do agree it would be a change > that breaks backward compatibility but I do not think we want to > stay bug to bug compatible with the current behaviour here. I think "because that is the right thing to do" is what is in question. It is not like we happen to allow "%", but you should be writing "%%" in an if:equals value already. They mean two different things, and anybody who is doing: %(if:equals=%%foo) to match the literal "%%foo" will be broken if we change that. They are not doing anything wrong; that is the only way to make it work now. I wouldn't go so far as to call the current behavior a bug. It's just...not very flexible. I also think it is unlikely that anybody would care in practice (though I find matching refs with ")" in them already a bit far-fetched). If we wanted to be extra careful, we could introduce a variant of "equals" that indicates that it will be expanded before comparison. Or even an extra tag, like: %(if:expand:equals=%%foo) > I am not sure with the wisdom of %) though. Wouldn't "%(foo %)" look > as if %( opens and %) closes a group in our language? Yeah, I agree it is ugly and possibly confusing. Normally I'd suggest "\" for escaping, but it isn't otherwise syntactically important within these formats (I don't think, anyway). The magic character is "%" so that is what we have to work with. -Peff
On Wed, Nov 06, 2024 at 09:24:08AM +0530, Kousik Sanagavarapu wrote: > One way to fix this is refactoring the way in which we parse our format > string. Although this would mean we would have to do a huge refactoring > as this step happens very high up in the call chain. > > Therefore, support including parenthesis characters in "<string>" by > instead giving their hexcode equivalents - as a for-now hack. So if I understand this is just expanding %<hex> and nothing else? That seems like the worst of both worlds. Now "%" is magic in these value strings, breaking compatibility, but we didn't buy ourselves the flexibility to do arbitrary comparisons like: %(if:equals=%(upstream:lstrip=3))%(refname:short)%(then)... -Peff
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh index c39d4e7e9c..ce5c607193 100755 --- a/t/t6300-for-each-ref.sh +++ b/t/t6300-for-each-ref.sh @@ -2141,4 +2141,19 @@ test_expect_success GPG 'show lack of signature with custom format' ' test_cmp expect actual ' +test_expect_failure 'format having values containing ) parse correctly' ' + git branch "1)feat" && + cat >expect <<-\EOF && + refs/heads/1)feat + not equals + not equals + not equals + not equals + not equals + EOF + git for-each-ref --format="%(if:equals=1)feat)%(refname:short)%(then)%(refname)%(else)not equals%(end)" \ + refs/heads/ >actual && + test_cmp expect actual +' + test_done
Document that values containing ')' in formats are not parsed correctly in ref-filter. The problem here is that ref-filter, while parsing the format string, looks for the first occurence of ')' and marks it to be the end of _that_ particular atom - which is obviously wrong in cases where the format is of type atom:key=value where "value" has ')' somewhere in it. However formats having a '(' instead in "value" will parse correctly because in a general format string we also mark start of the format by making note of '%(' instead of just '('. For example, in %(if:equals=somere)f)%(refname:short)... the string that ref-filter should compare against is "somere)f", although since the parsing behavior in these cases is broken, we instead compare against "somere". While in %(if:equals=somere(f)%(refname:short)... ref-filter rightly compares against "somere(f" as expected. As a side note it should be mentioned that values containing ')' are legit in %(refname) (and other atoms like %(upstream)). Meaning the parser wouldn't err out such values as they are legal, which is also confirmed by - say the refname coming from $ git branch '1)branch' or a remote name coming from $ git remote add 'up)stream' 'some.url' $ git push --set-upstream 'up)stream' '1)branch' since none of these fail. Signed-off-by: Kousik Sanagavarapu <five231003@gmail.com> --- This raises the question of what can be done to parse ')' in values of the format correctly. It seems to me like a clean solution would involve a huge refactoring involving a large portion of ref-filter but I maybe wrong. So this patch also hopes to open up discussion on not only solving this bug but also how in general ref-filter currently parses and formats atoms and if it is the way in which we would like to do things in the future, which would in turn also be helpful in the long term goal of merging both pretty and ref-filter. Here is also a simple script to demonstrate the difference between '(' and ')' in values in formats - as described in the commit msg - #!/bin/sh rm -rf /tmp/atom-test-dir && # create env git init /tmp/atom-test-dir 1>/tmp/init-dump 2>>/tmp/init-dump && cd /tmp/atom-test-dir && echo "smtg" >file && git add file && git commit -s -m "initial revision" >/tmp/commit-dump && # using "(" in refname works good echo "test with refname as \"bran(ch\"" && git branch "bran(ch" && printf "bran(ch\n\n" >expect && git for-each-ref --format="%(if:equals=bran(ch)%(refname:short)%(then)%(refname:short)%(end)" refs/heads/ >actual && if ! diff -u expect actual; then echo " actual is different from expect" else echo " actual is the same as expect" fi echo "" && # using ")" in refname will parse wrong in ref-filter code echo "test with refname as \"bran()ch\"" && git branch "bran()ch" && printf "bran()ch\n\n\n" >expect && git for-each-ref --format="%(if:equals=bran()ch)%(refname:short)%(then)%(refname:short)%(end)" refs/heads/ >actual && if ! diff -u expect actual; then echo " actual is different from expect" else echo " actual is the same as expect" fi t/t6300-for-each-ref.sh | 15 +++++++++++++++ 1 file changed, 15 insertions(+)