Message ID | aeb116c5aaaa23dfefbc7a6f4ac743a6f5a3ade8.1595882588.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Improvements to ref-filter | expand |
"Hariom Verma via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Hariom Verma <hariom18599@gmail.com> > > Currently, ref-filter only supports printing email with arrow brackets. This is the first time I heard the term "arrow bracket". Aren't they more commonly called angle brackets? > Let's add support for two more email options. > - trim : print email without arrow brackets. Why would this be useful? > - localpart : prints the part before the @ sign Meaning I'd get "<gitster" for me? Building small pieces of new feature in each patch is good, and adding tests to each step is also good. Why not do the same for docs? > +static struct email_option{ Missing SP. > + enum { EO_INVALID, EO_RAW, EO_TRIM, EO_LOCALPART } option; > +} email_option; > + > @@ -1040,10 +1044,26 @@ static const char *copy_email(const char *buf) > const char *eoemail; > if (!email) > return xstrdup(""); > - eoemail = strchr(email, '>'); The original code prepares to see NULL from this strchr(); that is why it checks eoemail for NULL and returns an empty string---the data is broken (i.e. not an address in angle brackets), which this code cannot do anything about---in the later part of the code. > + switch (email_option.option) { > + case EO_RAW: > + eoemail = strchr(email, '>') + 1; And this breaks the carefully laid out error handling by the original code. Adding 1 to NULL is quite undefined. > + break; > + case EO_TRIM: > + email++; > + eoemail = strchr(email, '>'); > + break; > + case EO_LOCALPART: > + email++; > + eoemail = strchr(email, '@'); The undocumented design here is that you want to return "hariom" for "<hariom@gmail.com>", i.e. out of the "trimmed" e-mail, the part before the at-sign is returned. If the data were "<hariom>", you'd still want to return "hariom" no? But because you do not check for NULL, you end up returning an empty string. I think you want to cut at the first '@' or '>', whichever comes first. > + break; > + case EO_INVALID: > + default: Invalid and unhandled ones are silently ignored and not treated as an error? I would have thought that at least the "default" one would be a BUG(), as you covered all the possible values for the enum with case arms. I wouldn't be surprised if seeing EO_INVALID is also a BUG(), i.e. the control flow that led to the caller to call this function with EO_INVALID in email_option.option is likely to be broken. It's not like you return "" to protect yourself when fed a bad data from objects---a bad value in .option can only come here if the parser you wrote for "--format=<string>" produced a wrong result. > + return xstrdup(""); > + } > + > if (!eoemail) > return xstrdup(""); > - return xmemdupz(email, eoemail + 1 - email); > + return xmemdupz(email, eoemail - email); > } > > static char *copy_subject(const char *buf, unsigned long len) > @@ -1113,7 +1133,7 @@ static void grab_person(const char *who, struct atom_value *val, int deref, void > continue; > if (name[wholen] != 0 && > strcmp(name + wholen, "name") && > - strcmp(name + wholen, "email") && > + !starts_with(name + wholen, "email") && > !starts_with(name + wholen, "date")) > continue; > if (!wholine) > @@ -1124,8 +1144,16 @@ static void grab_person(const char *who, struct atom_value *val, int deref, void > v->s = copy_line(wholine); > else if (!strcmp(name + wholen, "name")) > v->s = copy_name(wholine); > - else if (!strcmp(name + wholen, "email")) > + else if (starts_with(name + wholen, "email")) { > + email_option.option = EO_INVALID; > + if (!strcmp(name + wholen, "email")) > + email_option.option = EO_RAW; > + if (!strcmp(name + wholen, "email:trim")) > + email_option.option = EO_TRIM; > + if (!strcmp(name + wholen, "email:localpart")) > + email_option.option = EO_LOCALPART; The ref-filter formatting language already knows many "colon plus modifier" suffix like "refname:short" and "contents:body", but I do not think we have ugly repetition like the above code to parse them. Perhaps the addition for "email:<whatever>" can benefit from studying and mimicking existing practices a bit more?
[this is a resent, my previous mail couldn't reach the archive] On 2020-07-27 20:43:04+0000, Hariom Verma via GitGitGadget <gitgitgadget@gmail.com> wrote: > From: Hariom Verma <hariom18599@gmail.com> > > Currently, ref-filter only supports printing email with arrow brackets. > > Let's add support for two more email options. > - trim : print email without arrow brackets. > - localpart : prints the part before the @ sign > > Mentored-by: Christian Couder <chriscool@tuxfamily.org> > Mentored-by: Heba Waly <heba.waly@gmail.com> > Signed-off-by: Hariom Verma <hariom18599@gmail.com> > --- > ref-filter.c | 36 ++++++++++++++++++++++++++++++++---- > t/t6300-for-each-ref.sh | 16 ++++++++++++++++ > 2 files changed, 48 insertions(+), 4 deletions(-) > > diff --git a/ref-filter.c b/ref-filter.c > index 8447cb09be..8563088eb1 100644 > --- a/ref-filter.c > +++ b/ref-filter.c > @@ -102,6 +102,10 @@ static struct ref_to_worktree_map { > struct worktree **worktrees; > } ref_to_worktree_map; > > +static struct email_option{ > + enum { EO_INVALID, EO_RAW, EO_TRIM, EO_LOCALPART } option; > +} email_option; > + > /* > * An atom is a valid field atom listed below, possibly prefixed with > * a "*" to denote deref_tag(). > @@ -1040,10 +1044,26 @@ static const char *copy_email(const char *buf) > const char *eoemail; > if (!email) > return xstrdup(""); > - eoemail = strchr(email, '>'); > + switch (email_option.option) { > + case EO_RAW: > + eoemail = strchr(email, '>') + 1; > + break; > + case EO_TRIM: > + email++; > + eoemail = strchr(email, '>'); > + break; > + case EO_LOCALPART: > + email++; > + eoemail = strchr(email, '@'); > + break; This is not correct. RFC-822 allows @ in local part, albeit, that localpart must be quoted: addr-spec = local-part "@" domain local-part = dot-atom / quoted-string / obs-local-part quoted-string = [CFWS] DQUOTE *([FWS] qcontent) [FWS] DQUOTE [CFWS] qcontent = qtext / quoted-pair qtext = NO-WS-CTL / ; Non white space qtext = NO-WS-CTL / ; Non white space controls %d33 / ; The rest of the US-ASCII %d35-91 / ; characters not including "\" %d93-126 ; or the quote character quoted-pair = ("\" text) / obs-qp IOW, those below email addresses are valid email address, and the local part is `quoted@local' "quoted@local"@example.com quoted\@local@example.com Anyway, it seems like current Git strips first `"' from `"quoted@local"@example.com'
Đoàn Trần Công Danh <congdanhqx@gmail.com> writes: > This is not correct. > RFC-822 allows @ in local part, > albeit, that localpart must be quoted: We do care about truly local e-mail addresses, without '@' anywhere inside <braket>, simply because they are common in the result of SCM conversion from CVS/SVN. But I do not think we are pedantic/academic enough to have cared about any local part that is unusual enough to require quoting; we instead relied on the fact that we live in real world with practical people who would avoid such an address ;-).
Hi, On Tue, Jul 28, 2020 at 4:21 AM Junio C Hamano <gitster@pobox.com> wrote: > > "Hariom Verma via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > From: Hariom Verma <hariom18599@gmail.com> > > > > Currently, ref-filter only supports printing email with arrow brackets. > > This is the first time I heard the term "arrow bracket". Aren't > they more commonly called angle brackets? Yeah. Sorry about that. > > Let's add support for two more email options. > > - trim : print email without arrow brackets. > > Why would this be useful? It might be useful for using the ref-filter's logic in pretty.c (especially for `--pretty` formats like `%an` and `%cn`) > > - localpart : prints the part before the @ sign > > Meaning I'd get "<gitster" for me? No, you'll get "gitster". > Building small pieces of new feature in each patch is good, and > adding tests to each step is also good. Why not do the same for > docs? Yeah, I agree with you. I should have focused on documentation too. > > +static struct email_option{ > > Missing SP. I'll fix it. > > + enum { EO_INVALID, EO_RAW, EO_TRIM, EO_LOCALPART } option; > > +} email_option; > > + > > @@ -1040,10 +1044,26 @@ static const char *copy_email(const char *buf) > > const char *eoemail; > > if (!email) > > return xstrdup(""); > > - eoemail = strchr(email, '>'); > > The original code prepares to see NULL from this strchr(); that is > why it checks eoemail for NULL and returns an empty string---the > data is broken (i.e. not an address in angle brackets), which this > code cannot do anything about---in the later part of the code. I think this commit still takes care of NULL. After the switch-case statements, code consists of: ``` if (!eoemail) return xstrdup(""); ``` Which checks eoemail for NULL. And will return empty string if address is not in angle brackets. Same applies for local-part too. If '@' is not present in email address, it will still return empty string. > > + switch (email_option.option) { > > + case EO_RAW: > > + eoemail = strchr(email, '>') + 1; > > And this breaks the carefully laid out error handling by the > original code. Adding 1 to NULL is quite undefined. Yeah. I'll take care of it in the next version. > > + break; > > + case EO_TRIM: > > + email++; > > + eoemail = strchr(email, '>'); > > + break; > > + case EO_LOCALPART: > > + email++; > > + eoemail = strchr(email, '@'); > > The undocumented design here is that you want to return "hariom" for > "<hariom@gmail.com>", i.e. out of the "trimmed" e-mail, the part > before the at-sign is returned. > > If the data were "<hariom>", you'd still want to return "hariom" no? > But because you do not check for NULL, you end up returning an empty > string. I never heard of email address without '@' symbol. Thats why I returned empty string. Will fix that too. > I think you want to cut at the first '@' or '>', whichever comes > first. If email data can be without '@' symbol, then I guess "yes". > > + break; > > + case EO_INVALID: > > + default: > > Invalid and unhandled ones are silently ignored and not treated as > an error? I would have thought that at least the "default" one > would be a BUG(), as you covered all the possible values for the > enum with case arms. I wouldn't be surprised if seeing EO_INVALID > is also a BUG(), i.e. the control flow that led to the caller to > call this function with EO_INVALID in email_option.option is likely > to be broken. It's not like you return "" to protect yourself when > fed a bad data from objects---a bad value in .option can only come > here if the parser you wrote for "--format=<string>" produced a > wrong result. Christian <chriscool@tuxfamily.org> also suggested me the same. Will fix this too. BTW, on master "{author,committer,tagger}email:<xyz>" does not print any error. > > + return xstrdup(""); > > + } > > + > > if (!eoemail) > > return xstrdup(""); > > - return xmemdupz(email, eoemail + 1 - email); > > + return xmemdupz(email, eoemail - email); > > } > > > > static char *copy_subject(const char *buf, unsigned long len) > > @@ -1113,7 +1133,7 @@ static void grab_person(const char *who, struct atom_value *val, int deref, void > > continue; > > if (name[wholen] != 0 && > > strcmp(name + wholen, "name") && > > - strcmp(name + wholen, "email") && > > + !starts_with(name + wholen, "email") && > > !starts_with(name + wholen, "date")) > > continue; > > if (!wholine) > > @@ -1124,8 +1144,16 @@ static void grab_person(const char *who, struct atom_value *val, int deref, void > > v->s = copy_line(wholine); > > else if (!strcmp(name + wholen, "name")) > > v->s = copy_name(wholine); > > - else if (!strcmp(name + wholen, "email")) > > + else if (starts_with(name + wholen, "email")) { > > + email_option.option = EO_INVALID; > > + if (!strcmp(name + wholen, "email")) > > + email_option.option = EO_RAW; > > + if (!strcmp(name + wholen, "email:trim")) > > + email_option.option = EO_TRIM; > > + if (!strcmp(name + wholen, "email:localpart")) > > + email_option.option = EO_LOCALPART; > > The ref-filter formatting language already knows many "colon plus > modifier" suffix like "refname:short" and "contents:body", but I do > not think we have ugly repetition like the above code to parse them. > Perhaps the addition for "email:<whatever>" can benefit from > studying and mimicking existing practices a bit more? > For "email:<whatever>", even If I parse that <whatever>. I still make comparison something like: ``` if (!modifier) email_option.option = EO_RAW; else if (!strcmp(modifier, "trim")) email_option.option = EO_TRIM; else if (!strcmp(arg, "localpart")) email_option.option = EO_LOCALPART; ``` Thanks, Hariom
Hariom verma <hariom18599@gmail.com> writes: >> The ref-filter formatting language already knows many "colon plus >> modifier" suffix like "refname:short" and "contents:body", but I do >> not think we have ugly repetition like the above code to parse them. >> Perhaps the addition for "email:<whatever>" can benefit from >> studying and mimicking existing practices a bit more? >> > > For "email:<whatever>", > even If I parse that <whatever>. I still make comparison something like: > ``` > if (!modifier) > email_option.option = EO_RAW; > else if (!strcmp(modifier, "trim")) > email_option.option = EO_TRIM; > else if (!strcmp(arg, "localpart")) > email_option.option = EO_LOCALPART; > ``` Somebody needs to do a comparison, but it should be done at parsing phase when the --format is grokked, not in grab phase that is run for each and every ref to be shown. These patches should only be done after looking at existing "<basicatom>:<modifiers>" like "objectname:short" etc are handled, not before, I think.
diff --git a/ref-filter.c b/ref-filter.c index 8447cb09be..8563088eb1 100644 --- a/ref-filter.c +++ b/ref-filter.c @@ -102,6 +102,10 @@ static struct ref_to_worktree_map { struct worktree **worktrees; } ref_to_worktree_map; +static struct email_option{ + enum { EO_INVALID, EO_RAW, EO_TRIM, EO_LOCALPART } option; +} email_option; + /* * An atom is a valid field atom listed below, possibly prefixed with * a "*" to denote deref_tag(). @@ -1040,10 +1044,26 @@ static const char *copy_email(const char *buf) const char *eoemail; if (!email) return xstrdup(""); - eoemail = strchr(email, '>'); + switch (email_option.option) { + case EO_RAW: + eoemail = strchr(email, '>') + 1; + break; + case EO_TRIM: + email++; + eoemail = strchr(email, '>'); + break; + case EO_LOCALPART: + email++; + eoemail = strchr(email, '@'); + break; + case EO_INVALID: + default: + return xstrdup(""); + } + if (!eoemail) return xstrdup(""); - return xmemdupz(email, eoemail + 1 - email); + return xmemdupz(email, eoemail - email); } static char *copy_subject(const char *buf, unsigned long len) @@ -1113,7 +1133,7 @@ static void grab_person(const char *who, struct atom_value *val, int deref, void continue; if (name[wholen] != 0 && strcmp(name + wholen, "name") && - strcmp(name + wholen, "email") && + !starts_with(name + wholen, "email") && !starts_with(name + wholen, "date")) continue; if (!wholine) @@ -1124,8 +1144,16 @@ static void grab_person(const char *who, struct atom_value *val, int deref, void v->s = copy_line(wholine); else if (!strcmp(name + wholen, "name")) v->s = copy_name(wholine); - else if (!strcmp(name + wholen, "email")) + else if (starts_with(name + wholen, "email")) { + email_option.option = EO_INVALID; + if (!strcmp(name + wholen, "email")) + email_option.option = EO_RAW; + if (!strcmp(name + wholen, "email:trim")) + email_option.option = EO_TRIM; + if (!strcmp(name + wholen, "email:localpart")) + email_option.option = EO_LOCALPART; v->s = copy_email(wholine); + } else if (starts_with(name + wholen, "date")) grab_date(wholine, v, name); } diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh index da59fadc5d..b8a2cb8439 100755 --- a/t/t6300-for-each-ref.sh +++ b/t/t6300-for-each-ref.sh @@ -106,15 +106,21 @@ test_atom head '*objecttype' '' test_atom head author 'A U Thor <author@example.com> 1151968724 +0200' test_atom head authorname 'A U Thor' test_atom head authoremail '<author@example.com>' +test_atom head authoremail:trim 'author@example.com' +test_atom head authoremail:localpart 'author' test_atom head authordate 'Tue Jul 4 01:18:44 2006 +0200' test_atom head committer 'C O Mitter <committer@example.com> 1151968723 +0200' test_atom head committername 'C O Mitter' test_atom head committeremail '<committer@example.com>' +test_atom head committeremail:trim 'committer@example.com' +test_atom head committeremail:localpart 'committer' test_atom head committerdate 'Tue Jul 4 01:18:43 2006 +0200' test_atom head tag '' test_atom head tagger '' test_atom head taggername '' test_atom head taggeremail '' +test_atom head taggeremail:trim '' +test_atom head taggeremail:localpart '' test_atom head taggerdate '' test_atom head creator 'C O Mitter <committer@example.com> 1151968723 +0200' test_atom head creatordate 'Tue Jul 4 01:18:43 2006 +0200' @@ -151,15 +157,21 @@ test_atom tag '*objecttype' 'commit' test_atom tag author '' test_atom tag authorname '' test_atom tag authoremail '' +test_atom tag authoremail:trim '' +test_atom tag authoremail:localpart '' test_atom tag authordate '' test_atom tag committer '' test_atom tag committername '' test_atom tag committeremail '' +test_atom tag committeremail:trim '' +test_atom tag committeremail:localpart '' test_atom tag committerdate '' test_atom tag tag 'testtag' test_atom tag tagger 'C O Mitter <committer@example.com> 1151968725 +0200' test_atom tag taggername 'C O Mitter' test_atom tag taggeremail '<committer@example.com>' +test_atom tag taggeremail:trim 'committer@example.com' +test_atom tag taggeremail:localpart 'committer' test_atom tag taggerdate 'Tue Jul 4 01:18:45 2006 +0200' test_atom tag creator 'C O Mitter <committer@example.com> 1151968725 +0200' test_atom tag creatordate 'Tue Jul 4 01:18:45 2006 +0200' @@ -545,10 +557,14 @@ test_atom refs/tags/taggerless tag 'taggerless' test_atom refs/tags/taggerless tagger '' test_atom refs/tags/taggerless taggername '' test_atom refs/tags/taggerless taggeremail '' +test_atom refs/tags/taggerless taggeremail:trim '' +test_atom refs/tags/taggerless taggeremail:localpart '' test_atom refs/tags/taggerless taggerdate '' test_atom refs/tags/taggerless committer '' test_atom refs/tags/taggerless committername '' test_atom refs/tags/taggerless committeremail '' +test_atom refs/tags/taggerless committeremail:trim '' +test_atom refs/tags/taggerless committeremail:localpart '' test_atom refs/tags/taggerless committerdate '' test_atom refs/tags/taggerless subject 'Broken tag'