diff mbox series

[1/5] ref-filter: support different email formats

Message ID aeb116c5aaaa23dfefbc7a6f4ac743a6f5a3ade8.1595882588.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Improvements to ref-filter | expand

Commit Message

Johannes Schindelin via GitGitGadget July 27, 2020, 8:43 p.m. UTC
From: Hariom Verma <hariom18599@gmail.com>

Currently, ref-filter only supports printing email with arrow brackets.

Let's add support for two more email options.
- trim : print email without arrow brackets.
- localpart : prints the part before the @ sign

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
---
 ref-filter.c            | 36 ++++++++++++++++++++++++++++++++----
 t/t6300-for-each-ref.sh | 16 ++++++++++++++++
 2 files changed, 48 insertions(+), 4 deletions(-)

Comments

Junio C Hamano July 27, 2020, 10:51 p.m. UTC | #1
"Hariom Verma via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Hariom Verma <hariom18599@gmail.com>
>
> Currently, ref-filter only supports printing email with arrow brackets.

This is the first time I heard the term "arrow bracket".  Aren't
they more commonly called angle brackets?

> Let's add support for two more email options.
> - trim : print email without arrow brackets.

Why would this be useful?

> - localpart : prints the part before the @ sign

Meaning I'd get "<gitster" for me?

Building small pieces of new feature in each patch is good, and
adding tests to each step is also good.  Why not do the same for
docs?

> +static struct email_option{

Missing SP.

> +	enum { EO_INVALID, EO_RAW, EO_TRIM, EO_LOCALPART } option;
> +} email_option;
> +
> @@ -1040,10 +1044,26 @@ static const char *copy_email(const char *buf)
>  	const char *eoemail;
>  	if (!email)
>  		return xstrdup("");
> -	eoemail = strchr(email, '>');

The original code prepares to see NULL from this strchr(); that is
why it checks eoemail for NULL and returns an empty string---the
data is broken (i.e. not an address in angle brackets), which this
code cannot do anything about---in the later part of the code.

> +	switch (email_option.option) {
> +	case EO_RAW:
> +		eoemail = strchr(email, '>') + 1;

And this breaks the carefully laid out error handling by the
original code.  Adding 1 to NULL is quite undefined.

> +		break;
> +	case EO_TRIM:
> +		email++;
> +		eoemail = strchr(email, '>');
> +		break;
> +	case EO_LOCALPART:
> +		email++;
> +		eoemail = strchr(email, '@');

The undocumented design here is that you want to return "hariom" for
"<hariom@gmail.com>", i.e. out of the "trimmed" e-mail, the part
before the at-sign is returned.

If the data were "<hariom>", you'd still want to return "hariom" no?
But because you do not check for NULL, you end up returning an empty
string.

I think you want to cut at the first '@' or '>', whichever comes
first.


> +		break;
> +	case EO_INVALID:
> +	default:

Invalid and unhandled ones are silently ignored and not treated as
an error?  I would have thought that at least the "default" one
would be a BUG(), as you covered all the possible values for the
enum with case arms.  I wouldn't be surprised if seeing EO_INVALID
is also a BUG(), i.e. the control flow that led to the caller to
call this function with EO_INVALID in email_option.option is likely
to be broken.  It's not like you return "" to protect yourself when
fed a bad data from objects---a bad value in .option can only come
here if the parser you wrote for "--format=<string>" produced a
wrong result.

> +		return xstrdup("");
> +	}
> +
>  	if (!eoemail)
>  		return xstrdup("");
> -	return xmemdupz(email, eoemail + 1 - email);
> +	return xmemdupz(email, eoemail - email);
>  }
>  
>  static char *copy_subject(const char *buf, unsigned long len)
> @@ -1113,7 +1133,7 @@ static void grab_person(const char *who, struct atom_value *val, int deref, void
>  			continue;
>  		if (name[wholen] != 0 &&
>  		    strcmp(name + wholen, "name") &&
> -		    strcmp(name + wholen, "email") &&
> +		    !starts_with(name + wholen, "email") &&
>  		    !starts_with(name + wholen, "date"))
>  			continue;
>  		if (!wholine)
> @@ -1124,8 +1144,16 @@ static void grab_person(const char *who, struct atom_value *val, int deref, void
>  			v->s = copy_line(wholine);
>  		else if (!strcmp(name + wholen, "name"))
>  			v->s = copy_name(wholine);
> -		else if (!strcmp(name + wholen, "email"))
> +		else if (starts_with(name + wholen, "email")) {
> +			email_option.option = EO_INVALID;
> +			if (!strcmp(name + wholen, "email"))
> +				email_option.option = EO_RAW;
> +			if (!strcmp(name + wholen, "email:trim"))
> +				email_option.option = EO_TRIM;
> +			if (!strcmp(name + wholen, "email:localpart"))
> +				email_option.option = EO_LOCALPART;

The ref-filter formatting language already knows many "colon plus
modifier" suffix like "refname:short" and "contents:body", but I do
not think we have ugly repetition like the above code to parse them.
Perhaps the addition for "email:<whatever>" can benefit from
studying and mimicking existing practices a bit more?
Đoàn Trần Công Danh July 28, 2020, 1:58 p.m. UTC | #2
[this is a resent, my previous mail couldn't reach the archive]

On 2020-07-27 20:43:04+0000, Hariom Verma via GitGitGadget <gitgitgadget@gmail.com> wrote:
> From: Hariom Verma <hariom18599@gmail.com>
> 
> Currently, ref-filter only supports printing email with arrow brackets.
> 
> Let's add support for two more email options.
> - trim : print email without arrow brackets.
> - localpart : prints the part before the @ sign
> 
> Mentored-by: Christian Couder <chriscool@tuxfamily.org>
> Mentored-by: Heba Waly <heba.waly@gmail.com>
> Signed-off-by: Hariom Verma <hariom18599@gmail.com>
> ---
>  ref-filter.c            | 36 ++++++++++++++++++++++++++++++++----
>  t/t6300-for-each-ref.sh | 16 ++++++++++++++++
>  2 files changed, 48 insertions(+), 4 deletions(-)
> 
> diff --git a/ref-filter.c b/ref-filter.c
> index 8447cb09be..8563088eb1 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -102,6 +102,10 @@ static struct ref_to_worktree_map {
>  	struct worktree **worktrees;
>  } ref_to_worktree_map;
>  
> +static struct email_option{
> +	enum { EO_INVALID, EO_RAW, EO_TRIM, EO_LOCALPART } option;
> +} email_option;
> +
>  /*
>   * An atom is a valid field atom listed below, possibly prefixed with
>   * a "*" to denote deref_tag().
> @@ -1040,10 +1044,26 @@ static const char *copy_email(const char *buf)
>  	const char *eoemail;
>  	if (!email)
>  		return xstrdup("");
> -	eoemail = strchr(email, '>');
> +	switch (email_option.option) {
> +	case EO_RAW:
> +		eoemail = strchr(email, '>') + 1;
> +		break;
> +	case EO_TRIM:
> +		email++;
> +		eoemail = strchr(email, '>');
> +		break;
> +	case EO_LOCALPART:
> +		email++;
> +		eoemail = strchr(email, '@');
> +		break;


This is not correct.
RFC-822 allows @ in local part,
albeit, that localpart must be quoted:

        addr-spec       =       local-part "@" domain
        local-part      =       dot-atom / quoted-string / obs-local-part
        quoted-string   =       [CFWS]
                                DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                                [CFWS]
        qcontent        =       qtext / quoted-pair
        qtext           =       NO-WS-CTL /     ; Non white space
        qtext           =       NO-WS-CTL /     ; Non white space controls
                                %d33 /          ; The rest of the US-ASCII
                                %d35-91 /       ;  characters not including "\"
                                %d93-126        ;  or the quote character
        quoted-pair     =       ("\" text) / obs-qp

IOW, those below email addresses are valid email address,
and the local part is `quoted@local'

        "quoted@local"@example.com
        quoted\@local@example.com

Anyway, it seems like current Git strips first `"'
from `"quoted@local"@example.com'
Junio C Hamano July 28, 2020, 4:45 p.m. UTC | #3
Đoàn Trần Công Danh  <congdanhqx@gmail.com> writes:

> This is not correct.
> RFC-822 allows @ in local part,
> albeit, that localpart must be quoted:

We do care about truly local e-mail addresses, without '@' anywhere
inside <braket>, simply because they are common in the result of SCM
conversion from CVS/SVN.

But I do not think we are pedantic/academic enough to have cared
about any local part that is unusual enough to require quoting; we
instead relied on the fact that we live in real world with practical
people who would avoid such an address ;-).
Hariom verma July 28, 2020, 8:31 p.m. UTC | #4
Hi,

On Tue, Jul 28, 2020 at 4:21 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Hariom Verma via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Hariom Verma <hariom18599@gmail.com>
> >
> > Currently, ref-filter only supports printing email with arrow brackets.
>
> This is the first time I heard the term "arrow bracket".  Aren't
> they more commonly called angle brackets?

Yeah. Sorry about that.

> > Let's add support for two more email options.
> > - trim : print email without arrow brackets.
>
> Why would this be useful?

It might be useful for using the ref-filter's logic in pretty.c
(especially for `--pretty` formats like `%an` and `%cn`)

> > - localpart : prints the part before the @ sign
>
> Meaning I'd get "<gitster" for me?

No, you'll get "gitster".

> Building small pieces of new feature in each patch is good, and
> adding tests to each step is also good.  Why not do the same for
> docs?

Yeah, I agree with you. I should have focused on documentation too.

> > +static struct email_option{
>
> Missing SP.

I'll fix it.

> > +     enum { EO_INVALID, EO_RAW, EO_TRIM, EO_LOCALPART } option;
> > +} email_option;
> > +
> > @@ -1040,10 +1044,26 @@ static const char *copy_email(const char *buf)
> >       const char *eoemail;
> >       if (!email)
> >               return xstrdup("");
> > -     eoemail = strchr(email, '>');
>
> The original code prepares to see NULL from this strchr(); that is
> why it checks eoemail for NULL and returns an empty string---the
> data is broken (i.e. not an address in angle brackets), which this
> code cannot do anything about---in the later part of the code.

I think this commit still takes care of NULL.
After the switch-case statements, code consists of:
```
if (!eoemail)
    return xstrdup("");
```
Which checks eoemail for NULL. And will return empty string if address
is not in angle brackets.
Same applies for local-part too. If '@' is not present in email
address, it will still return empty string.

> > +     switch (email_option.option) {
> > +     case EO_RAW:
> > +             eoemail = strchr(email, '>') + 1;
>
> And this breaks the carefully laid out error handling by the
> original code.  Adding 1 to NULL is quite undefined.

Yeah. I'll take care of it in the next version.

> > +             break;
> > +     case EO_TRIM:
> > +             email++;
> > +             eoemail = strchr(email, '>');
> > +             break;
> > +     case EO_LOCALPART:
> > +             email++;
> > +             eoemail = strchr(email, '@');
>
> The undocumented design here is that you want to return "hariom" for
> "<hariom@gmail.com>", i.e. out of the "trimmed" e-mail, the part
> before the at-sign is returned.
>
> If the data were "<hariom>", you'd still want to return "hariom" no?
> But because you do not check for NULL, you end up returning an empty
> string.

I never heard of email address without '@' symbol. Thats why I
returned empty string.

Will fix that too.

> I think you want to cut at the first '@' or '>', whichever comes
> first.

If email data can be without '@' symbol, then I guess "yes".

> > +             break;
> > +     case EO_INVALID:
> > +     default:
>
> Invalid and unhandled ones are silently ignored and not treated as
> an error?  I would have thought that at least the "default" one
> would be a BUG(), as you covered all the possible values for the
> enum with case arms.  I wouldn't be surprised if seeing EO_INVALID
> is also a BUG(), i.e. the control flow that led to the caller to
> call this function with EO_INVALID in email_option.option is likely
> to be broken.  It's not like you return "" to protect yourself when
> fed a bad data from objects---a bad value in .option can only come
> here if the parser you wrote for "--format=<string>" produced a
> wrong result.

Christian <chriscool@tuxfamily.org> also suggested me the same. Will
fix this too.

BTW, on master "{author,committer,tagger}email:<xyz>" does not print any error.

> > +             return xstrdup("");
> > +     }
> > +
> >       if (!eoemail)
> >               return xstrdup("");
> > -     return xmemdupz(email, eoemail + 1 - email);
> > +     return xmemdupz(email, eoemail - email);
> >  }
> >
> >  static char *copy_subject(const char *buf, unsigned long len)
> > @@ -1113,7 +1133,7 @@ static void grab_person(const char *who, struct atom_value *val, int deref, void
> >                       continue;
> >               if (name[wholen] != 0 &&
> >                   strcmp(name + wholen, "name") &&
> > -                 strcmp(name + wholen, "email") &&
> > +                 !starts_with(name + wholen, "email") &&
> >                   !starts_with(name + wholen, "date"))
> >                       continue;
> >               if (!wholine)
> > @@ -1124,8 +1144,16 @@ static void grab_person(const char *who, struct atom_value *val, int deref, void
> >                       v->s = copy_line(wholine);
> >               else if (!strcmp(name + wholen, "name"))
> >                       v->s = copy_name(wholine);
> > -             else if (!strcmp(name + wholen, "email"))
> > +             else if (starts_with(name + wholen, "email")) {
> > +                     email_option.option = EO_INVALID;
> > +                     if (!strcmp(name + wholen, "email"))
> > +                             email_option.option = EO_RAW;
> > +                     if (!strcmp(name + wholen, "email:trim"))
> > +                             email_option.option = EO_TRIM;
> > +                     if (!strcmp(name + wholen, "email:localpart"))
> > +                             email_option.option = EO_LOCALPART;
>
> The ref-filter formatting language already knows many "colon plus
> modifier" suffix like "refname:short" and "contents:body", but I do
> not think we have ugly repetition like the above code to parse them.
> Perhaps the addition for "email:<whatever>" can benefit from
> studying and mimicking existing practices a bit more?
>

For "email:<whatever>",
even If I parse that <whatever>. I still make comparison something like:
```
if (!modifier)
    email_option.option = EO_RAW;
else if (!strcmp(modifier, "trim"))
    email_option.option = EO_TRIM;
else if (!strcmp(arg, "localpart"))
    email_option.option = EO_LOCALPART;
```

Thanks,
Hariom
Junio C Hamano July 28, 2020, 8:43 p.m. UTC | #5
Hariom verma <hariom18599@gmail.com> writes:

>> The ref-filter formatting language already knows many "colon plus
>> modifier" suffix like "refname:short" and "contents:body", but I do
>> not think we have ugly repetition like the above code to parse them.
>> Perhaps the addition for "email:<whatever>" can benefit from
>> studying and mimicking existing practices a bit more?
>>
>
> For "email:<whatever>",
> even If I parse that <whatever>. I still make comparison something like:
> ```
> if (!modifier)
>     email_option.option = EO_RAW;
> else if (!strcmp(modifier, "trim"))
>     email_option.option = EO_TRIM;
> else if (!strcmp(arg, "localpart"))
>     email_option.option = EO_LOCALPART;
> ```

Somebody needs to do a comparison, but it should be done at parsing
phase when the --format is grokked, not in grab phase that is run
for each and every ref to be shown.

These patches should only be done after looking at existing
"<basicatom>:<modifiers>" like "objectname:short" etc are handled,
not before, I think.
diff mbox series

Patch

diff --git a/ref-filter.c b/ref-filter.c
index 8447cb09be..8563088eb1 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -102,6 +102,10 @@  static struct ref_to_worktree_map {
 	struct worktree **worktrees;
 } ref_to_worktree_map;
 
+static struct email_option{
+	enum { EO_INVALID, EO_RAW, EO_TRIM, EO_LOCALPART } option;
+} email_option;
+
 /*
  * An atom is a valid field atom listed below, possibly prefixed with
  * a "*" to denote deref_tag().
@@ -1040,10 +1044,26 @@  static const char *copy_email(const char *buf)
 	const char *eoemail;
 	if (!email)
 		return xstrdup("");
-	eoemail = strchr(email, '>');
+	switch (email_option.option) {
+	case EO_RAW:
+		eoemail = strchr(email, '>') + 1;
+		break;
+	case EO_TRIM:
+		email++;
+		eoemail = strchr(email, '>');
+		break;
+	case EO_LOCALPART:
+		email++;
+		eoemail = strchr(email, '@');
+		break;
+	case EO_INVALID:
+	default:
+		return xstrdup("");
+	}
+
 	if (!eoemail)
 		return xstrdup("");
-	return xmemdupz(email, eoemail + 1 - email);
+	return xmemdupz(email, eoemail - email);
 }
 
 static char *copy_subject(const char *buf, unsigned long len)
@@ -1113,7 +1133,7 @@  static void grab_person(const char *who, struct atom_value *val, int deref, void
 			continue;
 		if (name[wholen] != 0 &&
 		    strcmp(name + wholen, "name") &&
-		    strcmp(name + wholen, "email") &&
+		    !starts_with(name + wholen, "email") &&
 		    !starts_with(name + wholen, "date"))
 			continue;
 		if (!wholine)
@@ -1124,8 +1144,16 @@  static void grab_person(const char *who, struct atom_value *val, int deref, void
 			v->s = copy_line(wholine);
 		else if (!strcmp(name + wholen, "name"))
 			v->s = copy_name(wholine);
-		else if (!strcmp(name + wholen, "email"))
+		else if (starts_with(name + wholen, "email")) {
+			email_option.option = EO_INVALID;
+			if (!strcmp(name + wholen, "email"))
+				email_option.option = EO_RAW;
+			if (!strcmp(name + wholen, "email:trim"))
+				email_option.option = EO_TRIM;
+			if (!strcmp(name + wholen, "email:localpart"))
+				email_option.option = EO_LOCALPART;
 			v->s = copy_email(wholine);
+		}
 		else if (starts_with(name + wholen, "date"))
 			grab_date(wholine, v, name);
 	}
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index da59fadc5d..b8a2cb8439 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -106,15 +106,21 @@  test_atom head '*objecttype' ''
 test_atom head author 'A U Thor <author@example.com> 1151968724 +0200'
 test_atom head authorname 'A U Thor'
 test_atom head authoremail '<author@example.com>'
+test_atom head authoremail:trim 'author@example.com'
+test_atom head authoremail:localpart 'author'
 test_atom head authordate 'Tue Jul 4 01:18:44 2006 +0200'
 test_atom head committer 'C O Mitter <committer@example.com> 1151968723 +0200'
 test_atom head committername 'C O Mitter'
 test_atom head committeremail '<committer@example.com>'
+test_atom head committeremail:trim 'committer@example.com'
+test_atom head committeremail:localpart 'committer'
 test_atom head committerdate 'Tue Jul 4 01:18:43 2006 +0200'
 test_atom head tag ''
 test_atom head tagger ''
 test_atom head taggername ''
 test_atom head taggeremail ''
+test_atom head taggeremail:trim ''
+test_atom head taggeremail:localpart ''
 test_atom head taggerdate ''
 test_atom head creator 'C O Mitter <committer@example.com> 1151968723 +0200'
 test_atom head creatordate 'Tue Jul 4 01:18:43 2006 +0200'
@@ -151,15 +157,21 @@  test_atom tag '*objecttype' 'commit'
 test_atom tag author ''
 test_atom tag authorname ''
 test_atom tag authoremail ''
+test_atom tag authoremail:trim ''
+test_atom tag authoremail:localpart ''
 test_atom tag authordate ''
 test_atom tag committer ''
 test_atom tag committername ''
 test_atom tag committeremail ''
+test_atom tag committeremail:trim ''
+test_atom tag committeremail:localpart ''
 test_atom tag committerdate ''
 test_atom tag tag 'testtag'
 test_atom tag tagger 'C O Mitter <committer@example.com> 1151968725 +0200'
 test_atom tag taggername 'C O Mitter'
 test_atom tag taggeremail '<committer@example.com>'
+test_atom tag taggeremail:trim 'committer@example.com'
+test_atom tag taggeremail:localpart 'committer'
 test_atom tag taggerdate 'Tue Jul 4 01:18:45 2006 +0200'
 test_atom tag creator 'C O Mitter <committer@example.com> 1151968725 +0200'
 test_atom tag creatordate 'Tue Jul 4 01:18:45 2006 +0200'
@@ -545,10 +557,14 @@  test_atom refs/tags/taggerless tag 'taggerless'
 test_atom refs/tags/taggerless tagger ''
 test_atom refs/tags/taggerless taggername ''
 test_atom refs/tags/taggerless taggeremail ''
+test_atom refs/tags/taggerless taggeremail:trim ''
+test_atom refs/tags/taggerless taggeremail:localpart ''
 test_atom refs/tags/taggerless taggerdate ''
 test_atom refs/tags/taggerless committer ''
 test_atom refs/tags/taggerless committername ''
 test_atom refs/tags/taggerless committeremail ''
+test_atom refs/tags/taggerless committeremail:trim ''
+test_atom refs/tags/taggerless committeremail:localpart ''
 test_atom refs/tags/taggerless committerdate ''
 test_atom refs/tags/taggerless subject 'Broken tag'