diff mbox series

[2/2] pretty: add '%aA' to show domain-part of email addresses

Message ID 20231026-pretty-email-domain-v1-2-5d6bfa6615c0@gmail.com (mailing list archive)
State New, archived
Headers show
Series pretty: add %aA to show domain-part of email addresses | expand

Commit Message

Liam Beguin Oct. 26, 2023, 11:16 p.m. UTC
Many reports use the email domain to keep track of organizations
contributing to projects.
Add support for formatting the domain-part of a contributor's address so
that this can be done using git itself, with something like:

	git shortlog -sn --group=format:%aA v2.41.0..v2.42.0

Signed-off-by: Liam Beguin <liambeguin@gmail.com>
---
 Documentation/pretty-formats.txt |  6 ++++++
 pretty.c                         | 13 ++++++++++++-
 t/t4203-mailmap.sh               | 28 ++++++++++++++++++++++++++++
 t/t6006-rev-list-format.sh       |  6 ++++--
 4 files changed, 50 insertions(+), 3 deletions(-)

Comments

Kousik Sanagavarapu Oct. 27, 2023, 6:40 p.m. UTC | #1
Hi Liam,

Liam Beguin <liambeguin@gmail.com> wrote:
> Subject: Re: [PATCH 2/2] pretty: add '%aA' to show domain-part of email addresses

Since we are adding both '%aa' and '%aA', it would be better to
to include both in the commit subject, but since it is already long
enough, in my opinion

	pretty: add formats for domain-part of email address

would convey the gist of the commit to the reader better.

> Many reports use the email domain to keep track of organizations
> contributing to projects.
> Add support for formatting the domain-part of a contributor's address so
> that this can be done using git itself, with something like:
> 
> 	git shortlog -sn --group=format:%aA v2.41.0..v2.42.0
> 
> Signed-off-by: Liam Beguin <liambeguin@gmail.com>

A very very very minor nit but the commit message would read better as

	... contributing to projects, so add support for ...

Feel free to ignore it.

> ---
>  Documentation/pretty-formats.txt |  6 ++++++
>  pretty.c                         | 13 ++++++++++++-
>  t/t4203-mailmap.sh               | 28 ++++++++++++++++++++++++++++
>  t/t6006-rev-list-format.sh       |  6 ++++--
>  4 files changed, 50 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/pretty-formats.txt b/Documentation/pretty-formats.txt
> index a22f6fceecdd..72102a681c3a 100644
> --- a/Documentation/pretty-formats.txt
> +++ b/Documentation/pretty-formats.txt
> @@ -195,6 +195,9 @@ The placeholders are:
>  '%al':: author email local-part (the part before the '@' sign)
>  '%aL':: author email local-part (see '%al') respecting .mailmap, see
>  	linkgit:git-shortlog[1] or linkgit:git-blame[1])
> +'%aa':: author email domain-part (the part after the '@' sign)
> +'%aA':: author email domain-part (see '%al') respecting .mailmap, see
> +	linkgit:git-shortlog[1] or linkgit:git-blame[1])
>  '%ad':: author date (format respects --date= option)
>  '%aD':: author date, RFC2822 style
>  '%ar':: author date, relative
> @@ -213,6 +216,9 @@ The placeholders are:
>  '%cl':: committer email local-part (the part before the '@' sign)
>  '%cL':: committer email local-part (see '%cl') respecting .mailmap, see
>  	linkgit:git-shortlog[1] or linkgit:git-blame[1])
> +'%ca':: committer email domain-part (the part before the '@' sign)
> +'%cA':: committer email domain-part (see '%cl') respecting .mailmap, see
> +	linkgit:git-shortlog[1] or linkgit:git-blame[1])
>  '%cd':: committer date (format respects --date= option)
>  '%cD':: committer date, RFC2822 style
>  '%cr':: committer date, relative
> diff --git a/pretty.c b/pretty.c
> index cf964b060cd1..4f5d081589ea 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -791,7 +791,7 @@ static size_t format_person_part(struct strbuf *sb, char part,
>  	mail = s.mail_begin;
>  	maillen = s.mail_end - s.mail_begin;
>  
> -	if (part == 'N' || part == 'E' || part == 'L') /* mailmap lookup */
> +	if (part == 'N' || part == 'E' || part == 'L' || part == 'A') /* mailmap lookup */
>  		mailmap_name(&mail, &maillen, &name, &namelen);
>  	if (part == 'n' || part == 'N') {	/* name */
>  		strbuf_add(sb, name, namelen);
> @@ -808,6 +808,17 @@ static size_t format_person_part(struct strbuf *sb, char part,
>  		strbuf_add(sb, mail, maillen);
>  		return placeholder_len;
>  	}
> +	if (part == 'a' || part == 'A') {	/* domain-part */
> +		const char *at = memchr(mail, '@', maillen);
> +		if (at) {
> +			at += 1;
> +			maillen -= at - mail;
> +			strbuf_add(sb, at, maillen);
> +		} else {
> +			strbuf_add(sb, mail, maillen);
> +		}
> +		return placeholder_len;
> +	}
>  
>  	if (!s.date_begin)
>  		goto skip;

So, if we have a domain-name, we grab it, else (the case where we don't
have '@') we grab it as-is. Looks good.

> diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
> index 2016132f5161..35bf7bb05bea 100755
> --- a/t/t4203-mailmap.sh
> +++ b/t/t4203-mailmap.sh
> @@ -624,6 +624,34 @@ test_expect_success 'Log output (local-part email address)' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'Log output (domain-part email address)' '
> +	cat >expect <<-EOF &&
> +	Author email cto@coompany.xx has domain-part coompany.xx
> +	Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN
> +
> +	Author email me@company.xx has domain-part company.xx
> +	Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN
> +
> +	Author email me@company.xx has domain-part company.xx
> +	Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN
> +
> +	Author email nick2@company.xx has domain-part company.xx
> +	Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN
> +
> +	Author email bugs@company.xx has domain-part company.xx
> +	Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN
> +
> +	Author email bugs@company.xx has domain-part company.xx
> +	Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN
> +
> +	Author email author@example.com has domain-part example.com
> +	Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN
> +	EOF
> +
> +	git log --pretty=format:"Author email %ae has domain-part %aa%nCommitter email %ce has domain-part %ca%n" >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_expect_success 'Log output with --use-mailmap' '
>  	test_config mailmap.file complex.map &&
>  
> diff --git a/t/t6006-rev-list-format.sh b/t/t6006-rev-list-format.sh
> index 573eb97a0f7f..34c686becf2d 100755
> --- a/t/t6006-rev-list-format.sh
> +++ b/t/t6006-rev-list-format.sh
> @@ -163,11 +163,12 @@ commit $head1
>  EOF
>  
>  # we don't test relative here
> -test_format author %an%n%ae%n%al%n%ad%n%aD%n%at <<EOF
> +test_format author %an%n%ae%n%al%aa%n%ad%n%aD%n%at <<EOF
>  commit $head2
>  $GIT_AUTHOR_NAME
>  $GIT_AUTHOR_EMAIL
>  $TEST_AUTHOR_LOCALNAME
> +$TEST_AUTHOR_DOMAIN
>  Thu Apr 7 15:13:13 2005 -0700
>  Thu, 7 Apr 2005 15:13:13 -0700
>  1112911993
> @@ -180,11 +181,12 @@ Thu, 7 Apr 2005 15:13:13 -0700
>  1112911993
>  EOF
>  
> -test_format committer %cn%n%ce%n%cl%n%cd%n%cD%n%ct <<EOF
> +test_format committer %cn%n%ce%n%cl%ca%n%cd%n%cD%n%ct <<EOF
>  commit $head2
>  $GIT_COMMITTER_NAME
>  $GIT_COMMITTER_EMAIL
>  $TEST_COMMITTER_LOCALNAME
> +$TEST_COMMITTER_DOMAIN
>  Thu Apr 7 15:13:13 2005 -0700
>  Thu, 7 Apr 2005 15:13:13 -0700
>  1112911993
> 
> -- 
> 2.39.0

The tests look good too.

I should say I'm skeptical of the new format's name though. I know '%ad' is
taken... but maybe it's just me.

Thanks
Junio C Hamano Oct. 28, 2023, 12:12 a.m. UTC | #2
Kousik Sanagavarapu <five231003@gmail.com> writes:

> Liam Beguin <liambeguin@gmail.com> wrote:
>> Subject: Re: [PATCH 2/2] pretty: add '%aA' to show domain-part of email addresses
>
> Since we are adding both '%aa' and '%aA', it would be better to
> to include both in the commit subject, but since it is already long
> enough, in my opinion
>
> 	pretty: add formats for domain-part of email address
>
> would convey the gist of the commit to the reader better.

;-).  Very good.

>> Many reports use the email domain to keep track of organizations
>> contributing to projects.

Grouping @gmail.com addresses do not smell all that useful, though.

More importantly, it is not clear what "Many reports" refers to.  If
they are *not* verbatim output from "git log" family of commands,
iow, they are produced by post-processing output from "git log"
family of commands, then I do not quite see why %aa is useful at
all.

Thanks.
Jeff King Oct. 28, 2023, 2:13 a.m. UTC | #3
On Sat, Oct 28, 2023 at 09:12:06AM +0900, Junio C Hamano wrote:

> Grouping @gmail.com addresses do not smell all that useful, though.
> 
> More importantly, it is not clear what "Many reports" refers to.  If
> they are *not* verbatim output from "git log" family of commands,
> iow, they are produced by post-processing output from "git log"
> family of commands, then I do not quite see why %aa is useful at
> all.

One way you could directly use this is in shortlog, which these days
lets you group by specific formats. So:

  git shortlog -ns --group=format:%aA

is potentially useful.

I say "potentially" because it really depends on your project and its
contributors. In git.git the results are mostly either too broad
("gmail.com" covers many unrelated people) or too narrow (I'll assume
I'm the only contributor from "peff.net"). There are a few possibly
useful ones ("microsoft.com", "gitlab.com", though even those are
misleading because email domains don't always correspond to
affiliations).

So I don't find it useful myself, but I see how it could be in the right
circumstances. It also feels like a symmetric match to "%al", which
already exists. I do find "aa" as the identifier a little hard to
remember. I guess it's "a" for "address", though I'd have called the
whole local@domain thing an address thing that. Of course "d" for domain
would make sense, but that is already taken. If we could spell it as
%(authoremail:domain) that would remove the question. But given the
existence of "%al", I'm not too sad to see another letter allocated to
this purpose in the meantime.

Just my two cents as a shortlog --format afficionado. ;) (Of course,
shortlog itself is the ultimate "you could really just post-process log
output" example).

-Peff
Liam Beguin Oct. 28, 2023, 2:20 a.m. UTC | #4
Hi Kousik,

On Sat, Oct 28, 2023 at 12:10:30AM +0530, Kousik Sanagavarapu wrote:
> Hi Liam,
> 
> Liam Beguin <liambeguin@gmail.com> wrote:
> > Subject: Re: [PATCH 2/2] pretty: add '%aA' to show domain-part of email addresses
> 
> Since we are adding both '%aa' and '%aA', it would be better to
> to include both in the commit subject, but since it is already long
> enough, in my opinion
> 
> 	pretty: add formats for domain-part of email address
> 
> would convey the gist of the commit to the reader better.

That reads better, I'll update the commit message.

> > Many reports use the email domain to keep track of organizations
> > contributing to projects.
> > Add support for formatting the domain-part of a contributor's address so
> > that this can be done using git itself, with something like:
> > 
> > 	git shortlog -sn --group=format:%aA v2.41.0..v2.42.0
> > 
> > Signed-off-by: Liam Beguin <liambeguin@gmail.com>
> 
> A very very very minor nit but the commit message would read better as
> 
> 	... contributing to projects, so add support for ...
> 
> Feel free to ignore it.
> 
> > ---
> >  Documentation/pretty-formats.txt |  6 ++++++
> >  pretty.c                         | 13 ++++++++++++-
> >  t/t4203-mailmap.sh               | 28 ++++++++++++++++++++++++++++
> >  t/t6006-rev-list-format.sh       |  6 ++++--
> >  4 files changed, 50 insertions(+), 3 deletions(-)
> > 
> > diff --git a/Documentation/pretty-formats.txt b/Documentation/pretty-formats.txt
> > index a22f6fceecdd..72102a681c3a 100644
> > --- a/Documentation/pretty-formats.txt
> > +++ b/Documentation/pretty-formats.txt
> > @@ -195,6 +195,9 @@ The placeholders are:
> >  '%al':: author email local-part (the part before the '@' sign)
> >  '%aL':: author email local-part (see '%al') respecting .mailmap, see
> >  	linkgit:git-shortlog[1] or linkgit:git-blame[1])
> > +'%aa':: author email domain-part (the part after the '@' sign)
> > +'%aA':: author email domain-part (see '%al') respecting .mailmap, see
> > +	linkgit:git-shortlog[1] or linkgit:git-blame[1])
> >  '%ad':: author date (format respects --date= option)
> >  '%aD':: author date, RFC2822 style
> >  '%ar':: author date, relative
> > @@ -213,6 +216,9 @@ The placeholders are:
> >  '%cl':: committer email local-part (the part before the '@' sign)
> >  '%cL':: committer email local-part (see '%cl') respecting .mailmap, see
> >  	linkgit:git-shortlog[1] or linkgit:git-blame[1])
> > +'%ca':: committer email domain-part (the part before the '@' sign)
> > +'%cA':: committer email domain-part (see '%cl') respecting .mailmap, see
> > +	linkgit:git-shortlog[1] or linkgit:git-blame[1])
> >  '%cd':: committer date (format respects --date= option)
> >  '%cD':: committer date, RFC2822 style
> >  '%cr':: committer date, relative
> > diff --git a/pretty.c b/pretty.c
> > index cf964b060cd1..4f5d081589ea 100644
> > --- a/pretty.c
> > +++ b/pretty.c
> > @@ -791,7 +791,7 @@ static size_t format_person_part(struct strbuf *sb, char part,
> >  	mail = s.mail_begin;
> >  	maillen = s.mail_end - s.mail_begin;
> >  
> > -	if (part == 'N' || part == 'E' || part == 'L') /* mailmap lookup */
> > +	if (part == 'N' || part == 'E' || part == 'L' || part == 'A') /* mailmap lookup */
> >  		mailmap_name(&mail, &maillen, &name, &namelen);
> >  	if (part == 'n' || part == 'N') {	/* name */
> >  		strbuf_add(sb, name, namelen);
> > @@ -808,6 +808,17 @@ static size_t format_person_part(struct strbuf *sb, char part,
> >  		strbuf_add(sb, mail, maillen);
> >  		return placeholder_len;
> >  	}
> > +	if (part == 'a' || part == 'A') {	/* domain-part */
> > +		const char *at = memchr(mail, '@', maillen);
> > +		if (at) {
> > +			at += 1;
> > +			maillen -= at - mail;
> > +			strbuf_add(sb, at, maillen);
> > +		} else {
> > +			strbuf_add(sb, mail, maillen);
> > +		}
> > +		return placeholder_len;
> > +	}
> >  
> >  	if (!s.date_begin)
> >  		goto skip;
> 
> So, if we have a domain-name, we grab it, else (the case where we don't
> have '@') we grab it as-is. Looks good.
> 
> > diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
> > index 2016132f5161..35bf7bb05bea 100755
> > --- a/t/t4203-mailmap.sh
> > +++ b/t/t4203-mailmap.sh
> > @@ -624,6 +624,34 @@ test_expect_success 'Log output (local-part email address)' '
> >  	test_cmp expect actual
> >  '
> >  
> > +test_expect_success 'Log output (domain-part email address)' '
> > +	cat >expect <<-EOF &&
> > +	Author email cto@coompany.xx has domain-part coompany.xx
> > +	Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN
> > +
> > +	Author email me@company.xx has domain-part company.xx
> > +	Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN
> > +
> > +	Author email me@company.xx has domain-part company.xx
> > +	Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN
> > +
> > +	Author email nick2@company.xx has domain-part company.xx
> > +	Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN
> > +
> > +	Author email bugs@company.xx has domain-part company.xx
> > +	Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN
> > +
> > +	Author email bugs@company.xx has domain-part company.xx
> > +	Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN
> > +
> > +	Author email author@example.com has domain-part example.com
> > +	Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN
> > +	EOF
> > +
> > +	git log --pretty=format:"Author email %ae has domain-part %aa%nCommitter email %ce has domain-part %ca%n" >actual &&
> > +	test_cmp expect actual
> > +'
> > +
> >  test_expect_success 'Log output with --use-mailmap' '
> >  	test_config mailmap.file complex.map &&
> >  
> > diff --git a/t/t6006-rev-list-format.sh b/t/t6006-rev-list-format.sh
> > index 573eb97a0f7f..34c686becf2d 100755
> > --- a/t/t6006-rev-list-format.sh
> > +++ b/t/t6006-rev-list-format.sh
> > @@ -163,11 +163,12 @@ commit $head1
> >  EOF
> >  
> >  # we don't test relative here
> > -test_format author %an%n%ae%n%al%n%ad%n%aD%n%at <<EOF
> > +test_format author %an%n%ae%n%al%aa%n%ad%n%aD%n%at <<EOF
> >  commit $head2
> >  $GIT_AUTHOR_NAME
> >  $GIT_AUTHOR_EMAIL
> >  $TEST_AUTHOR_LOCALNAME
> > +$TEST_AUTHOR_DOMAIN
> >  Thu Apr 7 15:13:13 2005 -0700
> >  Thu, 7 Apr 2005 15:13:13 -0700
> >  1112911993
> > @@ -180,11 +181,12 @@ Thu, 7 Apr 2005 15:13:13 -0700
> >  1112911993
> >  EOF
> >  
> > -test_format committer %cn%n%ce%n%cl%n%cd%n%cD%n%ct <<EOF
> > +test_format committer %cn%n%ce%n%cl%ca%n%cd%n%cD%n%ct <<EOF
> >  commit $head2
> >  $GIT_COMMITTER_NAME
> >  $GIT_COMMITTER_EMAIL
> >  $TEST_COMMITTER_LOCALNAME
> > +$TEST_COMMITTER_DOMAIN
> >  Thu Apr 7 15:13:13 2005 -0700
> >  Thu, 7 Apr 2005 15:13:13 -0700
> >  1112911993
> > 
> > -- 
> > 2.39.0
> 
> The tests look good too.
> 
> I should say I'm skeptical of the new format's name though. I know '%ad' is
> taken... but maybe it's just me.
> 
> Thanks

I agree, %aa isn't the best, I'm definitly opened to suggestions.
My preference would've been for something like %ad, but that's already
taken.

Thanks for reviewing.

Cheers,
Liam
Liam Beguin Oct. 28, 2023, 3:22 a.m. UTC | #5
Hi Junio, Peff,

On Fri, Oct 27, 2023 at 10:13:01PM -0400, Jeff King wrote:
> On Sat, Oct 28, 2023 at 09:12:06AM +0900, Junio C Hamano wrote:
> 
> > Grouping @gmail.com addresses do not smell all that useful, though.

While I agree with you, I think that's more an exception that the rule.

> > More importantly, it is not clear what "Many reports" refers to.  If
> > they are *not* verbatim output from "git log" family of commands,
> > iow, they are produced by post-processing output from "git log"
> > family of commands, then I do not quite see why %aa is useful at
> > all.

I might've been a bit generous with "many report", I was mostly thinking
of the ones published by lwn.net, and U-Boot for example.

To some extent, "git shortlog" could be considered a part of that
post-processing chain.

> One way you could directly use this is in shortlog, which these days
> lets you group by specific formats. So:
> 
>   git shortlog -ns --group=format:%aA

That's exactly what I implemented this for :-)

> is potentially useful.
> 
> I say "potentially" because it really depends on your project and its
> contributors. In git.git the results are mostly either too broad
> ("gmail.com" covers many unrelated people) or too narrow (I'll assume
> I'm the only contributor from "peff.net"). There are a few possibly
> useful ones ("microsoft.com", "gitlab.com", though even those are
> misleading because email domains don't always correspond to
> affiliations).

I agree with your comment here, while grouping everything under
"gmail.com" for example doesn't provide anything really useful we can
rely on mailmap to fix that when appropriate. I think it would otherwise
count as unaffiliated.

I don't claim this to be foolproof, but I do think that it gives a good
overall view of which companies are involved in the project for the most
part.

> So I don't find it useful myself, but I see how it could be in the right
> circumstances. It also feels like a symmetric match to "%al", which
> already exists. I do find "aa" as the identifier a little hard to
> remember. I guess it's "a" for "address", though I'd have called the
> whole local@domain thing an address thing that. Of course "d" for domain
> would make sense, but that is already taken. If we could spell it as
> %(authoremail:domain) that would remove the question. But given the
> existence of "%al", I'm not too sad to see another letter allocated to
> this purpose in the meantime.

I chose the "a" for "address", but I'm not sold on %aa either.
I just couldn't find anything better that wasn't already taken.

What about "a@"?

It's a bit easier to remember, being the first character of the
domain-part.

> Just my two cents as a shortlog --format afficionado. ;) (Of course,
> shortlog itself is the ultimate "you could really just post-process log
> output" example).

I'm a big fan of shortlog --format (and --group) as well!

Taking it a step further, it's also possible to pass in whatever mailmap
you want to generate a "report".  Let's say there's mapping that only
makes sense for a single release something like this could be used:

git -c mailmap.file=git-mailmap-v2.42 shortlog -sn --group=format:%aA

> -Peff

Thanks for your time.

Cheers,
Liam
Andy Koppe Oct. 28, 2023, 6:58 a.m. UTC | #6
On 28/10/2023 04:22, Liam Beguin wrote:
> On Fri, Oct 27, 2023 at 10:13:01PM -0400, Jeff King wrote:
>> One way you could directly use this is in shortlog, which these days
>> lets you group by specific formats. So:
>>
>>    git shortlog -ns --group=format:%aA
> 
> That's exactly what I implemented this for :-)

Another potential use case is custom log formats where one might want to 
color the local-part separately from the domain part.

>> It also feels like a symmetric match to "%al", which already exists.

Speaking of symmetry, I think it would need "%c" counterparts for the 
coloring use case.

> I chose the "a" for "address", but I'm not sold on %aa either.
> I just couldn't find anything better that wasn't already taken.
> 
> What about "a@"?

Makes sense, and I suppose there's "%G?" as precedent for using a symbol 
rather than letter in these.

If that's not suitable though, how about "m" for "mail domain"? It also 
immediately follows "l" for "local-part" in the alphabet.

Regards,
Andy
Andy Koppe Oct. 28, 2023, 7:02 a.m. UTC | #7
On 28/10/2023 07:58, Andy Koppe wrote:
> On 28/10/2023 04:22, Liam Beguin wrote:
>> On Fri, Oct 27, 2023 at 10:13:01PM -0400, Jeff King wrote:
>>> One way you could directly use this is in shortlog, which these days
>>> lets you group by specific formats. So:
>>>
>>>    git shortlog -ns --group=format:%aA
>>
>> That's exactly what I implemented this for :-)
> 
> Another potential use case is custom log formats where one might want to 
> color the local-part separately from the domain part.
> 
>>> It also feels like a symmetric match to "%al", which already exists.
> 
> Speaking of symmetry, I think it would need "%c" counterparts for the 
> coloring use case.

D'oh, it always helps to read the actual patches, which do have the %c 
variants.

Thanks,
Andy
Oswald Buddenhagen Oct. 28, 2023, 3:27 p.m. UTC | #8
On Fri, Oct 27, 2023 at 10:20:48PM -0400, Liam Beguin wrote:
>I agree, %aa isn't the best, I'm definitly opened to suggestions.
>My preference would've been for something like %ad, but that's already
>taken.
>
H for host would be available. (not to be confused with h for human.)

in retrospect i'd say that it was unwise to use separate letters for the 
various forms of dates - a set of qualifiers that would be applied to a 
single specifier would be nicer. ah well ...

regards
Andy Koppe Oct. 28, 2023, 9:11 p.m. UTC | #9
On 28/10/2023 16:27, Oswald Buddenhagen wrote:
> On Fri, Oct 27, 2023 at 10:20:48PM -0400, Liam Beguin wrote:
>> I agree, %aa isn't the best, I'm definitly opened to suggestions.
>> My preference would've been for something like %ad, but that's already
>> taken.
>>
> H for host would be available. (not to be confused with h for human.)

Both lowercase and uppercase would be needed though, to mirror %ae/%aE 
and %al/%aL for choosing whether to respect .mailmap. That actually 
rules out the %a@ idea as well.

Andy
Junio C Hamano Oct. 29, 2023, 11:53 p.m. UTC | #10
Jeff King <peff@peff.net> writes:

> On Sat, Oct 28, 2023 at 09:12:06AM +0900, Junio C Hamano wrote:
>
>> Grouping @gmail.com addresses do not smell all that useful, though.
>> ... 
> One way you could directly use this is in shortlog, which these days
> lets you group by specific formats. So:
>
>   git shortlog -ns --group=format:%aA
>
> is potentially useful.

Exactly.  That is what I meant by "Grouping", and I agree with you
about "potentially" part, too ;-)  Throwing all @gmail.com addresses
into a single bin would not be very useful.

> ... If we could spell it as
> %(authoremail:domain) that would remove the question. But given the
> existence of "%al", I'm not too sad to see another letter allocated to
> this purpose in the meantime.

Another line of thought is perhaps it is potentially useful to teach
the --format= machinery to be a bit more programmable, e.g. allowing
to compute a substring of an existing field %{%aE#*@} without having
to waste a letter each for the local part and domain part.  But as I
already said, we are now talking about "postprocessing", and adding
complexity to our codebase only to have incomplete flexibility may
not be worth it.  A more specific %(authoremail:localpart) and its
domain counterpart may be easier to explain and understand.

In any case, it is a bit too late to say "let's not waste the
precious single letter namespace to add useless features", as we
have come way too far, so I do not mind too much using a currently
unused letter $X for yet another author and committer trait.
Jeff King Oct. 30, 2023, 9:10 a.m. UTC | #11
On Sat, Oct 28, 2023 at 07:58:31AM +0100, Andy Koppe wrote:

> > I chose the "a" for "address", but I'm not sold on %aa either.
> > I just couldn't find anything better that wasn't already taken.
> > 
> > What about "a@"?
> 
> Makes sense, and I suppose there's "%G?" as precedent for using a symbol
> rather than letter in these.

This is pretty subjective, but I somehow find "%a@" hard to parse
visually (despite the fact that yes, "%G?" already crossed that bridge).
But I think the real nail in the coffin is your later comment that we
cannot use capitalization to make the raw/mailmap distinction.

> If that's not suitable though, how about "m" for "mail domain"? It also
> immediately follows "l" for "local-part" in the alphabet.

FWIW, that makes sense to me over "a" (though admittedly it is not
really any less vague than "a", so it really might vary from person to
person).

-Peff
Liam Beguin Nov. 1, 2023, 7:06 p.m. UTC | #12
On Mon, Oct 30, 2023 at 05:10:11AM -0400, Jeff King wrote:
> On Sat, Oct 28, 2023 at 07:58:31AM +0100, Andy Koppe wrote:
> 
> > > I chose the "a" for "address", but I'm not sold on %aa either.
> > > I just couldn't find anything better that wasn't already taken.
> > > 
> > > What about "a@"?
> > 
> > Makes sense, and I suppose there's "%G?" as precedent for using a symbol
> > rather than letter in these.
> 
> This is pretty subjective, but I somehow find "%a@" hard to parse
> visually (despite the fact that yes, "%G?" already crossed that bridge).
> But I think the real nail in the coffin is your later comment that we
> cannot use capitalization to make the raw/mailmap distinction.
> 
> > If that's not suitable though, how about "m" for "mail domain"? It also
> > immediately follows "l" for "local-part" in the alphabet.
> 
> FWIW, that makes sense to me over "a" (though admittedly it is not
> really any less vague than "a", so it really might vary from person to
> person).

Okay, I like 'm' better as well. And '@' is a no go because the
mailmaped version. I'll resend.

Cheers,
Liam
Andy Koppe Nov. 3, 2023, 8:22 a.m. UTC | #13
On 27/10/2023 19:40, Kousik Sanagavarapu wrote:
> Liam Beguin <liambeguin@gmail.com> wrote:
>> @@ -808,6 +808,17 @@ static size_t format_person_part(struct strbuf *sb, char part,
>>   		strbuf_add(sb, mail, maillen);
>>   		return placeholder_len;
>>   	}
>> +	if (part == 'a' || part == 'A') {	/* domain-part */
>> +		const char *at = memchr(mail, '@', maillen);
>> +		if (at) {
>> +			at += 1;
>> +			maillen -= at - mail;
>> +			strbuf_add(sb, at, maillen);
>> +		} else {
>> +			strbuf_add(sb, mail, maillen);
>> +		}
>> +		return placeholder_len;
>> +	}
>>   
>>   	if (!s.date_begin)
>>   		goto skip;
> 
> So, if we have a domain-name, we grab it, else (the case where we don't
> have '@') we grab it as-is. Looks good.

I'm not sure that this is the right way to handle a missing '@' here 
actually, because %al already returns the whole email field in that 
case, which makes sense as the likes of the 'mail' command would 
interpret it as a local username.

And if someone was going to use %al and the new specifier together to 
format the parts of the email field differently, they probably wouldn't 
want the field to appear twice.

Therefore I think it would be more appropriate to expand to nothing in 
that case. Tools that consume this output would already need to be able 
to deal with the empty case, as it could also happen if there's a single 
'@' at the end of the email field, or if the field is empty.

Regards,
Andy
Kousik Sanagavarapu Nov. 3, 2023, 5:20 p.m. UTC | #14
On Fri, Nov 03, 2023 at 08:22:05AM +0000, Andy Koppe wrote:
> 
> On 27/10/2023 19:40, Kousik Sanagavarapu wrote:
> > So, if we have a domain-name, we grab it, else (the case where we don't
> > have '@') we grab it as-is. Looks good.
> 
> I'm not sure that this is the right way to handle a missing '@' here
> actually, because %al already returns the whole email field in that case,
> which makes sense as the likes of the 'mail' command would interpret it as a
> local username.
>
> And if someone was going to use %al and the new specifier together to format
> the parts of the email field differently, they probably wouldn't want the
> field to appear twice.
> 
> Therefore I think it would be more appropriate to expand to nothing in that
> case. Tools that consume this output would already need to be able to deal
> with the empty case, as it could also happen if there's a single '@' at the
> end of the email field, or if the field is empty.

I originally thought since localpart and the new domainpart are like
counterparts (or are symmetrical, like Peff mentioned), falling back
like in the case of localpart was the correct way here (again, symmetry).

Having read your reasoning though I think it makes sense to not fall back the
same way as localpart (%al) and return empty instead.

This goes in favor of what Liam said in the original commit message of this
feature being used to keep track of commits from different organizations (as
in from GitHub or GitLab) or at least domains that make sense. As, if we
return the whole email (where the email is with no domain-part or '@' is at
the end of the email), the whole purpose of our new feature is lost.

Thanks
Junio C Hamano Nov. 4, 2023, 1:54 a.m. UTC | #15
Andy Koppe <andy.koppe@gmail.com> writes:

> I'm not sure that this is the right way to handle a missing '@' here
> actually, because %al already returns the whole email field in that
> case, which makes sense as the likes of the 'mail' command would
> interpret it as a local username.

We could expand "%am" to \C-h (\010) so that "%al@%am" would end up
displaying the same as "%al" but that would be way too cute for its
own worth ;-)

It is unfortunate that "%al@%am" cannot be the same as "%ae" for
local-only address, but giving an empty string for "%am" if "%ae" is
local-only would be the best we could do for our users, and certainly
much better than giving the same as "%ae", as you said above.

Thanks.
Andy Koppe Nov. 4, 2023, 9:51 a.m. UTC | #16
On 04/11/2023 01:54, Junio C Hamano wrote:
> Andy Koppe <andy.koppe@gmail.com> writes:
> 
>> I'm not sure that this is the right way to handle a missing '@' here
>> actually, because %al already returns the whole email field in that
>> case, which makes sense as the likes of the 'mail' command would
>> interpret it as a local username.
> 
> We could expand "%am" to \C-h (\010) so that "%al@%am" would end up
> displaying the same as "%al" but that would be way too cute for its
> own worth ;-)

:)

Unfortunately it also wouldn't always work, because ^H only moves the 
cursor, so if the next thing is a newline, the '@' wouldn't actually get 
deleted.

> It is unfortunate that "%al@%am" cannot be the same as "%ae" for
> local-only address, but giving an empty string for "%am" if "%ae" is
> local-only would be the best we could do for our users, and certainly
> much better than giving the same as "%ae", as you said above.

I suppose "%@am" could mean prepending an '@' when a domain is present, 
similar to how "% am" would mean prepending a space and "%+am" would 
mean prepending a newline. With that, "%al%@am" would be equivalent to 
"%ae".

But that then raises the question whether it should be implemented just 
for "%@[ac][mM]", or for all placeholders. In any case, I don't think it 
needs to be part of the changes at hand.

Andy
Junio C Hamano Nov. 20, 2023, 8:21 p.m. UTC | #17
Junio C Hamano <gitster@pobox.com> writes:

> Another line of thought is perhaps it is potentially useful to teach
> the --format= machinery to be a bit more programmable, e.g. allowing
> to compute a substring of an existing field %{%aE#*@} without having
> to waste a letter each for the local part and domain part.  But as I
> already said, we are now talking about "postprocessing", and adding
> complexity to our codebase only to have incomplete flexibility may
> not be worth it.  A more specific %(authoremail:localpart) and its
> domain counterpart may be easier to explain and understand.
>
> In any case, it is a bit too late to say "let's not waste the
> precious single letter namespace to add useless features", as we
> have come way too far, so I do not mind too much using a currently
> unused letter $X for yet another author and committer trait.

When I wrote the above, I somehow forgot the existing work in the
ref-filter (aka "for-each-ref") placeholders, where we have support
to a lot more flexible way to customize these things.

For example, "%(authoremail:mailmap,localpart)" can be used to say,
instead of wasting two letters 'l' and 'L' out of precious 52, that
we want e-mail address honoring the mailmap, and take only the local
part.  And the support for the host part of the address that this
topic discussed should be implementable fairly easily (just adding
EO_HOSTPART bit to the email_option structure would be sufficient)
on the ref-filter side.

We saw efforts from time to time to give "log --pretty=format:" more
of the good things from the "for-each-ref --format=" placeholders
(and vice versa), and it may give us a good way forward.
Liam Beguin Dec. 10, 2023, 9:07 p.m. UTC | #18
Hi Junio,

Apologies for the late reply.

On Tue, Nov 21, 2023 at 05:21:43AM +0900, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Another line of thought is perhaps it is potentially useful to teach
> > the --format= machinery to be a bit more programmable, e.g. allowing
> > to compute a substring of an existing field %{%aE#*@} without having
> > to waste a letter each for the local part and domain part.  But as I
> > already said, we are now talking about "postprocessing", and adding
> > complexity to our codebase only to have incomplete flexibility may
> > not be worth it.  A more specific %(authoremail:localpart) and its
> > domain counterpart may be easier to explain and understand.
> >
> > In any case, it is a bit too late to say "let's not waste the
> > precious single letter namespace to add useless features", as we
> > have come way too far, so I do not mind too much using a currently
> > unused letter $X for yet another author and committer trait.
> 
> When I wrote the above, I somehow forgot the existing work in the
> ref-filter (aka "for-each-ref") placeholders, where we have support
> to a lot more flexible way to customize these things.

I looked into this a little, after your first email. I'll try to make
time to have another look.

> For example, "%(authoremail:mailmap,localpart)" can be used to say,
> instead of wasting two letters 'l' and 'L' out of precious 52, that
> we want e-mail address honoring the mailmap, and take only the local
> part.  And the support for the host part of the address that this
> topic discussed should be implementable fairly easily (just adding
> EO_HOSTPART bit to the email_option structure would be sufficient)
> on the ref-filter side.
> 
> We saw efforts from time to time to give "log --pretty=format:" more
> of the good things from the "for-each-ref --format=" placeholders
> (and vice versa), and it may give us a good way forward.

This definitely sounds like a better approach than wasting two more
letters.

Liam
diff mbox series

Patch

diff --git a/Documentation/pretty-formats.txt b/Documentation/pretty-formats.txt
index a22f6fceecdd..72102a681c3a 100644
--- a/Documentation/pretty-formats.txt
+++ b/Documentation/pretty-formats.txt
@@ -195,6 +195,9 @@  The placeholders are:
 '%al':: author email local-part (the part before the '@' sign)
 '%aL':: author email local-part (see '%al') respecting .mailmap, see
 	linkgit:git-shortlog[1] or linkgit:git-blame[1])
+'%aa':: author email domain-part (the part after the '@' sign)
+'%aA':: author email domain-part (see '%al') respecting .mailmap, see
+	linkgit:git-shortlog[1] or linkgit:git-blame[1])
 '%ad':: author date (format respects --date= option)
 '%aD':: author date, RFC2822 style
 '%ar':: author date, relative
@@ -213,6 +216,9 @@  The placeholders are:
 '%cl':: committer email local-part (the part before the '@' sign)
 '%cL':: committer email local-part (see '%cl') respecting .mailmap, see
 	linkgit:git-shortlog[1] or linkgit:git-blame[1])
+'%ca':: committer email domain-part (the part before the '@' sign)
+'%cA':: committer email domain-part (see '%cl') respecting .mailmap, see
+	linkgit:git-shortlog[1] or linkgit:git-blame[1])
 '%cd':: committer date (format respects --date= option)
 '%cD':: committer date, RFC2822 style
 '%cr':: committer date, relative
diff --git a/pretty.c b/pretty.c
index cf964b060cd1..4f5d081589ea 100644
--- a/pretty.c
+++ b/pretty.c
@@ -791,7 +791,7 @@  static size_t format_person_part(struct strbuf *sb, char part,
 	mail = s.mail_begin;
 	maillen = s.mail_end - s.mail_begin;
 
-	if (part == 'N' || part == 'E' || part == 'L') /* mailmap lookup */
+	if (part == 'N' || part == 'E' || part == 'L' || part == 'A') /* mailmap lookup */
 		mailmap_name(&mail, &maillen, &name, &namelen);
 	if (part == 'n' || part == 'N') {	/* name */
 		strbuf_add(sb, name, namelen);
@@ -808,6 +808,17 @@  static size_t format_person_part(struct strbuf *sb, char part,
 		strbuf_add(sb, mail, maillen);
 		return placeholder_len;
 	}
+	if (part == 'a' || part == 'A') {	/* domain-part */
+		const char *at = memchr(mail, '@', maillen);
+		if (at) {
+			at += 1;
+			maillen -= at - mail;
+			strbuf_add(sb, at, maillen);
+		} else {
+			strbuf_add(sb, mail, maillen);
+		}
+		return placeholder_len;
+	}
 
 	if (!s.date_begin)
 		goto skip;
diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
index 2016132f5161..35bf7bb05bea 100755
--- a/t/t4203-mailmap.sh
+++ b/t/t4203-mailmap.sh
@@ -624,6 +624,34 @@  test_expect_success 'Log output (local-part email address)' '
 	test_cmp expect actual
 '
 
+test_expect_success 'Log output (domain-part email address)' '
+	cat >expect <<-EOF &&
+	Author email cto@coompany.xx has domain-part coompany.xx
+	Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN
+
+	Author email me@company.xx has domain-part company.xx
+	Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN
+
+	Author email me@company.xx has domain-part company.xx
+	Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN
+
+	Author email nick2@company.xx has domain-part company.xx
+	Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN
+
+	Author email bugs@company.xx has domain-part company.xx
+	Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN
+
+	Author email bugs@company.xx has domain-part company.xx
+	Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN
+
+	Author email author@example.com has domain-part example.com
+	Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN
+	EOF
+
+	git log --pretty=format:"Author email %ae has domain-part %aa%nCommitter email %ce has domain-part %ca%n" >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'Log output with --use-mailmap' '
 	test_config mailmap.file complex.map &&
 
diff --git a/t/t6006-rev-list-format.sh b/t/t6006-rev-list-format.sh
index 573eb97a0f7f..34c686becf2d 100755
--- a/t/t6006-rev-list-format.sh
+++ b/t/t6006-rev-list-format.sh
@@ -163,11 +163,12 @@  commit $head1
 EOF
 
 # we don't test relative here
-test_format author %an%n%ae%n%al%n%ad%n%aD%n%at <<EOF
+test_format author %an%n%ae%n%al%aa%n%ad%n%aD%n%at <<EOF
 commit $head2
 $GIT_AUTHOR_NAME
 $GIT_AUTHOR_EMAIL
 $TEST_AUTHOR_LOCALNAME
+$TEST_AUTHOR_DOMAIN
 Thu Apr 7 15:13:13 2005 -0700
 Thu, 7 Apr 2005 15:13:13 -0700
 1112911993
@@ -180,11 +181,12 @@  Thu, 7 Apr 2005 15:13:13 -0700
 1112911993
 EOF
 
-test_format committer %cn%n%ce%n%cl%n%cd%n%cD%n%ct <<EOF
+test_format committer %cn%n%ce%n%cl%ca%n%cd%n%cD%n%ct <<EOF
 commit $head2
 $GIT_COMMITTER_NAME
 $GIT_COMMITTER_EMAIL
 $TEST_COMMITTER_LOCALNAME
+$TEST_COMMITTER_DOMAIN
 Thu Apr 7 15:13:13 2005 -0700
 Thu, 7 Apr 2005 15:13:13 -0700
 1112911993