Message ID | 20231026-pretty-email-domain-v1-2-5d6bfa6615c0@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | pretty: add %aA to show domain-part of email addresses | expand |
Hi Liam, Liam Beguin <liambeguin@gmail.com> wrote: > Subject: Re: [PATCH 2/2] pretty: add '%aA' to show domain-part of email addresses Since we are adding both '%aa' and '%aA', it would be better to to include both in the commit subject, but since it is already long enough, in my opinion pretty: add formats for domain-part of email address would convey the gist of the commit to the reader better. > Many reports use the email domain to keep track of organizations > contributing to projects. > Add support for formatting the domain-part of a contributor's address so > that this can be done using git itself, with something like: > > git shortlog -sn --group=format:%aA v2.41.0..v2.42.0 > > Signed-off-by: Liam Beguin <liambeguin@gmail.com> A very very very minor nit but the commit message would read better as ... contributing to projects, so add support for ... Feel free to ignore it. > --- > Documentation/pretty-formats.txt | 6 ++++++ > pretty.c | 13 ++++++++++++- > t/t4203-mailmap.sh | 28 ++++++++++++++++++++++++++++ > t/t6006-rev-list-format.sh | 6 ++++-- > 4 files changed, 50 insertions(+), 3 deletions(-) > > diff --git a/Documentation/pretty-formats.txt b/Documentation/pretty-formats.txt > index a22f6fceecdd..72102a681c3a 100644 > --- a/Documentation/pretty-formats.txt > +++ b/Documentation/pretty-formats.txt > @@ -195,6 +195,9 @@ The placeholders are: > '%al':: author email local-part (the part before the '@' sign) > '%aL':: author email local-part (see '%al') respecting .mailmap, see > linkgit:git-shortlog[1] or linkgit:git-blame[1]) > +'%aa':: author email domain-part (the part after the '@' sign) > +'%aA':: author email domain-part (see '%al') respecting .mailmap, see > + linkgit:git-shortlog[1] or linkgit:git-blame[1]) > '%ad':: author date (format respects --date= option) > '%aD':: author date, RFC2822 style > '%ar':: author date, relative > @@ -213,6 +216,9 @@ The placeholders are: > '%cl':: committer email local-part (the part before the '@' sign) > '%cL':: committer email local-part (see '%cl') respecting .mailmap, see > linkgit:git-shortlog[1] or linkgit:git-blame[1]) > +'%ca':: committer email domain-part (the part before the '@' sign) > +'%cA':: committer email domain-part (see '%cl') respecting .mailmap, see > + linkgit:git-shortlog[1] or linkgit:git-blame[1]) > '%cd':: committer date (format respects --date= option) > '%cD':: committer date, RFC2822 style > '%cr':: committer date, relative > diff --git a/pretty.c b/pretty.c > index cf964b060cd1..4f5d081589ea 100644 > --- a/pretty.c > +++ b/pretty.c > @@ -791,7 +791,7 @@ static size_t format_person_part(struct strbuf *sb, char part, > mail = s.mail_begin; > maillen = s.mail_end - s.mail_begin; > > - if (part == 'N' || part == 'E' || part == 'L') /* mailmap lookup */ > + if (part == 'N' || part == 'E' || part == 'L' || part == 'A') /* mailmap lookup */ > mailmap_name(&mail, &maillen, &name, &namelen); > if (part == 'n' || part == 'N') { /* name */ > strbuf_add(sb, name, namelen); > @@ -808,6 +808,17 @@ static size_t format_person_part(struct strbuf *sb, char part, > strbuf_add(sb, mail, maillen); > return placeholder_len; > } > + if (part == 'a' || part == 'A') { /* domain-part */ > + const char *at = memchr(mail, '@', maillen); > + if (at) { > + at += 1; > + maillen -= at - mail; > + strbuf_add(sb, at, maillen); > + } else { > + strbuf_add(sb, mail, maillen); > + } > + return placeholder_len; > + } > > if (!s.date_begin) > goto skip; So, if we have a domain-name, we grab it, else (the case where we don't have '@') we grab it as-is. Looks good. > diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh > index 2016132f5161..35bf7bb05bea 100755 > --- a/t/t4203-mailmap.sh > +++ b/t/t4203-mailmap.sh > @@ -624,6 +624,34 @@ test_expect_success 'Log output (local-part email address)' ' > test_cmp expect actual > ' > > +test_expect_success 'Log output (domain-part email address)' ' > + cat >expect <<-EOF && > + Author email cto@coompany.xx has domain-part coompany.xx > + Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN > + > + Author email me@company.xx has domain-part company.xx > + Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN > + > + Author email me@company.xx has domain-part company.xx > + Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN > + > + Author email nick2@company.xx has domain-part company.xx > + Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN > + > + Author email bugs@company.xx has domain-part company.xx > + Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN > + > + Author email bugs@company.xx has domain-part company.xx > + Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN > + > + Author email author@example.com has domain-part example.com > + Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN > + EOF > + > + git log --pretty=format:"Author email %ae has domain-part %aa%nCommitter email %ce has domain-part %ca%n" >actual && > + test_cmp expect actual > +' > + > test_expect_success 'Log output with --use-mailmap' ' > test_config mailmap.file complex.map && > > diff --git a/t/t6006-rev-list-format.sh b/t/t6006-rev-list-format.sh > index 573eb97a0f7f..34c686becf2d 100755 > --- a/t/t6006-rev-list-format.sh > +++ b/t/t6006-rev-list-format.sh > @@ -163,11 +163,12 @@ commit $head1 > EOF > > # we don't test relative here > -test_format author %an%n%ae%n%al%n%ad%n%aD%n%at <<EOF > +test_format author %an%n%ae%n%al%aa%n%ad%n%aD%n%at <<EOF > commit $head2 > $GIT_AUTHOR_NAME > $GIT_AUTHOR_EMAIL > $TEST_AUTHOR_LOCALNAME > +$TEST_AUTHOR_DOMAIN > Thu Apr 7 15:13:13 2005 -0700 > Thu, 7 Apr 2005 15:13:13 -0700 > 1112911993 > @@ -180,11 +181,12 @@ Thu, 7 Apr 2005 15:13:13 -0700 > 1112911993 > EOF > > -test_format committer %cn%n%ce%n%cl%n%cd%n%cD%n%ct <<EOF > +test_format committer %cn%n%ce%n%cl%ca%n%cd%n%cD%n%ct <<EOF > commit $head2 > $GIT_COMMITTER_NAME > $GIT_COMMITTER_EMAIL > $TEST_COMMITTER_LOCALNAME > +$TEST_COMMITTER_DOMAIN > Thu Apr 7 15:13:13 2005 -0700 > Thu, 7 Apr 2005 15:13:13 -0700 > 1112911993 > > -- > 2.39.0 The tests look good too. I should say I'm skeptical of the new format's name though. I know '%ad' is taken... but maybe it's just me. Thanks
Kousik Sanagavarapu <five231003@gmail.com> writes: > Liam Beguin <liambeguin@gmail.com> wrote: >> Subject: Re: [PATCH 2/2] pretty: add '%aA' to show domain-part of email addresses > > Since we are adding both '%aa' and '%aA', it would be better to > to include both in the commit subject, but since it is already long > enough, in my opinion > > pretty: add formats for domain-part of email address > > would convey the gist of the commit to the reader better. ;-). Very good. >> Many reports use the email domain to keep track of organizations >> contributing to projects. Grouping @gmail.com addresses do not smell all that useful, though. More importantly, it is not clear what "Many reports" refers to. If they are *not* verbatim output from "git log" family of commands, iow, they are produced by post-processing output from "git log" family of commands, then I do not quite see why %aa is useful at all. Thanks.
On Sat, Oct 28, 2023 at 09:12:06AM +0900, Junio C Hamano wrote: > Grouping @gmail.com addresses do not smell all that useful, though. > > More importantly, it is not clear what "Many reports" refers to. If > they are *not* verbatim output from "git log" family of commands, > iow, they are produced by post-processing output from "git log" > family of commands, then I do not quite see why %aa is useful at > all. One way you could directly use this is in shortlog, which these days lets you group by specific formats. So: git shortlog -ns --group=format:%aA is potentially useful. I say "potentially" because it really depends on your project and its contributors. In git.git the results are mostly either too broad ("gmail.com" covers many unrelated people) or too narrow (I'll assume I'm the only contributor from "peff.net"). There are a few possibly useful ones ("microsoft.com", "gitlab.com", though even those are misleading because email domains don't always correspond to affiliations). So I don't find it useful myself, but I see how it could be in the right circumstances. It also feels like a symmetric match to "%al", which already exists. I do find "aa" as the identifier a little hard to remember. I guess it's "a" for "address", though I'd have called the whole local@domain thing an address thing that. Of course "d" for domain would make sense, but that is already taken. If we could spell it as %(authoremail:domain) that would remove the question. But given the existence of "%al", I'm not too sad to see another letter allocated to this purpose in the meantime. Just my two cents as a shortlog --format afficionado. ;) (Of course, shortlog itself is the ultimate "you could really just post-process log output" example). -Peff
Hi Kousik, On Sat, Oct 28, 2023 at 12:10:30AM +0530, Kousik Sanagavarapu wrote: > Hi Liam, > > Liam Beguin <liambeguin@gmail.com> wrote: > > Subject: Re: [PATCH 2/2] pretty: add '%aA' to show domain-part of email addresses > > Since we are adding both '%aa' and '%aA', it would be better to > to include both in the commit subject, but since it is already long > enough, in my opinion > > pretty: add formats for domain-part of email address > > would convey the gist of the commit to the reader better. That reads better, I'll update the commit message. > > Many reports use the email domain to keep track of organizations > > contributing to projects. > > Add support for formatting the domain-part of a contributor's address so > > that this can be done using git itself, with something like: > > > > git shortlog -sn --group=format:%aA v2.41.0..v2.42.0 > > > > Signed-off-by: Liam Beguin <liambeguin@gmail.com> > > A very very very minor nit but the commit message would read better as > > ... contributing to projects, so add support for ... > > Feel free to ignore it. > > > --- > > Documentation/pretty-formats.txt | 6 ++++++ > > pretty.c | 13 ++++++++++++- > > t/t4203-mailmap.sh | 28 ++++++++++++++++++++++++++++ > > t/t6006-rev-list-format.sh | 6 ++++-- > > 4 files changed, 50 insertions(+), 3 deletions(-) > > > > diff --git a/Documentation/pretty-formats.txt b/Documentation/pretty-formats.txt > > index a22f6fceecdd..72102a681c3a 100644 > > --- a/Documentation/pretty-formats.txt > > +++ b/Documentation/pretty-formats.txt > > @@ -195,6 +195,9 @@ The placeholders are: > > '%al':: author email local-part (the part before the '@' sign) > > '%aL':: author email local-part (see '%al') respecting .mailmap, see > > linkgit:git-shortlog[1] or linkgit:git-blame[1]) > > +'%aa':: author email domain-part (the part after the '@' sign) > > +'%aA':: author email domain-part (see '%al') respecting .mailmap, see > > + linkgit:git-shortlog[1] or linkgit:git-blame[1]) > > '%ad':: author date (format respects --date= option) > > '%aD':: author date, RFC2822 style > > '%ar':: author date, relative > > @@ -213,6 +216,9 @@ The placeholders are: > > '%cl':: committer email local-part (the part before the '@' sign) > > '%cL':: committer email local-part (see '%cl') respecting .mailmap, see > > linkgit:git-shortlog[1] or linkgit:git-blame[1]) > > +'%ca':: committer email domain-part (the part before the '@' sign) > > +'%cA':: committer email domain-part (see '%cl') respecting .mailmap, see > > + linkgit:git-shortlog[1] or linkgit:git-blame[1]) > > '%cd':: committer date (format respects --date= option) > > '%cD':: committer date, RFC2822 style > > '%cr':: committer date, relative > > diff --git a/pretty.c b/pretty.c > > index cf964b060cd1..4f5d081589ea 100644 > > --- a/pretty.c > > +++ b/pretty.c > > @@ -791,7 +791,7 @@ static size_t format_person_part(struct strbuf *sb, char part, > > mail = s.mail_begin; > > maillen = s.mail_end - s.mail_begin; > > > > - if (part == 'N' || part == 'E' || part == 'L') /* mailmap lookup */ > > + if (part == 'N' || part == 'E' || part == 'L' || part == 'A') /* mailmap lookup */ > > mailmap_name(&mail, &maillen, &name, &namelen); > > if (part == 'n' || part == 'N') { /* name */ > > strbuf_add(sb, name, namelen); > > @@ -808,6 +808,17 @@ static size_t format_person_part(struct strbuf *sb, char part, > > strbuf_add(sb, mail, maillen); > > return placeholder_len; > > } > > + if (part == 'a' || part == 'A') { /* domain-part */ > > + const char *at = memchr(mail, '@', maillen); > > + if (at) { > > + at += 1; > > + maillen -= at - mail; > > + strbuf_add(sb, at, maillen); > > + } else { > > + strbuf_add(sb, mail, maillen); > > + } > > + return placeholder_len; > > + } > > > > if (!s.date_begin) > > goto skip; > > So, if we have a domain-name, we grab it, else (the case where we don't > have '@') we grab it as-is. Looks good. > > > diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh > > index 2016132f5161..35bf7bb05bea 100755 > > --- a/t/t4203-mailmap.sh > > +++ b/t/t4203-mailmap.sh > > @@ -624,6 +624,34 @@ test_expect_success 'Log output (local-part email address)' ' > > test_cmp expect actual > > ' > > > > +test_expect_success 'Log output (domain-part email address)' ' > > + cat >expect <<-EOF && > > + Author email cto@coompany.xx has domain-part coompany.xx > > + Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN > > + > > + Author email me@company.xx has domain-part company.xx > > + Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN > > + > > + Author email me@company.xx has domain-part company.xx > > + Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN > > + > > + Author email nick2@company.xx has domain-part company.xx > > + Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN > > + > > + Author email bugs@company.xx has domain-part company.xx > > + Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN > > + > > + Author email bugs@company.xx has domain-part company.xx > > + Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN > > + > > + Author email author@example.com has domain-part example.com > > + Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN > > + EOF > > + > > + git log --pretty=format:"Author email %ae has domain-part %aa%nCommitter email %ce has domain-part %ca%n" >actual && > > + test_cmp expect actual > > +' > > + > > test_expect_success 'Log output with --use-mailmap' ' > > test_config mailmap.file complex.map && > > > > diff --git a/t/t6006-rev-list-format.sh b/t/t6006-rev-list-format.sh > > index 573eb97a0f7f..34c686becf2d 100755 > > --- a/t/t6006-rev-list-format.sh > > +++ b/t/t6006-rev-list-format.sh > > @@ -163,11 +163,12 @@ commit $head1 > > EOF > > > > # we don't test relative here > > -test_format author %an%n%ae%n%al%n%ad%n%aD%n%at <<EOF > > +test_format author %an%n%ae%n%al%aa%n%ad%n%aD%n%at <<EOF > > commit $head2 > > $GIT_AUTHOR_NAME > > $GIT_AUTHOR_EMAIL > > $TEST_AUTHOR_LOCALNAME > > +$TEST_AUTHOR_DOMAIN > > Thu Apr 7 15:13:13 2005 -0700 > > Thu, 7 Apr 2005 15:13:13 -0700 > > 1112911993 > > @@ -180,11 +181,12 @@ Thu, 7 Apr 2005 15:13:13 -0700 > > 1112911993 > > EOF > > > > -test_format committer %cn%n%ce%n%cl%n%cd%n%cD%n%ct <<EOF > > +test_format committer %cn%n%ce%n%cl%ca%n%cd%n%cD%n%ct <<EOF > > commit $head2 > > $GIT_COMMITTER_NAME > > $GIT_COMMITTER_EMAIL > > $TEST_COMMITTER_LOCALNAME > > +$TEST_COMMITTER_DOMAIN > > Thu Apr 7 15:13:13 2005 -0700 > > Thu, 7 Apr 2005 15:13:13 -0700 > > 1112911993 > > > > -- > > 2.39.0 > > The tests look good too. > > I should say I'm skeptical of the new format's name though. I know '%ad' is > taken... but maybe it's just me. > > Thanks I agree, %aa isn't the best, I'm definitly opened to suggestions. My preference would've been for something like %ad, but that's already taken. Thanks for reviewing. Cheers, Liam
Hi Junio, Peff, On Fri, Oct 27, 2023 at 10:13:01PM -0400, Jeff King wrote: > On Sat, Oct 28, 2023 at 09:12:06AM +0900, Junio C Hamano wrote: > > > Grouping @gmail.com addresses do not smell all that useful, though. While I agree with you, I think that's more an exception that the rule. > > More importantly, it is not clear what "Many reports" refers to. If > > they are *not* verbatim output from "git log" family of commands, > > iow, they are produced by post-processing output from "git log" > > family of commands, then I do not quite see why %aa is useful at > > all. I might've been a bit generous with "many report", I was mostly thinking of the ones published by lwn.net, and U-Boot for example. To some extent, "git shortlog" could be considered a part of that post-processing chain. > One way you could directly use this is in shortlog, which these days > lets you group by specific formats. So: > > git shortlog -ns --group=format:%aA That's exactly what I implemented this for :-) > is potentially useful. > > I say "potentially" because it really depends on your project and its > contributors. In git.git the results are mostly either too broad > ("gmail.com" covers many unrelated people) or too narrow (I'll assume > I'm the only contributor from "peff.net"). There are a few possibly > useful ones ("microsoft.com", "gitlab.com", though even those are > misleading because email domains don't always correspond to > affiliations). I agree with your comment here, while grouping everything under "gmail.com" for example doesn't provide anything really useful we can rely on mailmap to fix that when appropriate. I think it would otherwise count as unaffiliated. I don't claim this to be foolproof, but I do think that it gives a good overall view of which companies are involved in the project for the most part. > So I don't find it useful myself, but I see how it could be in the right > circumstances. It also feels like a symmetric match to "%al", which > already exists. I do find "aa" as the identifier a little hard to > remember. I guess it's "a" for "address", though I'd have called the > whole local@domain thing an address thing that. Of course "d" for domain > would make sense, but that is already taken. If we could spell it as > %(authoremail:domain) that would remove the question. But given the > existence of "%al", I'm not too sad to see another letter allocated to > this purpose in the meantime. I chose the "a" for "address", but I'm not sold on %aa either. I just couldn't find anything better that wasn't already taken. What about "a@"? It's a bit easier to remember, being the first character of the domain-part. > Just my two cents as a shortlog --format afficionado. ;) (Of course, > shortlog itself is the ultimate "you could really just post-process log > output" example). I'm a big fan of shortlog --format (and --group) as well! Taking it a step further, it's also possible to pass in whatever mailmap you want to generate a "report". Let's say there's mapping that only makes sense for a single release something like this could be used: git -c mailmap.file=git-mailmap-v2.42 shortlog -sn --group=format:%aA > -Peff Thanks for your time. Cheers, Liam
On 28/10/2023 04:22, Liam Beguin wrote: > On Fri, Oct 27, 2023 at 10:13:01PM -0400, Jeff King wrote: >> One way you could directly use this is in shortlog, which these days >> lets you group by specific formats. So: >> >> git shortlog -ns --group=format:%aA > > That's exactly what I implemented this for :-) Another potential use case is custom log formats where one might want to color the local-part separately from the domain part. >> It also feels like a symmetric match to "%al", which already exists. Speaking of symmetry, I think it would need "%c" counterparts for the coloring use case. > I chose the "a" for "address", but I'm not sold on %aa either. > I just couldn't find anything better that wasn't already taken. > > What about "a@"? Makes sense, and I suppose there's "%G?" as precedent for using a symbol rather than letter in these. If that's not suitable though, how about "m" for "mail domain"? It also immediately follows "l" for "local-part" in the alphabet. Regards, Andy
On 28/10/2023 07:58, Andy Koppe wrote: > On 28/10/2023 04:22, Liam Beguin wrote: >> On Fri, Oct 27, 2023 at 10:13:01PM -0400, Jeff King wrote: >>> One way you could directly use this is in shortlog, which these days >>> lets you group by specific formats. So: >>> >>> git shortlog -ns --group=format:%aA >> >> That's exactly what I implemented this for :-) > > Another potential use case is custom log formats where one might want to > color the local-part separately from the domain part. > >>> It also feels like a symmetric match to "%al", which already exists. > > Speaking of symmetry, I think it would need "%c" counterparts for the > coloring use case. D'oh, it always helps to read the actual patches, which do have the %c variants. Thanks, Andy
On Fri, Oct 27, 2023 at 10:20:48PM -0400, Liam Beguin wrote: >I agree, %aa isn't the best, I'm definitly opened to suggestions. >My preference would've been for something like %ad, but that's already >taken. > H for host would be available. (not to be confused with h for human.) in retrospect i'd say that it was unwise to use separate letters for the various forms of dates - a set of qualifiers that would be applied to a single specifier would be nicer. ah well ... regards
On 28/10/2023 16:27, Oswald Buddenhagen wrote: > On Fri, Oct 27, 2023 at 10:20:48PM -0400, Liam Beguin wrote: >> I agree, %aa isn't the best, I'm definitly opened to suggestions. >> My preference would've been for something like %ad, but that's already >> taken. >> > H for host would be available. (not to be confused with h for human.) Both lowercase and uppercase would be needed though, to mirror %ae/%aE and %al/%aL for choosing whether to respect .mailmap. That actually rules out the %a@ idea as well. Andy
Jeff King <peff@peff.net> writes: > On Sat, Oct 28, 2023 at 09:12:06AM +0900, Junio C Hamano wrote: > >> Grouping @gmail.com addresses do not smell all that useful, though. >> ... > One way you could directly use this is in shortlog, which these days > lets you group by specific formats. So: > > git shortlog -ns --group=format:%aA > > is potentially useful. Exactly. That is what I meant by "Grouping", and I agree with you about "potentially" part, too ;-) Throwing all @gmail.com addresses into a single bin would not be very useful. > ... If we could spell it as > %(authoremail:domain) that would remove the question. But given the > existence of "%al", I'm not too sad to see another letter allocated to > this purpose in the meantime. Another line of thought is perhaps it is potentially useful to teach the --format= machinery to be a bit more programmable, e.g. allowing to compute a substring of an existing field %{%aE#*@} without having to waste a letter each for the local part and domain part. But as I already said, we are now talking about "postprocessing", and adding complexity to our codebase only to have incomplete flexibility may not be worth it. A more specific %(authoremail:localpart) and its domain counterpart may be easier to explain and understand. In any case, it is a bit too late to say "let's not waste the precious single letter namespace to add useless features", as we have come way too far, so I do not mind too much using a currently unused letter $X for yet another author and committer trait.
On Sat, Oct 28, 2023 at 07:58:31AM +0100, Andy Koppe wrote: > > I chose the "a" for "address", but I'm not sold on %aa either. > > I just couldn't find anything better that wasn't already taken. > > > > What about "a@"? > > Makes sense, and I suppose there's "%G?" as precedent for using a symbol > rather than letter in these. This is pretty subjective, but I somehow find "%a@" hard to parse visually (despite the fact that yes, "%G?" already crossed that bridge). But I think the real nail in the coffin is your later comment that we cannot use capitalization to make the raw/mailmap distinction. > If that's not suitable though, how about "m" for "mail domain"? It also > immediately follows "l" for "local-part" in the alphabet. FWIW, that makes sense to me over "a" (though admittedly it is not really any less vague than "a", so it really might vary from person to person). -Peff
On Mon, Oct 30, 2023 at 05:10:11AM -0400, Jeff King wrote: > On Sat, Oct 28, 2023 at 07:58:31AM +0100, Andy Koppe wrote: > > > > I chose the "a" for "address", but I'm not sold on %aa either. > > > I just couldn't find anything better that wasn't already taken. > > > > > > What about "a@"? > > > > Makes sense, and I suppose there's "%G?" as precedent for using a symbol > > rather than letter in these. > > This is pretty subjective, but I somehow find "%a@" hard to parse > visually (despite the fact that yes, "%G?" already crossed that bridge). > But I think the real nail in the coffin is your later comment that we > cannot use capitalization to make the raw/mailmap distinction. > > > If that's not suitable though, how about "m" for "mail domain"? It also > > immediately follows "l" for "local-part" in the alphabet. > > FWIW, that makes sense to me over "a" (though admittedly it is not > really any less vague than "a", so it really might vary from person to > person). Okay, I like 'm' better as well. And '@' is a no go because the mailmaped version. I'll resend. Cheers, Liam
On 27/10/2023 19:40, Kousik Sanagavarapu wrote: > Liam Beguin <liambeguin@gmail.com> wrote: >> @@ -808,6 +808,17 @@ static size_t format_person_part(struct strbuf *sb, char part, >> strbuf_add(sb, mail, maillen); >> return placeholder_len; >> } >> + if (part == 'a' || part == 'A') { /* domain-part */ >> + const char *at = memchr(mail, '@', maillen); >> + if (at) { >> + at += 1; >> + maillen -= at - mail; >> + strbuf_add(sb, at, maillen); >> + } else { >> + strbuf_add(sb, mail, maillen); >> + } >> + return placeholder_len; >> + } >> >> if (!s.date_begin) >> goto skip; > > So, if we have a domain-name, we grab it, else (the case where we don't > have '@') we grab it as-is. Looks good. I'm not sure that this is the right way to handle a missing '@' here actually, because %al already returns the whole email field in that case, which makes sense as the likes of the 'mail' command would interpret it as a local username. And if someone was going to use %al and the new specifier together to format the parts of the email field differently, they probably wouldn't want the field to appear twice. Therefore I think it would be more appropriate to expand to nothing in that case. Tools that consume this output would already need to be able to deal with the empty case, as it could also happen if there's a single '@' at the end of the email field, or if the field is empty. Regards, Andy
On Fri, Nov 03, 2023 at 08:22:05AM +0000, Andy Koppe wrote: > > On 27/10/2023 19:40, Kousik Sanagavarapu wrote: > > So, if we have a domain-name, we grab it, else (the case where we don't > > have '@') we grab it as-is. Looks good. > > I'm not sure that this is the right way to handle a missing '@' here > actually, because %al already returns the whole email field in that case, > which makes sense as the likes of the 'mail' command would interpret it as a > local username. > > And if someone was going to use %al and the new specifier together to format > the parts of the email field differently, they probably wouldn't want the > field to appear twice. > > Therefore I think it would be more appropriate to expand to nothing in that > case. Tools that consume this output would already need to be able to deal > with the empty case, as it could also happen if there's a single '@' at the > end of the email field, or if the field is empty. I originally thought since localpart and the new domainpart are like counterparts (or are symmetrical, like Peff mentioned), falling back like in the case of localpart was the correct way here (again, symmetry). Having read your reasoning though I think it makes sense to not fall back the same way as localpart (%al) and return empty instead. This goes in favor of what Liam said in the original commit message of this feature being used to keep track of commits from different organizations (as in from GitHub or GitLab) or at least domains that make sense. As, if we return the whole email (where the email is with no domain-part or '@' is at the end of the email), the whole purpose of our new feature is lost. Thanks
Andy Koppe <andy.koppe@gmail.com> writes: > I'm not sure that this is the right way to handle a missing '@' here > actually, because %al already returns the whole email field in that > case, which makes sense as the likes of the 'mail' command would > interpret it as a local username. We could expand "%am" to \C-h (\010) so that "%al@%am" would end up displaying the same as "%al" but that would be way too cute for its own worth ;-) It is unfortunate that "%al@%am" cannot be the same as "%ae" for local-only address, but giving an empty string for "%am" if "%ae" is local-only would be the best we could do for our users, and certainly much better than giving the same as "%ae", as you said above. Thanks.
On 04/11/2023 01:54, Junio C Hamano wrote: > Andy Koppe <andy.koppe@gmail.com> writes: > >> I'm not sure that this is the right way to handle a missing '@' here >> actually, because %al already returns the whole email field in that >> case, which makes sense as the likes of the 'mail' command would >> interpret it as a local username. > > We could expand "%am" to \C-h (\010) so that "%al@%am" would end up > displaying the same as "%al" but that would be way too cute for its > own worth ;-) :) Unfortunately it also wouldn't always work, because ^H only moves the cursor, so if the next thing is a newline, the '@' wouldn't actually get deleted. > It is unfortunate that "%al@%am" cannot be the same as "%ae" for > local-only address, but giving an empty string for "%am" if "%ae" is > local-only would be the best we could do for our users, and certainly > much better than giving the same as "%ae", as you said above. I suppose "%@am" could mean prepending an '@' when a domain is present, similar to how "% am" would mean prepending a space and "%+am" would mean prepending a newline. With that, "%al%@am" would be equivalent to "%ae". But that then raises the question whether it should be implemented just for "%@[ac][mM]", or for all placeholders. In any case, I don't think it needs to be part of the changes at hand. Andy
Junio C Hamano <gitster@pobox.com> writes: > Another line of thought is perhaps it is potentially useful to teach > the --format= machinery to be a bit more programmable, e.g. allowing > to compute a substring of an existing field %{%aE#*@} without having > to waste a letter each for the local part and domain part. But as I > already said, we are now talking about "postprocessing", and adding > complexity to our codebase only to have incomplete flexibility may > not be worth it. A more specific %(authoremail:localpart) and its > domain counterpart may be easier to explain and understand. > > In any case, it is a bit too late to say "let's not waste the > precious single letter namespace to add useless features", as we > have come way too far, so I do not mind too much using a currently > unused letter $X for yet another author and committer trait. When I wrote the above, I somehow forgot the existing work in the ref-filter (aka "for-each-ref") placeholders, where we have support to a lot more flexible way to customize these things. For example, "%(authoremail:mailmap,localpart)" can be used to say, instead of wasting two letters 'l' and 'L' out of precious 52, that we want e-mail address honoring the mailmap, and take only the local part. And the support for the host part of the address that this topic discussed should be implementable fairly easily (just adding EO_HOSTPART bit to the email_option structure would be sufficient) on the ref-filter side. We saw efforts from time to time to give "log --pretty=format:" more of the good things from the "for-each-ref --format=" placeholders (and vice versa), and it may give us a good way forward.
Hi Junio, Apologies for the late reply. On Tue, Nov 21, 2023 at 05:21:43AM +0900, Junio C Hamano wrote: > Junio C Hamano <gitster@pobox.com> writes: > > > Another line of thought is perhaps it is potentially useful to teach > > the --format= machinery to be a bit more programmable, e.g. allowing > > to compute a substring of an existing field %{%aE#*@} without having > > to waste a letter each for the local part and domain part. But as I > > already said, we are now talking about "postprocessing", and adding > > complexity to our codebase only to have incomplete flexibility may > > not be worth it. A more specific %(authoremail:localpart) and its > > domain counterpart may be easier to explain and understand. > > > > In any case, it is a bit too late to say "let's not waste the > > precious single letter namespace to add useless features", as we > > have come way too far, so I do not mind too much using a currently > > unused letter $X for yet another author and committer trait. > > When I wrote the above, I somehow forgot the existing work in the > ref-filter (aka "for-each-ref") placeholders, where we have support > to a lot more flexible way to customize these things. I looked into this a little, after your first email. I'll try to make time to have another look. > For example, "%(authoremail:mailmap,localpart)" can be used to say, > instead of wasting two letters 'l' and 'L' out of precious 52, that > we want e-mail address honoring the mailmap, and take only the local > part. And the support for the host part of the address that this > topic discussed should be implementable fairly easily (just adding > EO_HOSTPART bit to the email_option structure would be sufficient) > on the ref-filter side. > > We saw efforts from time to time to give "log --pretty=format:" more > of the good things from the "for-each-ref --format=" placeholders > (and vice versa), and it may give us a good way forward. This definitely sounds like a better approach than wasting two more letters. Liam
diff --git a/Documentation/pretty-formats.txt b/Documentation/pretty-formats.txt index a22f6fceecdd..72102a681c3a 100644 --- a/Documentation/pretty-formats.txt +++ b/Documentation/pretty-formats.txt @@ -195,6 +195,9 @@ The placeholders are: '%al':: author email local-part (the part before the '@' sign) '%aL':: author email local-part (see '%al') respecting .mailmap, see linkgit:git-shortlog[1] or linkgit:git-blame[1]) +'%aa':: author email domain-part (the part after the '@' sign) +'%aA':: author email domain-part (see '%al') respecting .mailmap, see + linkgit:git-shortlog[1] or linkgit:git-blame[1]) '%ad':: author date (format respects --date= option) '%aD':: author date, RFC2822 style '%ar':: author date, relative @@ -213,6 +216,9 @@ The placeholders are: '%cl':: committer email local-part (the part before the '@' sign) '%cL':: committer email local-part (see '%cl') respecting .mailmap, see linkgit:git-shortlog[1] or linkgit:git-blame[1]) +'%ca':: committer email domain-part (the part before the '@' sign) +'%cA':: committer email domain-part (see '%cl') respecting .mailmap, see + linkgit:git-shortlog[1] or linkgit:git-blame[1]) '%cd':: committer date (format respects --date= option) '%cD':: committer date, RFC2822 style '%cr':: committer date, relative diff --git a/pretty.c b/pretty.c index cf964b060cd1..4f5d081589ea 100644 --- a/pretty.c +++ b/pretty.c @@ -791,7 +791,7 @@ static size_t format_person_part(struct strbuf *sb, char part, mail = s.mail_begin; maillen = s.mail_end - s.mail_begin; - if (part == 'N' || part == 'E' || part == 'L') /* mailmap lookup */ + if (part == 'N' || part == 'E' || part == 'L' || part == 'A') /* mailmap lookup */ mailmap_name(&mail, &maillen, &name, &namelen); if (part == 'n' || part == 'N') { /* name */ strbuf_add(sb, name, namelen); @@ -808,6 +808,17 @@ static size_t format_person_part(struct strbuf *sb, char part, strbuf_add(sb, mail, maillen); return placeholder_len; } + if (part == 'a' || part == 'A') { /* domain-part */ + const char *at = memchr(mail, '@', maillen); + if (at) { + at += 1; + maillen -= at - mail; + strbuf_add(sb, at, maillen); + } else { + strbuf_add(sb, mail, maillen); + } + return placeholder_len; + } if (!s.date_begin) goto skip; diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh index 2016132f5161..35bf7bb05bea 100755 --- a/t/t4203-mailmap.sh +++ b/t/t4203-mailmap.sh @@ -624,6 +624,34 @@ test_expect_success 'Log output (local-part email address)' ' test_cmp expect actual ' +test_expect_success 'Log output (domain-part email address)' ' + cat >expect <<-EOF && + Author email cto@coompany.xx has domain-part coompany.xx + Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN + + Author email me@company.xx has domain-part company.xx + Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN + + Author email me@company.xx has domain-part company.xx + Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN + + Author email nick2@company.xx has domain-part company.xx + Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN + + Author email bugs@company.xx has domain-part company.xx + Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN + + Author email bugs@company.xx has domain-part company.xx + Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN + + Author email author@example.com has domain-part example.com + Committer email $GIT_COMMITTER_EMAIL has domain-part $TEST_COMMITTER_DOMAIN + EOF + + git log --pretty=format:"Author email %ae has domain-part %aa%nCommitter email %ce has domain-part %ca%n" >actual && + test_cmp expect actual +' + test_expect_success 'Log output with --use-mailmap' ' test_config mailmap.file complex.map && diff --git a/t/t6006-rev-list-format.sh b/t/t6006-rev-list-format.sh index 573eb97a0f7f..34c686becf2d 100755 --- a/t/t6006-rev-list-format.sh +++ b/t/t6006-rev-list-format.sh @@ -163,11 +163,12 @@ commit $head1 EOF # we don't test relative here -test_format author %an%n%ae%n%al%n%ad%n%aD%n%at <<EOF +test_format author %an%n%ae%n%al%aa%n%ad%n%aD%n%at <<EOF commit $head2 $GIT_AUTHOR_NAME $GIT_AUTHOR_EMAIL $TEST_AUTHOR_LOCALNAME +$TEST_AUTHOR_DOMAIN Thu Apr 7 15:13:13 2005 -0700 Thu, 7 Apr 2005 15:13:13 -0700 1112911993 @@ -180,11 +181,12 @@ Thu, 7 Apr 2005 15:13:13 -0700 1112911993 EOF -test_format committer %cn%n%ce%n%cl%n%cd%n%cD%n%ct <<EOF +test_format committer %cn%n%ce%n%cl%ca%n%cd%n%cD%n%ct <<EOF commit $head2 $GIT_COMMITTER_NAME $GIT_COMMITTER_EMAIL $TEST_COMMITTER_LOCALNAME +$TEST_COMMITTER_DOMAIN Thu Apr 7 15:13:13 2005 -0700 Thu, 7 Apr 2005 15:13:13 -0700 1112911993
Many reports use the email domain to keep track of organizations contributing to projects. Add support for formatting the domain-part of a contributor's address so that this can be done using git itself, with something like: git shortlog -sn --group=format:%aA v2.41.0..v2.42.0 Signed-off-by: Liam Beguin <liambeguin@gmail.com> --- Documentation/pretty-formats.txt | 6 ++++++ pretty.c | 13 ++++++++++++- t/t4203-mailmap.sh | 28 ++++++++++++++++++++++++++++ t/t6006-rev-list-format.sh | 6 ++++-- 4 files changed, 50 insertions(+), 3 deletions(-)