diff mbox series

shortlog: pass the mailmap into the revision walker

Message ID 20181212171052.13415-1-cb@hashpling.org (mailing list archive)
State New, archived
Headers show
Series shortlog: pass the mailmap into the revision walker | expand

Commit Message

CB Bailey Dec. 12, 2018, 5:10 p.m. UTC
From: CB Bailey <cbailey32@bloomberg.net>

shortlog always respects the mailmap in its output. Pass the mailmap
into the revision walker to allow the mailmap to be used with revision
limiting options such as '--author'.

This removes some apparently inconsistent behaviors when using
'--author', such as not finding some or all commits for a given author
which do appear under that author in an unrestricted invocation of
shortlog or commits being summarized under a different author than the
specified author.

Signed-off-by: CB Bailey <cbailey32@bloomberg.net>
---

Resending with omitted s-o-b.

 builtin/shortlog.c |  2 ++
 t/t4203-mailmap.sh | 28 ++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

Comments

Ævar Arnfjörð Bjarmason Dec. 12, 2018, 6:31 p.m. UTC | #1
On Wed, Dec 12 2018, CB Bailey wrote:

> From: CB Bailey <cbailey32@bloomberg.net>
>
> shortlog always respects the mailmap in its output. Pass the mailmap
> into the revision walker to allow the mailmap to be used with revision
> limiting options such as '--author'.
>
> This removes some apparently inconsistent behaviors when using
> '--author', such as not finding some or all commits for a given author
> which do appear under that author in an unrestricted invocation of
> shortlog or commits being summarized under a different author than the
> specified author.
>
> Signed-off-by: CB Bailey <cbailey32@bloomberg.net>
> ---
>
> Resending with omitted s-o-b.
>
>  builtin/shortlog.c |  2 ++
>  t/t4203-mailmap.sh | 28 ++++++++++++++++++++++++++++
>  2 files changed, 30 insertions(+)
>
> diff --git a/builtin/shortlog.c b/builtin/shortlog.c
> index 88f88e97b2..a6fb00ade8 100644
> --- a/builtin/shortlog.c
> +++ b/builtin/shortlog.c
> @@ -188,6 +188,8 @@ static void get_from_rev(struct rev_info *rev, struct shortlog *log)
>  {
>  	struct commit *commit;
>
> +	rev->mailmap = &log->mailmap;
> +
>  	if (prepare_revision_walk(rev))
>  		die(_("revision walk setup failed"));
>  	while ((commit = get_revision(rev)) != NULL)
> diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
> index 43b1522ea2..9bee35b06c 100755
> --- a/t/t4203-mailmap.sh
> +++ b/t/t4203-mailmap.sh
> @@ -383,6 +383,34 @@ test_expect_success 'Shortlog output (complex mapping)' '
>
>  '
>
> +test_expect_success 'Shortlog output (complex mapping, filtered)' '
> +
> +	printf "     1\tA U Thor <author@example.com>\n" >expect &&
> +
> +	git shortlog -es --author="A U Thor" HEAD >actual &&
> +	test_cmp expect actual &&
> +
> +	printf "     1\tCTO <cto@company.xx>\n" >expect &&
> +
> +	git shortlog -es --author=CTO HEAD >actual &&
> +	test_cmp expect actual &&
> +
> +	printf "     2\tOther Author <other@author.xx>\n" >expect &&
> +
> +	git shortlog -es --author="Other Author" HEAD >actual &&
> +	test_cmp expect actual &&
> +
> +	printf "     2\tSanta Claus <santa.claus@northpole.xx>\n" >expect &&
> +
> +	git shortlog -es --author="Santa Claus" HEAD >actual &&
> +	test_cmp expect actual &&
> +
> +	printf "     1\tSome Dude <some@dude.xx>\n" >expect &&
> +
> +	git shortlog -es --author="Some Dude" HEAD >actual &&
> +	test_cmp expect actual
> +'
> +
>  # git log with --pretty format which uses the name and email mailmap placemarkers
>  cat >expect <<\EOF
>  Author CTO <cto@coompany.xx> maps to CTO <cto@company.xx>

Makes sense. Not saying this is how it should be, just an equivalently
working patch that I came up with on top while poing at this:

    diff --git a/builtin/shortlog.c b/builtin/shortlog.c
    index a6fb00ade8..ad84d70d07 100644
    --- a/builtin/shortlog.c
    +++ b/builtin/shortlog.c
    @@ -188,10 +188,9 @@ static void get_from_rev(struct rev_info *rev, struct shortlog *log)
     {
     	struct commit *commit;

    -	rev->mailmap = &log->mailmap;
    -
     	if (prepare_revision_walk(rev))
     		die(_("revision walk setup failed"));
    +	rev->mailmap = &log->mailmap;
     	while ((commit = get_revision(rev)) != NULL)
     		shortlog_add_commit(log, commit);
     }
    diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
    index 9bee35b06c..74a269052d 100755
    --- a/t/t4203-mailmap.sh
    +++ b/t/t4203-mailmap.sh
    @@ -384,17 +384,6 @@ test_expect_success 'Shortlog output (complex mapping)' '
     '

     test_expect_success 'Shortlog output (complex mapping, filtered)' '
    -
    -	printf "     1\tA U Thor <author@example.com>\n" >expect &&
    -
    -	git shortlog -es --author="A U Thor" HEAD >actual &&
    -	test_cmp expect actual &&
    -
    -	printf "     1\tCTO <cto@company.xx>\n" >expect &&
    -
    -	git shortlog -es --author=CTO HEAD >actual &&
    -	test_cmp expect actual &&
    -
     	printf "     2\tOther Author <other@author.xx>\n" >expect &&

     	git shortlog -es --author="Other Author" HEAD >actual &&

I.e. we just need the assignment after prepare_revision_walk() and the
first 2x tests were things that passed before this change.

So that's not a "let's squash that on top" but "I was poking at this and
here's stuff that I fiddled with or surprised me slightly".
Junio C Hamano Dec. 12, 2018, 7:19 p.m. UTC | #2
CB Bailey <cb@hashpling.org> writes:

> From: CB Bailey <cbailey32@bloomberg.net>
>
> shortlog always respects the mailmap in its output. Pass the mailmap
> into the revision walker to allow the mailmap to be used with revision
> limiting options such as '--author'.

I am a bit torn on this.  

If an author of interest has entries in the mailmap, with the same
name but three or more different e-mail addresses, you could do

    $ git shortlog --author=junio@kernel.org --author=junio@twinsun.com

to find commits by the author under these two addresses, and exclude
commits by the same author made under other addresses.  With this
patch applied, it becomes no longer possible, and you can only look
for commits written under all of my identities, or none of them,
which is a minor annoyance.

But more importantly, the user now needs to know which one is the
"canonical" spelling of the author's ident [*1*].  Asking for
commits written by junio@kernel.org will not yield any result with
the patch applied, no?

For the author in the above example, luckily, AUTHOR_NAME is the
same and unique, so without your patch, you could still do

    $ git shortlog --author="Junio C"

to grab all of them, but there may be authors for whom there is no
catch-all substring that would match all of the idents they ever
used but still unique enough to match only that author, and I do
agree that the new feature proposed by the patch in question would
have uses in such a case.

Which nudges me to say that there needs a way to ask for the
original behaviour, disabling the rewriting of commit author
identity before --author filter kicks in.

There may also need a way to ask omitting mailmap processing even at
the output phase.  I do not think it is a particularly interesting
feature to be able to ask for my commits under only two of my
identities [*2*], but as long as that "feature" exists, it also
should be possible to see the result of that "feature" more clearly
by not merging them into a single list.  "shortlog --no-use-mailmap"
may be a way to do so, but unlike "log", the command currently does
not take the "--[no-]use-mailmap" option.


[Footnote]

*1* Passing the query string given as --author=... through the
    mailmap machinery _might_ be a possible ingredient to solve "the
    user needs to know the caonical spelling of the author ident"
    issue this patch has, but I do not think it generally is
    possible (e.g. how would you rewrite --author='.*@kernel.org').

*2* In other words, I think the loss of "find only my @kernel.org
    and @twinsun.com commits" is mere annoyance and people can live
    with it.  But I think "you must look for gitster@pobox.com, and
    looking for junio@kernel.org will not find my commits made under
    that identity" is a show stopper.  That is why I use a very weak
    "may also need" to this "optionally not using mailmap at all"
    feature.
diff mbox series

Patch

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 88f88e97b2..a6fb00ade8 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -188,6 +188,8 @@  static void get_from_rev(struct rev_info *rev, struct shortlog *log)
 {
 	struct commit *commit;
 
+	rev->mailmap = &log->mailmap;
+
 	if (prepare_revision_walk(rev))
 		die(_("revision walk setup failed"));
 	while ((commit = get_revision(rev)) != NULL)
diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
index 43b1522ea2..9bee35b06c 100755
--- a/t/t4203-mailmap.sh
+++ b/t/t4203-mailmap.sh
@@ -383,6 +383,34 @@  test_expect_success 'Shortlog output (complex mapping)' '
 
 '
 
+test_expect_success 'Shortlog output (complex mapping, filtered)' '
+
+	printf "     1\tA U Thor <author@example.com>\n" >expect &&
+
+	git shortlog -es --author="A U Thor" HEAD >actual &&
+	test_cmp expect actual &&
+
+	printf "     1\tCTO <cto@company.xx>\n" >expect &&
+
+	git shortlog -es --author=CTO HEAD >actual &&
+	test_cmp expect actual &&
+
+	printf "     2\tOther Author <other@author.xx>\n" >expect &&
+
+	git shortlog -es --author="Other Author" HEAD >actual &&
+	test_cmp expect actual &&
+
+	printf "     2\tSanta Claus <santa.claus@northpole.xx>\n" >expect &&
+
+	git shortlog -es --author="Santa Claus" HEAD >actual &&
+	test_cmp expect actual &&
+
+	printf "     1\tSome Dude <some@dude.xx>\n" >expect &&
+
+	git shortlog -es --author="Some Dude" HEAD >actual &&
+	test_cmp expect actual
+'
+
 # git log with --pretty format which uses the name and email mailmap placemarkers
 cat >expect <<\EOF
 Author CTO <cto@coompany.xx> maps to CTO <cto@company.xx>