diff mbox series

[5/7] shortlog: implement `--group=author` in terms of `--group=<format>`

Message ID 55a6ef7bc0082818fa51a0915c43002ede5c449f.1665448437.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series shortlog: introduce `--group=<format>` | expand

Commit Message

Taylor Blau Oct. 11, 2022, 12:34 a.m. UTC
Instead of handling SHORTLOG_GROUP_AUTHOR separately, reimplement it as
a special case of the new `--group=<format>` mode, where the author mode
is a shorthand for `--group='%aN <%aE>'.

Note that we still need to keep the SHORTLOG_GROUP_AUTHOR enum since it
has a different meaning in `read_from_stdin()`, where it is still used
for a different purpose.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/log.c      |  1 +
 builtin/shortlog.c | 23 ++++++++++++-----------
 shortlog.h         |  1 +
 3 files changed, 14 insertions(+), 11 deletions(-)

Comments

Jeff King Oct. 11, 2022, 1:34 a.m. UTC | #1
On Mon, Oct 10, 2022 at 08:34:15PM -0400, Taylor Blau wrote:

> Instead of handling SHORTLOG_GROUP_AUTHOR separately, reimplement it as
> a special case of the new `--group=<format>` mode, where the author mode
> is a shorthand for `--group='%aN <%aE>'.

OK, this should be a nice cleanup.

> diff --git a/builtin/log.c b/builtin/log.c
> index ee19dc5d45..6b77e520b5 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -1334,6 +1334,7 @@ static void make_cover_letter(struct rev_info *rev, int use_separate_file,
>  	log.in2 = 4;
>  	log.file = rev->diffopt.file;
>  	log.groups = SHORTLOG_GROUP_AUTHOR;
> +	shortlog_init_group(&log);
>  	for (i = 0; i < nr; i++)
>  		shortlog_add_commit(&log, list[i]);

In another caller you drop the assignment of log.groups, since
shortlog_init_group() already does so if log.groups is 0 (which it will
be, since shortlog_init() will zero-initialize).

Should we do the same here? Or maybe leaving it is more obvious. It
would be more obvious still if we made the helper take the type, like:

  shortlog_init_group(&log, SHORTLOG_GROUP_AUTHOR);

But I guess that is not accurate, as we'd eventually use this in
shortlog.c to turn _any_ bits we've accumulated in log.group into their
correct formats.

I think the name of the helper function puzzled me a bit. It is really
"finish up any setup for the shortlog struct". We could lazily do it in
shortlog_add_commit(), but that's pretty subtle. But could we maybe call
it:

  shortlog_finish_setup();

or something? I dunno. I might be nit-picking, but I actually had to
scratch my head quite a bit to understand what was going on here.

> diff --git a/builtin/shortlog.c b/builtin/shortlog.c
> index f708d96558..aac8c7afa4 100644
> --- a/builtin/shortlog.c
> +++ b/builtin/shortlog.c
> @@ -245,15 +245,6 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
>  	}
>  	oneline_str = oneline.len ? oneline.buf : "<none>";
>  
> -	if (log->groups & SHORTLOG_GROUP_AUTHOR) {
> -		strbuf_reset(&ident);
> -		format_commit_message(commit,
> -				      log->email ? "%aN <%aE>" : "%aN",
> -				      &ident, &ctx);
> -		if (!HAS_MULTI_BITS(log->groups) ||
> -		    strset_add(&dups, ident.buf))
> -			insert_one_record(log, ident.buf, oneline_str);
> -	}

This loses the HAS_MULTI_BITS() optimization. The idea there is that if
you have a single group-by that can't produce multiple outputs, then
there's no need to do duplicate detection.

The equivalent in an all-formats world is something like:

  log.format.nr > 1 && !log.trailers.nr

(because trailers are special in that one trailer key can produce
multiple idents for a single commit).

> +void shortlog_init_group(struct shortlog *log)
> +{
> +	if (!log->groups)
> +		log->groups = SHORTLOG_GROUP_AUTHOR;
> +
> +	if (log->groups & SHORTLOG_GROUP_AUTHOR)
> +		string_list_append(&log->format,
> +				   log->email ? "%aN <%aE>" : "%aN");
> +}

Regardless of the naming suggestion I made, I think things would be more
obvious if this top conditional remained in cmd_shortlog(). And then the
explicit assignment of "log.group" in make_cover_letter() would remain.

> @@ -439,8 +440,8 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
>  	log.file = rev.diffopt.file;
>  	log.date_mode = rev.date_mode;
>  
> -	if (!log.groups)
> -		log.groups = SHORTLOG_GROUP_AUTHOR;
> +	shortlog_init_group(&log);
> +
>  	string_list_sort(&log.trailers);

Now that we have a "finish the shortlog init" function, probably this
trailer sort should go into it, too. The current caller doesn't need it,
but it removes on more gotcha from using the shortlog API.

-Peff
Taylor Blau Oct. 11, 2022, 1:48 a.m. UTC | #2
On Mon, Oct 10, 2022 at 09:34:42PM -0400, Jeff King wrote:
> On Mon, Oct 10, 2022 at 08:34:15PM -0400, Taylor Blau wrote:
>
> I think the name of the helper function puzzled me a bit. It is really
> "finish up any setup for the shortlog struct". We could lazily do it in
> shortlog_add_commit(), but that's pretty subtle. But could we maybe call
> it:
>
>   shortlog_finish_setup();
>
> or something? I dunno. I might be nit-picking, but I actually had to
> scratch my head quite a bit to understand what was going on here.

I think that shortlog_finish_setup() is probably the most reasonable
choice (and I changed it to that locally). I spent a day or so thinking
on and off about this while writing the series, and didn't come up with
any satisfying names.

I think it points to something deeper about the API that doesn't quite
sit right with me. But shortlog_finish_setup() is the least-unsatisfying
name so far, so let's go with that ;-).

> > diff --git a/builtin/shortlog.c b/builtin/shortlog.c
> > index f708d96558..aac8c7afa4 100644
> > --- a/builtin/shortlog.c
> > +++ b/builtin/shortlog.c
> > @@ -245,15 +245,6 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
> >  	}
> >  	oneline_str = oneline.len ? oneline.buf : "<none>";
> >
> > -	if (log->groups & SHORTLOG_GROUP_AUTHOR) {
> > -		strbuf_reset(&ident);
> > -		format_commit_message(commit,
> > -				      log->email ? "%aN <%aE>" : "%aN",
> > -				      &ident, &ctx);
> > -		if (!HAS_MULTI_BITS(log->groups) ||
> > -		    strset_add(&dups, ident.buf))
> > -			insert_one_record(log, ident.buf, oneline_str);
> > -	}
>
> This loses the HAS_MULTI_BITS() optimization. The idea there is that if
> you have a single group-by that can't produce multiple outputs, then
> there's no need to do duplicate detection.
>
> The equivalent in an all-formats world is something like:
>
>   log.format.nr > 1 && !log.trailers.nr
>
> (because trailers are special in that one trailer key can produce
> multiple idents for a single commit).

Hmm. I suspect that this is going to become more complicated by the time
that you read the final patch ;-). Let's wait until then and see what
you think.

> > +void shortlog_init_group(struct shortlog *log)
> > +{
> > +	if (!log->groups)
> > +		log->groups = SHORTLOG_GROUP_AUTHOR;
> > +
> > +	if (log->groups & SHORTLOG_GROUP_AUTHOR)
> > +		string_list_append(&log->format,
> > +				   log->email ? "%aN <%aE>" : "%aN");
> > +}
>
> Regardless of the naming suggestion I made, I think things would be more
> obvious if this top conditional remained in cmd_shortlog(). And then the
> explicit assignment of "log.group" in make_cover_letter() would remain.

Yep, works for me.

> > @@ -439,8 +440,8 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
> >  	log.file = rev.diffopt.file;
> >  	log.date_mode = rev.date_mode;
> >
> > -	if (!log.groups)
> > -		log.groups = SHORTLOG_GROUP_AUTHOR;
> > +	shortlog_init_group(&log);
> > +
> >  	string_list_sort(&log.trailers);
>
> Now that we have a "finish the shortlog init" function, probably this
> trailer sort should go into it, too. The current caller doesn't need it,
> but it removes on more gotcha from using the shortlog API.

We'll drop this list by the end of the series, too, so it probably isn't
worth moving it into shortlog_finish_setup() in the interim.

> -Peff

Thanks,
Taylor
Jeff King Oct. 11, 2022, 2:15 a.m. UTC | #3
On Mon, Oct 10, 2022 at 09:48:47PM -0400, Taylor Blau wrote:

> I think that shortlog_finish_setup() is probably the most reasonable
> choice (and I changed it to that locally). I spent a day or so thinking
> on and off about this while writing the series, and didn't come up with
> any satisfying names.
> 
> I think it points to something deeper about the API that doesn't quite
> sit right with me. But shortlog_finish_setup() is the least-unsatisfying
> name so far, so let's go with that ;-).

Yeah, touching that block in make_cover_letter() definitely tickled my
memory that there was some awkwardness there. That's when I added
shortlog_init() at all (which is good, because otherwise you'd have had
an uninitialized log.format string-list).

I think the general pattern of:

  foo_init();
  foo.option = whatever;
  foo_finish_setup();

  foo_do_the_thing();

is OK. It's a little complicated, but it gives callers the freedom to
tweak options with a lot of flexibility (e.g., based on command line
config, command line options, etc). We have a similar pattern with
diff_setup_done().

The other option is for any option-tweaking to go through methods which
maintain invariants (like if GROUP_AUTHOR is set, then the "%an" format
has been added). That's usually the "right" object-oriented way to do
it. But I think even in this simple case it gets awkward, because the
correct format can't be known until you see whether log->email is set.
So it really has to be a "finalize" step after both log->email and
log->group are set.

> > This loses the HAS_MULTI_BITS() optimization. The idea there is that if
> > you have a single group-by that can't produce multiple outputs, then
> > there's no need to do duplicate detection.
> >
> > The equivalent in an all-formats world is something like:
> >
> >   log.format.nr > 1 && !log.trailers.nr
> >
> > (because trailers are special in that one trailer key can produce
> > multiple idents for a single commit).
> 
> Hmm. I suspect that this is going to become more complicated by the time
> that you read the final patch ;-). Let's wait until then and see what
> you think.

Yes, it's pretty gross with the trailer util thing. You'd probably want
to do something like:

  static int needs_dedup(const struct string_list *formats)
  {
	struct string_list_item *item;
	if (formats->nr > 1)
		return 1;
	for_each_string_list_item(item, formats) {
		if (item->util)
			return 1;
	}
	return 0;
  }

and perhaps call it only once and cache the result, rather than
iterating for each commit/format.

But if we leave trailers as is, then the logic is much easier. And
implementing the optimization for --group=format should fall out pretty
naturally (and that both preserves it for --group=author, and extends it
to --group=format, which I should have noticed when reviewing patch 4).

> > > @@ -439,8 +440,8 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
> > >  	log.file = rev.diffopt.file;
> > >  	log.date_mode = rev.date_mode;
> > >
> > > -	if (!log.groups)
> > > -		log.groups = SHORTLOG_GROUP_AUTHOR;
> > > +	shortlog_init_group(&log);
> > > +
> > >  	string_list_sort(&log.trailers);
> >
> > Now that we have a "finish the shortlog init" function, probably this
> > trailer sort should go into it, too. The current caller doesn't need it,
> > but it removes on more gotcha from using the shortlog API.
> 
> We'll drop this list by the end of the series, too, so it probably isn't
> worth moving it into shortlog_finish_setup() in the interim.

Ah, right. Well, see my other comments. :)

-Peff
Ævar Arnfjörð Bjarmason Oct. 11, 2022, 10:57 a.m. UTC | #4
On Mon, Oct 10 2022, Taylor Blau wrote:

> Instead of handling SHORTLOG_GROUP_AUTHOR separately, reimplement it as
> a special case of the new `--group=<format>` mode, where the author mode
> is a shorthand for `--group='%aN <%aE>'.
>
> Note that we still need to keep the SHORTLOG_GROUP_AUTHOR enum since it
> has a different meaning in `read_from_stdin()`, where it is still used
> for a different purpose.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  builtin/log.c      |  1 +
>  builtin/shortlog.c | 23 ++++++++++++-----------
>  shortlog.h         |  1 +
>  3 files changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/builtin/log.c b/builtin/log.c
> index ee19dc5d45..6b77e520b5 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -1334,6 +1334,7 @@ static void make_cover_letter(struct rev_info *rev, int use_separate_file,
>  	log.in2 = 4;
>  	log.file = rev->diffopt.file;
>  	log.groups = SHORTLOG_GROUP_AUTHOR;
> +	shortlog_init_group(&log);
>  	for (i = 0; i < nr; i++)
>  		shortlog_add_commit(&log, list[i]);
>  
> diff --git a/builtin/shortlog.c b/builtin/shortlog.c
> index f708d96558..aac8c7afa4 100644
> --- a/builtin/shortlog.c
> +++ b/builtin/shortlog.c
> @@ -245,15 +245,6 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
>  	}
>  	oneline_str = oneline.len ? oneline.buf : "<none>";
>  
> -	if (log->groups & SHORTLOG_GROUP_AUTHOR) {
> -		strbuf_reset(&ident);
> -		format_commit_message(commit,
> -				      log->email ? "%aN <%aE>" : "%aN",
> -				      &ident, &ctx);
> -		if (!HAS_MULTI_BITS(log->groups) ||
> -		    strset_add(&dups, ident.buf))
> -			insert_one_record(log, ident.buf, oneline_str);
> -	}
>  	if (log->groups & SHORTLOG_GROUP_COMMITTER) {
>  		strbuf_reset(&ident);
>  		format_commit_message(commit,
> @@ -372,6 +363,16 @@ void shortlog_init(struct shortlog *log)
>  	log->format.strdup_strings = 1;
>  }
>  
> +void shortlog_init_group(struct shortlog *log)
> +{
> +	if (!log->groups)
> +		log->groups = SHORTLOG_GROUP_AUTHOR;
> +
> +	if (log->groups & SHORTLOG_GROUP_AUTHOR)

Nit (easier reading):

	if (!x)
	...
	else if (x & FLAG)
	...
Ævar Arnfjörð Bjarmason Oct. 11, 2022, 11:07 a.m. UTC | #5
On Tue, Oct 11 2022, Ævar Arnfjörð Bjarmason wrote:

> On Mon, Oct 10 2022, Taylor Blau wrote:

>> +void shortlog_init_group(struct shortlog *log)
>> +{
>> +	if (!log->groups)
>> +		log->groups = SHORTLOG_GROUP_AUTHOR;
>> +
>> +	if (log->groups & SHORTLOG_GROUP_AUTHOR)
>
> Nit (easier reading):
>
> 	if (!x)
> 	...
> 	else if (x & FLAG)
> 	...

Urgh, sorry about the noise. That's an obvious logic error, I don't know
what I was thinking...
Taylor Blau Oct. 21, 2022, 2:02 a.m. UTC | #6
On Mon, Oct 10, 2022 at 09:48:47PM -0400, Taylor Blau wrote:
> > > diff --git a/builtin/shortlog.c b/builtin/shortlog.c
> > > index f708d96558..aac8c7afa4 100644
> > > --- a/builtin/shortlog.c
> > > +++ b/builtin/shortlog.c
> > > @@ -245,15 +245,6 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
> > >  	}
> > >  	oneline_str = oneline.len ? oneline.buf : "<none>";
> > >
> > > -	if (log->groups & SHORTLOG_GROUP_AUTHOR) {
> > > -		strbuf_reset(&ident);
> > > -		format_commit_message(commit,
> > > -				      log->email ? "%aN <%aE>" : "%aN",
> > > -				      &ident, &ctx);
> > > -		if (!HAS_MULTI_BITS(log->groups) ||
> > > -		    strset_add(&dups, ident.buf))
> > > -			insert_one_record(log, ident.buf, oneline_str);
> > > -	}
> >
> > This loses the HAS_MULTI_BITS() optimization. The idea there is that if
> > you have a single group-by that can't produce multiple outputs, then
> > there's no need to do duplicate detection.
> >
> > The equivalent in an all-formats world is something like:
> >
> >   log.format.nr > 1 && !log.trailers.nr
> >
> > (because trailers are special in that one trailer key can produce
> > multiple idents for a single commit).

Hmm. Shouldn't that "&& !log.trailers.nr" be an "|| log.trailers.nr"? It
doesn't seem to make sense to say "there are things that could produce
multiple outputs" if there's more than one format _and_ no trailers.

The logic should read "there are things that could produce multiple
outputs if there is more than one format *or* at least one trailer".

So I think the right change would be:

--- >8 ---
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 95ceab7649..7e1b56e2aa 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -216,7 +216,8 @@ static void insert_records_from_format(struct shortlog *log,

 		format_commit_message(commit, item->string, &buf, ctx);

-		if (strset_add(dups, buf.buf))
+		if (!(log->format.nr > 1 || log->trailers.nr) ||
+		    strset_add(dups, buf.buf))
 			insert_one_record(log, buf.buf, oneline);
 	}
--- 8< ---

Yeah?

Thanks,
Taylor
Taylor Blau Oct. 21, 2022, 2:03 a.m. UTC | #7
On Mon, Oct 10, 2022 at 10:15:32PM -0400, Jeff King wrote:
> > > > @@ -439,8 +440,8 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
> > > >  	log.file = rev.diffopt.file;
> > > >  	log.date_mode = rev.date_mode;
> > > >
> > > > -	if (!log.groups)
> > > > -		log.groups = SHORTLOG_GROUP_AUTHOR;
> > > > +	shortlog_init_group(&log);
> > > > +
> > > >  	string_list_sort(&log.trailers);
> > >
> > > Now that we have a "finish the shortlog init" function, probably this
> > > trailer sort should go into it, too. The current caller doesn't need it,
> > > but it removes on more gotcha from using the shortlog API.
> >
> > We'll drop this list by the end of the series, too, so it probably isn't
> > worth moving it into shortlog_finish_setup() in the interim.
>
> Ah, right. Well, see my other comments. :)

Seen ;-). Now that this series no longer touches the log.trailers bits,
we should move this around. Done locally, will send a reroll shortly or
tomorrow.

Thanks,
Taylor
Jeff King Oct. 21, 2022, 5:03 a.m. UTC | #8
On Thu, Oct 20, 2022 at 10:02:45PM -0400, Taylor Blau wrote:

> > > > -	if (log->groups & SHORTLOG_GROUP_AUTHOR) {
> > > > -		strbuf_reset(&ident);
> > > > -		format_commit_message(commit,
> > > > -				      log->email ? "%aN <%aE>" : "%aN",
> > > > -				      &ident, &ctx);
> > > > -		if (!HAS_MULTI_BITS(log->groups) ||
> > > > -		    strset_add(&dups, ident.buf))
> > > > -			insert_one_record(log, ident.buf, oneline_str);
> > > > -	}
> > >
> > > This loses the HAS_MULTI_BITS() optimization. The idea there is that if
> > > you have a single group-by that can't produce multiple outputs, then
> > > there's no need to do duplicate detection.
> > >
> > > The equivalent in an all-formats world is something like:
> > >
> > >   log.format.nr > 1 && !log.trailers.nr
> > >
> > > (because trailers are special in that one trailer key can produce
> > > multiple idents for a single commit).
> 
> Hmm. Shouldn't that "&& !log.trailers.nr" be an "|| log.trailers.nr"? It
> doesn't seem to make sense to say "there are things that could produce
> multiple outputs" if there's more than one format _and_ no trailers.

Yeah. I was thinking of it as "is it OK to not de-dup", but of course it
is the other way around because of the "!". And regardless of which way
you write the conditional, the two sides must agree. ;)

> The logic should read "there are things that could produce multiple
> outputs if there is more than one format *or* at least one trailer".
> 
> So I think the right change would be:
> 
> --- >8 ---
> diff --git a/builtin/shortlog.c b/builtin/shortlog.c
> index 95ceab7649..7e1b56e2aa 100644
> --- a/builtin/shortlog.c
> +++ b/builtin/shortlog.c
> @@ -216,7 +216,8 @@ static void insert_records_from_format(struct shortlog *log,
> 
>  		format_commit_message(commit, item->string, &buf, ctx);
> 
> -		if (strset_add(dups, buf.buf))
> +		if (!(log->format.nr > 1 || log->trailers.nr) ||
> +		    strset_add(dups, buf.buf))
>  			insert_one_record(log, buf.buf, oneline);
>  	}
> --- 8< ---
> 
> Yeah?

Right. I wondered if it might be a little clearer to drop the outer "!",
which yields:

  if ((log->format.nr <= 1 && !log->trailers.nr) ||
      strset_add(dups, buf.buf))

but it is not really any less confusing. If we gave that first part of
the conditional a name, like:

  if (!needs_dedup(log) || strset_add(dups, buf.buf))

maybe that is better. I dunno.

Regardless, what you wrote above is correct. Thanks for catching it.

-Peff
Taylor Blau Oct. 21, 2022, 10:05 p.m. UTC | #9
On Fri, Oct 21, 2022 at 01:03:40AM -0400, Jeff King wrote:
> Right. I wondered if it might be a little clearer to drop the outer "!",
> which yields:
>
>   if ((log->format.nr <= 1 && !log->trailers.nr) ||
>       strset_add(dups, buf.buf))
>
> but it is not really any less confusing. If we gave that first part of
> the conditional a name, like:
>
>   if (!needs_dedup(log) || strset_add(dups, buf.buf))
>
> maybe that is better. I dunno.

Yeah, I think that this is best. I briefly wondered whether it would be
better to invert the check _and_ name it, but `if
(does_not_need_dedup(log))` is too wordy.

> Regardless, what you wrote above is correct. Thanks for catching it.

No problem, it was an honest mistake.

Thanks,
Taylor
diff mbox series

Patch

diff --git a/builtin/log.c b/builtin/log.c
index ee19dc5d45..6b77e520b5 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1334,6 +1334,7 @@  static void make_cover_letter(struct rev_info *rev, int use_separate_file,
 	log.in2 = 4;
 	log.file = rev->diffopt.file;
 	log.groups = SHORTLOG_GROUP_AUTHOR;
+	shortlog_init_group(&log);
 	for (i = 0; i < nr; i++)
 		shortlog_add_commit(&log, list[i]);
 
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index f708d96558..aac8c7afa4 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -245,15 +245,6 @@  void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 	}
 	oneline_str = oneline.len ? oneline.buf : "<none>";
 
-	if (log->groups & SHORTLOG_GROUP_AUTHOR) {
-		strbuf_reset(&ident);
-		format_commit_message(commit,
-				      log->email ? "%aN <%aE>" : "%aN",
-				      &ident, &ctx);
-		if (!HAS_MULTI_BITS(log->groups) ||
-		    strset_add(&dups, ident.buf))
-			insert_one_record(log, ident.buf, oneline_str);
-	}
 	if (log->groups & SHORTLOG_GROUP_COMMITTER) {
 		strbuf_reset(&ident);
 		format_commit_message(commit,
@@ -372,6 +363,16 @@  void shortlog_init(struct shortlog *log)
 	log->format.strdup_strings = 1;
 }
 
+void shortlog_init_group(struct shortlog *log)
+{
+	if (!log->groups)
+		log->groups = SHORTLOG_GROUP_AUTHOR;
+
+	if (log->groups & SHORTLOG_GROUP_AUTHOR)
+		string_list_append(&log->format,
+				   log->email ? "%aN <%aE>" : "%aN");
+}
+
 int cmd_shortlog(int argc, const char **argv, const char *prefix)
 {
 	struct shortlog log = { STRING_LIST_INIT_NODUP };
@@ -439,8 +440,8 @@  int cmd_shortlog(int argc, const char **argv, const char *prefix)
 	log.file = rev.diffopt.file;
 	log.date_mode = rev.date_mode;
 
-	if (!log.groups)
-		log.groups = SHORTLOG_GROUP_AUTHOR;
+	shortlog_init_group(&log);
+
 	string_list_sort(&log.trailers);
 
 	/* assume HEAD if from a tty */
diff --git a/shortlog.h b/shortlog.h
index 4850a8c30f..e52f001fb7 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -33,6 +33,7 @@  struct shortlog {
 };
 
 void shortlog_init(struct shortlog *log);
+void shortlog_init_group(struct shortlog *log);
 
 void shortlog_add_commit(struct shortlog *log, struct commit *commit);