diff mbox series

[4/7] shortlog: support arbitrary commit format `--group`s

Message ID 6f38990cc2ea8460ce37437e0770784d9b712dab.1665448437.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series shortlog: introduce `--group=<format>` | expand

Commit Message

Taylor Blau Oct. 11, 2022, 12:34 a.m. UTC
In addition to generating a shortlog based on committer, author, or the
identity in one or more specified trailers, it can be useful to generate
a shortlog based on an arbitrary commit format.

This can be used, for example, to generate a distribution of commit
activity over time, like so:

    $ git shortlog --group='%cd' --date='format:%Y-%m' -s v2.37.0..
       117  2022-06
       274  2022-07
       324  2022-08
       263  2022-09
         7  2022-10

Arbitrary commit formats can be used. In fact, `git shortlog`'s default
behavior (to count by commit authors) can be emulated as follows:

    $ git shortlog --group='%aN <%aE>' ...

and future patches will make the default behavior (as well as
`--committer`, and `--group=trailer:<trailer>`) special cases of the
more flexible `--group` option.

Note also that the SHORTLOG_GROUP_FORMAT enum value is used only to
designate that `--group:<format>` is in use when in stdin mode to
declare that the combination is invalid.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-shortlog.txt |  2 ++
 builtin/shortlog.c             | 33 ++++++++++++++++++++++++++++++++-
 shortlog.h                     |  2 ++
 t/t4201-shortlog.sh            |  9 +++++++++
 4 files changed, 45 insertions(+), 1 deletion(-)

Comments

Jeff King Oct. 11, 2022, 1:12 a.m. UTC | #1
On Mon, Oct 10, 2022 at 08:34:12PM -0400, Taylor Blau wrote:

> diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
> index 4982ceee21..ca15643f45 100644
> --- a/Documentation/git-shortlog.txt
> +++ b/Documentation/git-shortlog.txt
> @@ -59,6 +59,8 @@ OPTIONS
>     example, if your project uses `Reviewed-by` trailers, you might want
>     to see who has been reviewing with
>     `git shortlog -ns --group=trailer:reviewed-by`.
> + - `<format>`, any string accepted by the `--format` option of 'git log'.
> +   (See the "PRETTY FORMATS" section of linkgit:git-log[1].)

I have a slight preference here to call this:

  --group=format:<format>

and then DWIM:

  --group=<format>

as long as <format> contains a percent. This is similar to how --pretty
works, but leaves us more room for changing things later if we need to.

> +static void insert_records_from_format(struct shortlog *log,
> +				       struct strset *dups,
> +				       struct commit *commit,
> +				       struct pretty_print_context *ctx,
> +				       const char *oneline)
> +{
> +	struct strbuf buf = STRBUF_INIT;
> +	struct string_list_item *item;
> +
> +	for_each_string_list_item(item, &log->format) {
> +		strbuf_reset(&buf);
> +
> +		format_commit_message(commit, item->string, &buf, ctx);
> +
> +		if (strset_add(dups, buf.buf))
> +			insert_one_record(log, buf.buf, oneline);
> +	}
> +
> +	strbuf_release(&buf);
> +}

Versus the earlier patch we discussed, this is one identity per --group,
so it matches how --group=trailer works. Good.

> @@ -243,6 +266,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
>  	if (log->groups & SHORTLOG_GROUP_TRAILER) {
>  		insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
>  	}
> +	insert_records_from_format(log, &dups, commit, &ctx, oneline_str);

Hmm. I see why we don't need to guard this with:

  if (log->groups & SHORTLOG_GROUP_FORMAT)

since our helper function is a noop if log->format is empty. But I
wonder if:

  - it's more clear to match the others (although it looks like they are
    going away later, so that potentially becomes a non-issue)

  - it's useful to have a conditional that lets us skip any setup work.
    For trailers, in particular, we call logmsg_reencode(), which is
    potentially expensive. OTOH, it would be easy for the helper
    function to just return early when log->format.nr is 0.

So I'll reserve judgement until seeing the end state.

> diff --git a/shortlog.h b/shortlog.h
> index dc388dd459..4850a8c30f 100644
> --- a/shortlog.h
> +++ b/shortlog.h
> @@ -22,8 +22,10 @@ struct shortlog {
>  		SHORTLOG_GROUP_AUTHOR = (1 << 0),
>  		SHORTLOG_GROUP_COMMITTER = (1 << 1),
>  		SHORTLOG_GROUP_TRAILER = (1 << 2),
> +		SHORTLOG_GROUP_FORMAT = (1 << 3),
>  	} groups;
>  	struct string_list trailers;
> +	struct string_list format;

Given that "shortlog --format" does a totally different thing, I wonder
if this needs a more descriptive name like group_format or something.
Probably not. That other format feature is all in rev_info, and it's not
like "trailers" above doesn't have a similar issue.

> diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
> index 3095b1b2ff..2417ac8896 100755
> --- a/t/t4201-shortlog.sh
> +++ b/t/t4201-shortlog.sh
> @@ -237,6 +237,15 @@ test_expect_success 'shortlog --group=trailer:signed-off-by' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'shortlog --group=<format>' '
> +	git shortlog -s --date="format:%Y" --group="%cN (%cd)" HEAD >actual &&
> +	cat >expect <<-\EOF &&
> +	     4	C O Mitter (2005)
> +	     1	Sin Nombre (2005)
> +	EOF
> +	test_cmp expect actual
> +'

Makes sense. Two other tests that might be worth including:

  - "shortlog --group=bogus" generates an error (we might already have
    such a test; I didn't check)

  - since the multiple-option behavior is so subtle, maybe show a case
    where two formats partially overlap. A plausible one is "--group=%aN
    --group=%cN", but the test setup might need tweaked to cover both.
    There's an existing "multiple groups" test that might come in handy.

-Peff
Taylor Blau Oct. 11, 2022, 1:24 a.m. UTC | #2
On Mon, Oct 10, 2022 at 09:12:46PM -0400, Jeff King wrote:
> On Mon, Oct 10, 2022 at 08:34:12PM -0400, Taylor Blau wrote:
>
> > diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
> > index 4982ceee21..ca15643f45 100644
> > --- a/Documentation/git-shortlog.txt
> > +++ b/Documentation/git-shortlog.txt
> > @@ -59,6 +59,8 @@ OPTIONS
> >     example, if your project uses `Reviewed-by` trailers, you might want
> >     to see who has been reviewing with
> >     `git shortlog -ns --group=trailer:reviewed-by`.
> > + - `<format>`, any string accepted by the `--format` option of 'git log'.
> > +   (See the "PRETTY FORMATS" section of linkgit:git-log[1].)
>
> I have a slight preference here to call this:
>
>   --group=format:<format>

That seems very reasonable, thanks!

> > @@ -243,6 +266,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
> >  	if (log->groups & SHORTLOG_GROUP_TRAILER) {
> >  		insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
> >  	}
> > +	insert_records_from_format(log, &dups, commit, &ctx, oneline_str);
>
> Hmm. I see why we don't need to guard this with:
>
>   if (log->groups & SHORTLOG_GROUP_FORMAT)
>
> since our helper function is a noop if log->format is empty. But I
> wonder if:
>
>   - it's more clear to match the others (although it looks like they are
>     going away later, so that potentially becomes a non-issue)
>
>   - it's useful to have a conditional that lets us skip any setup work.
>     For trailers, in particular, we call logmsg_reencode(), which is
>     potentially expensive. OTOH, it would be easy for the helper
>     function to just return early when log->format.nr is 0.

In this case, `insert_records_from_format()` is cheap when
log->format.nr is 0. It is limited to setting up a strbuf to
STRBUF_INIT, and then calling strbuf_release() on it before returning.

And, indeed, the remaining conditionals go away by the final patch, so
you may want to decide then if it looks good to you or not.

> > diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
> > index 3095b1b2ff..2417ac8896 100755
> > --- a/t/t4201-shortlog.sh
> > +++ b/t/t4201-shortlog.sh
> > @@ -237,6 +237,15 @@ test_expect_success 'shortlog --group=trailer:signed-off-by' '
> >  	test_cmp expect actual
> >  '
> >
> > +test_expect_success 'shortlog --group=<format>' '
> > +	git shortlog -s --date="format:%Y" --group="%cN (%cd)" HEAD >actual &&
> > +	cat >expect <<-\EOF &&
> > +	     4	C O Mitter (2005)
> > +	     1	Sin Nombre (2005)
> > +	EOF
> > +	test_cmp expect actual
> > +'
>
> Makes sense. Two other tests that might be worth including:
>
>   - "shortlog --group=bogus" generates an error (we might already have
>     such a test; I didn't check)

We didn't have such a test before, so adding one is a good call, thanks!

>   - since the multiple-option behavior is so subtle, maybe show a case
>     where two formats partially overlap. A plausible one is "--group=%aN
>     --group=%cN", but the test setup might need tweaked to cover both.
>     There's an existing "multiple groups" test that might come in handy.

Interesting. I was starting to write that test up, but then realized
that this will be covered by the end of the series, since the
`--group=trailer` machinery is reimplemented in terms of the new format
group.

So if the existing "shortlog can match multiple groups" test keeps
passing, we did our job correctly ;-).

Thanks,
Taylor
Taylor Blau Oct. 11, 2022, 1:28 a.m. UTC | #3
On Mon, Oct 10, 2022 at 09:24:32PM -0400, Taylor Blau wrote:
> > Makes sense. Two other tests that might be worth including:
> >
> >   - "shortlog --group=bogus" generates an error (we might already have
> >     such a test; I didn't check)
>
> We didn't have such a test before, so adding one is a good call, thanks!

Just writing back to say that this was a really good suggestion, since
it caught my mistake in writing:

    } else if (strchrnul(arg, '%')) {

since we'll always get back a non-NULL answer from calling `strchrnul`
instead of `strchr`.

Thanks ;-).

Thanks,
Taylor
Jeff King Oct. 11, 2022, 2:02 a.m. UTC | #4
On Mon, Oct 10, 2022 at 09:24:32PM -0400, Taylor Blau wrote:

> > Hmm. I see why we don't need to guard this with:
> >
> >   if (log->groups & SHORTLOG_GROUP_FORMAT)
> >
> > since our helper function is a noop if log->format is empty. But I
> > wonder if:
> >
> >   - it's more clear to match the others (although it looks like they are
> >     going away later, so that potentially becomes a non-issue)
> >
> >   - it's useful to have a conditional that lets us skip any setup work.
> >     For trailers, in particular, we call logmsg_reencode(), which is
> >     potentially expensive. OTOH, it would be easy for the helper
> >     function to just return early when log->format.nr is 0.
> 
> In this case, `insert_records_from_format()` is cheap when
> log->format.nr is 0. It is limited to setting up a strbuf to
> STRBUF_INIT, and then calling strbuf_release() on it before returning.
> 
> And, indeed, the remaining conditionals go away by the final patch, so
> you may want to decide then if it looks good to you or not.

Right. Having read to the end, a few new thoughts:

  - I'm skeptical on the conversion of --group=trailer to use the
    formats. It seems more complicated. The others seem fine.

    As a result, I do think it's awkward to check bits for log.format
    (since now there are many), and we should just enter the helper
    function unconditionally. I do think it's a little weird to check
    the TRAILER bit just before it though (assuming we'd leave that in
    place). But it would be natural for us to just return early and skip
    any setup work in insert_records_from_trailers(). I.e., something
    like this:

    diff --git a/builtin/shortlog.c b/builtin/shortlog.c
    index 086dfee45a..19d953c26a 100644
    --- a/builtin/shortlog.c
    +++ b/builtin/shortlog.c
    @@ -170,6 +170,9 @@ static void insert_records_from_trailers(struct shortlog *log,
     	const char *commit_buffer, *body;
     	struct strbuf ident = STRBUF_INIT;
     
    +	if (!log->trailers.nr)
    +		return;
    +
     	/*
     	 * Using format_commit_message("%B") would be simpler here, but
     	 * this saves us copying the message.
    @@ -240,9 +243,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
     		    strset_add(&dups, ident.buf))
     			insert_one_record(log, ident.buf, oneline_str);
     	}
    -	if (log->groups & SHORTLOG_GROUP_TRAILER) {
    -		insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
    -	}
    +	insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
     
     	strset_clear(&dups);
     	strbuf_release(&ident);

  - I mentioned possible setup work. And indeed, I think it may be
    useful for insert_records_from_format() to load and cache the commit
    message once via get_commit_buffer(), since that result could then
    be used by all formats. But that is not really specific to
    --group=format! The existing code would benefit for any multi-group
    invocation. So any such setup should probably be at the top of
    shortlog_add_commit() anyway.

> >   - since the multiple-option behavior is so subtle, maybe show a case
> >     where two formats partially overlap. A plausible one is "--group=%aN
> >     --group=%cN", but the test setup might need tweaked to cover both.
> >     There's an existing "multiple groups" test that might come in handy.
> 
> Interesting. I was starting to write that test up, but then realized
> that this will be covered by the end of the series, since the
> `--group=trailer` machinery is reimplemented in terms of the new format
> group.

True, if we follow through on that. ;)

-Peff
Jeff King Oct. 11, 2022, 2:03 a.m. UTC | #5
On Mon, Oct 10, 2022 at 09:28:24PM -0400, Taylor Blau wrote:

> On Mon, Oct 10, 2022 at 09:24:32PM -0400, Taylor Blau wrote:
> > > Makes sense. Two other tests that might be worth including:
> > >
> > >   - "shortlog --group=bogus" generates an error (we might already have
> > >     such a test; I didn't check)
> >
> > We didn't have such a test before, so adding one is a good call, thanks!
> 
> Just writing back to say that this was a really good suggestion, since
> it caught my mistake in writing:
> 
>     } else if (strchrnul(arg, '%')) {
> 
> since we'll always get back a non-NULL answer from calling `strchrnul`
> instead of `strchr`.

I'm going to pretend that I noticed that bug and was gently leading you
to it, rather than that I was oblivious and lucky.

-Peff
Taylor Blau Oct. 21, 2022, 2:39 a.m. UTC | #6
On Mon, Oct 10, 2022 at 10:02:13PM -0400, Jeff King wrote:
> > >   - since the multiple-option behavior is so subtle, maybe show a case
> > >     where two formats partially overlap. A plausible one is "--group=%aN
> > >     --group=%cN", but the test setup might need tweaked to cover both.
> > >     There's an existing "multiple groups" test that might come in handy.
> >
> > Interesting. I was starting to write that test up, but then realized
> > that this will be covered by the end of the series, since the
> > `--group=trailer` machinery is reimplemented in terms of the new format
> > group.
>
> True, if we follow through on that. ;)

So, obviously we ended up not following through on that ;-).

I tried to leverage the existing test as much as possible, to the point
that the new test is pretty hacky. But I think that the result is cute,
and it does save us from having to modify the downstream tests (or
create unreachable commits, initialize another repository, etc).

It looks something like this:

--- 8< ---
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 0abe5354fc..566d581e1b 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -356,6 +356,19 @@ test_expect_success 'shortlog can match multiple groups' '
 	test_cmp expect actual
 '

+test_expect_success 'shortlog can match multiple format groups' '
+	cat >expect <<-\EOF &&
+	     2	User B <b@example.com>
+	     1	A U Thor <author@example.com>
+	     1	User A <a@example.com>
+	EOF
+	git shortlog -ns \
+		--group="%(trailers:valueonly,separator=%0x00,key=some-trailer)" \
+		--group="%(trailers:valueonly,separator=%0x00,key=another-trailer)" \
+		-2 HEAD >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'set up option selection tests' '
 	git commit --allow-empty -F - <<-\EOF
 	subject
--- >8 ---

The gross bit there really is the 'separator=%0x00' thing, since the
"trailers" pretty format gives us a NL character already. I suppose that
you could do something like this on top instead:

--- >8 ---
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 566d581e1b..13ac0bac64 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -363,9 +363,10 @@ test_expect_success 'shortlog can match multiple format groups' '
 	     1	User A <a@example.com>
 	EOF
 	git shortlog -ns \
-		--group="%(trailers:valueonly,separator=%0x00,key=some-trailer)" \
-		--group="%(trailers:valueonly,separator=%0x00,key=another-trailer)" \
-		-2 HEAD >actual &&
+		--group="%(trailers:valueonly,key=some-trailer)" \
+		--group="%(trailers:valueonly,key=another-trailer)" \
+		-2 HEAD >actual.raw &&
+	grep -v "^$" actual.raw >actual &&
 	test_cmp expect actual
 '
--- 8< ---

which I think I prefer slightly.

If this is all too hacky for your (or anybody's!) taste, let me know.
But I think ultimately this results in a smaller, more easily digestible
diff overall.

Thanks,
Taylor
Jeff King Oct. 21, 2022, 5:21 a.m. UTC | #7
On Thu, Oct 20, 2022 at 10:39:54PM -0400, Taylor Blau wrote:

> --- 8< ---
> diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
> index 0abe5354fc..566d581e1b 100755
> --- a/t/t4201-shortlog.sh
> +++ b/t/t4201-shortlog.sh
> @@ -356,6 +356,19 @@ test_expect_success 'shortlog can match multiple groups' '
>  	test_cmp expect actual
>  '
> 
> +test_expect_success 'shortlog can match multiple format groups' '
> +	cat >expect <<-\EOF &&
> +	     2	User B <b@example.com>
> +	     1	A U Thor <author@example.com>
> +	     1	User A <a@example.com>
> +	EOF
> +	git shortlog -ns \
> +		--group="%(trailers:valueonly,separator=%0x00,key=some-trailer)" \
> +		--group="%(trailers:valueonly,separator=%0x00,key=another-trailer)" \
> +		-2 HEAD >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_expect_success 'set up option selection tests' '
>  	git commit --allow-empty -F - <<-\EOF
>  	subject
> --- >8 ---
> 
> The gross bit there really is the 'separator=%0x00' thing, since the
> "trailers" pretty format gives us a NL character already. I suppose that
> you could do something like this on top instead:

IMHO the new test should avoid using trailers entirely, to avoid the
notion that they are necessary to create the duplicate situation. In a
normal repository, the most obvious one is just asking about both
authors and committers:

  git shortlog --group=format:%cn --group=format:%an

Most commits will have the same value for both, but we want to credit
the commit to them only once.

Of course in our fake test-lib world, authors and committers are
different by default. :-/

But it's easy enough to make more normal-looking commits, perhaps like:

diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 0abe5354fc..f0ff8a1714 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -356,6 +356,20 @@ test_expect_success 'shortlog can match multiple groups' '
 	test_cmp expect actual
 '
 
+test_expect_success 'shortlog can match multiple format groups' '
+	GIT_COMMITTER_NAME=$GIT_AUTHOR_NAME \
+		git commit --allow-empty -m "identical names" &&
+	cat >expect <<-\EOF &&
+	     2	A U Thor
+	     1	C O Mitter
+	EOF
+	git shortlog -ns \
+		--group=%cn \
+		--group=%an \
+		-2 HEAD >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'set up option selection tests' '
 	git commit --allow-empty -F - <<-\EOF
 	subject

You could also get fancier by dropping "-s" and making sure that the
commits were attributed as expected, but I think the count covers
things.

-Peff
Taylor Blau Oct. 21, 2022, 10:12 p.m. UTC | #8
On Fri, Oct 21, 2022 at 01:21:30AM -0400, Jeff King wrote:
> On Thu, Oct 20, 2022 at 10:39:54PM -0400, Taylor Blau wrote:
>
> > --- 8< ---
> > diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
> > index 0abe5354fc..566d581e1b 100755
> > --- a/t/t4201-shortlog.sh
> > +++ b/t/t4201-shortlog.sh
> > @@ -356,6 +356,19 @@ test_expect_success 'shortlog can match multiple groups' '
> >  	test_cmp expect actual
> >  '
> >
> > +test_expect_success 'shortlog can match multiple format groups' '
> > +	cat >expect <<-\EOF &&
> > +	     2	User B <b@example.com>
> > +	     1	A U Thor <author@example.com>
> > +	     1	User A <a@example.com>
> > +	EOF
> > +	git shortlog -ns \
> > +		--group="%(trailers:valueonly,separator=%0x00,key=some-trailer)" \
> > +		--group="%(trailers:valueonly,separator=%0x00,key=another-trailer)" \
> > +		-2 HEAD >actual &&
> > +	test_cmp expect actual
> > +'
> > +
> >  test_expect_success 'set up option selection tests' '
> >  	git commit --allow-empty -F - <<-\EOF
> >  	subject
> > --- >8 ---
> >
> > The gross bit there really is the 'separator=%0x00' thing, since the
> > "trailers" pretty format gives us a NL character already. I suppose that
> > you could do something like this on top instead:
>
> IMHO the new test should avoid using trailers entirely, to avoid the
> notion that they are necessary to create the duplicate situation. In a
> normal repository, the most obvious one is just asking about both
> authors and committers:
>
>   git shortlog --group=format:%cn --group=format:%an

Yeah, that's fair. I was worried enough about whether or not this was
going to cause significant test fallout, but it ended up being extremely
straightforward and simplified the tests nicely. Thanks!

Thanks,
Taylor
diff mbox series

Patch

diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index 4982ceee21..ca15643f45 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -59,6 +59,8 @@  OPTIONS
    example, if your project uses `Reviewed-by` trailers, you might want
    to see who has been reviewing with
    `git shortlog -ns --group=trailer:reviewed-by`.
+ - `<format>`, any string accepted by the `--format` option of 'git log'.
+   (See the "PRETTY FORMATS" section of linkgit:git-log[1].)
 +
 Note that commits that do not include the trailer will not be counted.
 Likewise, commits with multiple trailers (e.g., multiple signoffs) may
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 051c97bd5a..f708d96558 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -133,6 +133,8 @@  static void read_from_stdin(struct shortlog *log)
 		break;
 	case SHORTLOG_GROUP_TRAILER:
 		die(_("using %s with stdin is not supported"), "--group=trailer");
+	case SHORTLOG_GROUP_FORMAT:
+		die(_("using %s with stdin is not supported"), "--group=format");
 	default:
 		BUG("unhandled shortlog group");
 	}
@@ -200,6 +202,27 @@  static void insert_records_from_trailers(struct shortlog *log,
 	unuse_commit_buffer(commit, commit_buffer);
 }
 
+static void insert_records_from_format(struct shortlog *log,
+				       struct strset *dups,
+				       struct commit *commit,
+				       struct pretty_print_context *ctx,
+				       const char *oneline)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct string_list_item *item;
+
+	for_each_string_list_item(item, &log->format) {
+		strbuf_reset(&buf);
+
+		format_commit_message(commit, item->string, &buf, ctx);
+
+		if (strset_add(dups, buf.buf))
+			insert_one_record(log, buf.buf, oneline);
+	}
+
+	strbuf_release(&buf);
+}
+
 void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 {
 	struct strbuf ident = STRBUF_INIT;
@@ -243,6 +266,7 @@  void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 	if (log->groups & SHORTLOG_GROUP_TRAILER) {
 		insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
 	}
+	insert_records_from_format(log, &dups, commit, &ctx, oneline_str);
 
 	strset_clear(&dups);
 	strbuf_release(&ident);
@@ -314,6 +338,7 @@  static int parse_group_option(const struct option *opt, const char *arg, int uns
 	if (unset) {
 		log->groups = 0;
 		string_list_clear(&log->trailers, 0);
+		string_list_clear(&log->format, 0);
 	} else if (!strcasecmp(arg, "author"))
 		log->groups |= SHORTLOG_GROUP_AUTHOR;
 	else if (!strcasecmp(arg, "committer"))
@@ -321,8 +346,12 @@  static int parse_group_option(const struct option *opt, const char *arg, int uns
 	else if (skip_prefix(arg, "trailer:", &field)) {
 		log->groups |= SHORTLOG_GROUP_TRAILER;
 		string_list_append(&log->trailers, field);
-	} else
+	} else if (strchrnul(arg, '%')) {
+		log->groups |= SHORTLOG_GROUP_FORMAT;
+		string_list_append(&log->format, arg);
+	} else {
 		return error(_("unknown group type: %s"), arg);
+	}
 
 	return 0;
 }
@@ -340,6 +369,7 @@  void shortlog_init(struct shortlog *log)
 	log->in2 = DEFAULT_INDENT2;
 	log->trailers.strdup_strings = 1;
 	log->trailers.cmp = strcasecmp;
+	log->format.strdup_strings = 1;
 }
 
 int cmd_shortlog(int argc, const char **argv, const char *prefix)
@@ -480,4 +510,5 @@  void shortlog_output(struct shortlog *log)
 	log->list.strdup_strings = 1;
 	string_list_clear(&log->list, 1);
 	clear_mailmap(&log->mailmap);
+	string_list_clear(&log->format, 0);
 }
diff --git a/shortlog.h b/shortlog.h
index dc388dd459..4850a8c30f 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -22,8 +22,10 @@  struct shortlog {
 		SHORTLOG_GROUP_AUTHOR = (1 << 0),
 		SHORTLOG_GROUP_COMMITTER = (1 << 1),
 		SHORTLOG_GROUP_TRAILER = (1 << 2),
+		SHORTLOG_GROUP_FORMAT = (1 << 3),
 	} groups;
 	struct string_list trailers;
+	struct string_list format;
 
 	int email;
 	struct string_list mailmap;
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 3095b1b2ff..2417ac8896 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -237,6 +237,15 @@  test_expect_success 'shortlog --group=trailer:signed-off-by' '
 	test_cmp expect actual
 '
 
+test_expect_success 'shortlog --group=<format>' '
+	git shortlog -s --date="format:%Y" --group="%cN (%cd)" HEAD >actual &&
+	cat >expect <<-\EOF &&
+	     4	C O Mitter (2005)
+	     1	Sin Nombre (2005)
+	EOF
+	test_cmp expect actual
+'
+
 test_expect_success 'trailer idents are split' '
 	cat >expect <<-\EOF &&
 	     2	C O Mitter