diff mbox series

[v2,4/6] shortlog: support arbitrary commit format `--group`s

Message ID 4a36c0ca4e840d53e2fd257e2d97498ced6fb28c.1666320509.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series shortlog: introduce `--group=<format>` | expand

Commit Message

Taylor Blau Oct. 21, 2022, 3:11 a.m. UTC
In addition to generating a shortlog based on committer, author, or the
identity in one or more specified trailers, it can be useful to generate
a shortlog based on an arbitrary commit format.

This can be used, for example, to generate a distribution of commit
activity over time, like so:

    $ git shortlog --group='%cd' --date='format:%Y-%m' -s v2.37.0..
       117  2022-06
       274  2022-07
       324  2022-08
       263  2022-09
         7  2022-10

Arbitrary commit formats can be used. In fact, `git shortlog`'s default
behavior (to count by commit authors) can be emulated as follows:

    $ git shortlog --group='%aN <%aE>' ...

and future patches will make the default behavior (as well as
`--committer`, and `--group=trailer:<trailer>`) special cases of the
more flexible `--group` option.

Note also that the SHORTLOG_GROUP_FORMAT enum value is used only to
designate that `--group:<format>` is in use when in stdin mode to
declare that the combination is invalid.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-shortlog.txt |  3 +++
 builtin/shortlog.c             | 37 +++++++++++++++++++++++++++-
 shortlog.h                     |  2 ++
 t/t4201-shortlog.sh            | 44 ++++++++++++++++++++++++++++++++++
 4 files changed, 85 insertions(+), 1 deletion(-)

Comments

Jeff King Oct. 21, 2022, 5:48 a.m. UTC | #1
On Thu, Oct 20, 2022 at 11:11:38PM -0400, Taylor Blau wrote:

> +static void insert_records_from_format(struct shortlog *log,
> +				       struct strset *dups,
> +				       struct commit *commit,
> +				       struct pretty_print_context *ctx,
> +				       const char *oneline)
> +{
> +	struct strbuf buf = STRBUF_INIT;
> +	struct string_list_item *item;
> +
> +	for_each_string_list_item(item, &log->format) {
> +		strbuf_reset(&buf);
> +
> +		format_commit_message(commit, item->string, &buf, ctx);
> +
> +		if (!(log->format.nr > 1 || log->trailers.nr) ||
> +		    strset_add(dups, buf.buf))
> +			insert_one_record(log, buf.buf, oneline);
> +	}

Just to be clear, since I talked about this conditional in the other
thread: what you have here is correct, and now seeing it again in
context, I think it's OK in terms of readability.

> diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
> index 7547da539d..13ac0bac64 100755
> --- a/t/t4201-shortlog.sh
> +++ b/t/t4201-shortlog.sh
> @@ -244,6 +244,36 @@ test_expect_success 'shortlog --group=trailer:signed-off-by' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'shortlog --group=format' '
> +	git shortlog -s --date="format:%Y" --group="format:%cN (%cd)" \
> +		HEAD >actual &&
> +	cat >expect <<-\EOF &&
> +	     4	C O Mitter (2005)
> +	     1	Sin Nombre (2005)
> +	EOF
> +	test_cmp expect actual
> +'

OK, so this is a basic test of the feature, and makes sure the --date
support works.

> +test_expect_success 'shortlog --group=<format> DWIM' '
> +	git shortlog -s --date="format:%Y" --group="%cN (%cd)" HEAD >actual &&
> +	test_cmp expect actual
> +'

And this is the same thing but without "format:". Good...

> +test_expect_success 'shortlog multiple --group=format' '
> +	git shortlog -s --date="format:%Y" --group="format:%cN (%cd)" \
> +		HEAD >actual &&
> +	cat >expect <<-\EOF &&
> +	     4	C O Mitter (2005)
> +	     1	Sin Nombre (2005)
> +	EOF
> +	test_cmp expect actual
> +'

...but this one seems redundant. It is not using multiple formats. And
you test that later, in the "can match multiple format groups" test. Is
this one just leftover from development?

> +test_expect_success 'shortlog bogus --group' '
> +	test_must_fail git shortlog --group=bogus HEAD 2>err &&
> +	grep "unknown group type" err
> +'

This one is a nice inclusion. I was surprised we didn't have it already. :)

> +test_expect_success 'shortlog can match multiple format groups' '
> +	cat >expect <<-\EOF &&
> +	     2	User B <b@example.com>
> +	     1	A U Thor <author@example.com>
> +	     1	User A <a@example.com>
> +	EOF
> +	git shortlog -ns \
> +		--group="%(trailers:valueonly,key=some-trailer)" \
> +		--group="%(trailers:valueonly,key=another-trailer)" \
> +		-2 HEAD >actual.raw &&
> +	grep -v "^$" actual.raw >actual &&
> +	test_cmp expect actual
> +'

And this one is correct, though I think avoiding trailers is more to the
point; see the other mail I sent.

-Peff
Taylor Blau Oct. 21, 2022, 10:07 p.m. UTC | #2
On Fri, Oct 21, 2022 at 01:48:22AM -0400, Jeff King wrote:
> > +test_expect_success 'shortlog multiple --group=format' '
> > +	git shortlog -s --date="format:%Y" --group="format:%cN (%cd)" \
> > +		HEAD >actual &&
> > +	cat >expect <<-\EOF &&
> > +	     4	C O Mitter (2005)
> > +	     1	Sin Nombre (2005)
> > +	EOF
> > +	test_cmp expect actual
> > +'
>
> ...but this one seems redundant. It is not using multiple formats. And
> you test that later, in the "can match multiple format groups" test. Is
> this one just leftover from development?

Oops, yes. Thanks for catching it, I'll drop this one.

> > +test_expect_success 'shortlog bogus --group' '
> > +	test_must_fail git shortlog --group=bogus HEAD 2>err &&
> > +	grep "unknown group type" err
> > +'
>
> This one is a nice inclusion. I was surprised we didn't have it already. :)

:-).

Thanks,
Taylor
diff mbox series

Patch

diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index 311c041c06..8cf5a2e3c7 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -64,6 +64,9 @@  OPTIONS
    example, if your project uses `Reviewed-by` trailers, you might want
    to see who has been reviewing with
    `git shortlog -ns --group=trailer:reviewed-by`.
+ - `format:<format>`, any string accepted by the `--format` option of
+   'git log'. (See the "PRETTY FORMATS" section of
+   linkgit:git-log[1].)
 +
 Note that commits that do not include the trailer will not be counted.
 Likewise, commits with multiple trailers (e.g., multiple signoffs) may
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index d0645769d7..0358397a7b 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -133,6 +133,8 @@  static void read_from_stdin(struct shortlog *log)
 		break;
 	case SHORTLOG_GROUP_TRAILER:
 		die(_("using %s with stdin is not supported"), "--group=trailer");
+	case SHORTLOG_GROUP_FORMAT:
+		die(_("using %s with stdin is not supported"), "--group=format");
 	default:
 		BUG("unhandled shortlog group");
 	}
@@ -203,6 +205,28 @@  static void insert_records_from_trailers(struct shortlog *log,
 	unuse_commit_buffer(commit, commit_buffer);
 }
 
+static void insert_records_from_format(struct shortlog *log,
+				       struct strset *dups,
+				       struct commit *commit,
+				       struct pretty_print_context *ctx,
+				       const char *oneline)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct string_list_item *item;
+
+	for_each_string_list_item(item, &log->format) {
+		strbuf_reset(&buf);
+
+		format_commit_message(commit, item->string, &buf, ctx);
+
+		if (!(log->format.nr > 1 || log->trailers.nr) ||
+		    strset_add(dups, buf.buf))
+			insert_one_record(log, buf.buf, oneline);
+	}
+
+	strbuf_release(&buf);
+}
+
 void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 {
 	struct strbuf ident = STRBUF_INIT;
@@ -244,6 +268,7 @@  void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 			insert_one_record(log, ident.buf, oneline_str);
 	}
 	insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
+	insert_records_from_format(log, &dups, commit, &ctx, oneline_str);
 
 	strset_clear(&dups);
 	strbuf_release(&ident);
@@ -315,6 +340,7 @@  static int parse_group_option(const struct option *opt, const char *arg, int uns
 	if (unset) {
 		log->groups = 0;
 		string_list_clear(&log->trailers, 0);
+		string_list_clear(&log->format, 0);
 	} else if (!strcasecmp(arg, "author"))
 		log->groups |= SHORTLOG_GROUP_AUTHOR;
 	else if (!strcasecmp(arg, "committer"))
@@ -322,8 +348,15 @@  static int parse_group_option(const struct option *opt, const char *arg, int uns
 	else if (skip_prefix(arg, "trailer:", &field)) {
 		log->groups |= SHORTLOG_GROUP_TRAILER;
 		string_list_append(&log->trailers, field);
-	} else
+	} else if (skip_prefix(arg, "format:", &field)) {
+		log->groups |= SHORTLOG_GROUP_FORMAT;
+		string_list_append(&log->format, field);
+	} else if (strchr(arg, '%')) {
+		log->groups |= SHORTLOG_GROUP_FORMAT;
+		string_list_append(&log->format, arg);
+	} else {
 		return error(_("unknown group type: %s"), arg);
+	}
 
 	return 0;
 }
@@ -341,6 +374,7 @@  void shortlog_init(struct shortlog *log)
 	log->in2 = DEFAULT_INDENT2;
 	log->trailers.strdup_strings = 1;
 	log->trailers.cmp = strcasecmp;
+	log->format.strdup_strings = 1;
 }
 
 int cmd_shortlog(int argc, const char **argv, const char *prefix)
@@ -481,4 +515,5 @@  void shortlog_output(struct shortlog *log)
 	log->list.strdup_strings = 1;
 	string_list_clear(&log->list, 1);
 	clear_mailmap(&log->mailmap);
+	string_list_clear(&log->format, 0);
 }
diff --git a/shortlog.h b/shortlog.h
index dc388dd459..4850a8c30f 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -22,8 +22,10 @@  struct shortlog {
 		SHORTLOG_GROUP_AUTHOR = (1 << 0),
 		SHORTLOG_GROUP_COMMITTER = (1 << 1),
 		SHORTLOG_GROUP_TRAILER = (1 << 2),
+		SHORTLOG_GROUP_FORMAT = (1 << 3),
 	} groups;
 	struct string_list trailers;
+	struct string_list format;
 
 	int email;
 	struct string_list mailmap;
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 7547da539d..13ac0bac64 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -244,6 +244,36 @@  test_expect_success 'shortlog --group=trailer:signed-off-by' '
 	test_cmp expect actual
 '
 
+test_expect_success 'shortlog --group=format' '
+	git shortlog -s --date="format:%Y" --group="format:%cN (%cd)" \
+		HEAD >actual &&
+	cat >expect <<-\EOF &&
+	     4	C O Mitter (2005)
+	     1	Sin Nombre (2005)
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'shortlog --group=<format> DWIM' '
+	git shortlog -s --date="format:%Y" --group="%cN (%cd)" HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'shortlog multiple --group=format' '
+	git shortlog -s --date="format:%Y" --group="format:%cN (%cd)" \
+		HEAD >actual &&
+	cat >expect <<-\EOF &&
+	     4	C O Mitter (2005)
+	     1	Sin Nombre (2005)
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'shortlog bogus --group' '
+	test_must_fail git shortlog --group=bogus HEAD 2>err &&
+	grep "unknown group type" err
+'
+
 test_expect_success 'trailer idents are split' '
 	cat >expect <<-\EOF &&
 	     2	C O Mitter
@@ -326,6 +356,20 @@  test_expect_success 'shortlog can match multiple groups' '
 	test_cmp expect actual
 '
 
+test_expect_success 'shortlog can match multiple format groups' '
+	cat >expect <<-\EOF &&
+	     2	User B <b@example.com>
+	     1	A U Thor <author@example.com>
+	     1	User A <a@example.com>
+	EOF
+	git shortlog -ns \
+		--group="%(trailers:valueonly,key=some-trailer)" \
+		--group="%(trailers:valueonly,key=another-trailer)" \
+		-2 HEAD >actual.raw &&
+	grep -v "^$" actual.raw >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'set up option selection tests' '
 	git commit --allow-empty -F - <<-\EOF
 	subject