diff mbox series

[8/8] shortlog: allow multiple groups to be specified

Message ID 20200925070550.GH62741@coredump.intra.peff.net (mailing list archive)
State Superseded
Headers show
Series parsing trailers with shortlog | expand

Commit Message

Jeff King Sept. 25, 2020, 7:05 a.m. UTC
Now that shortlog supports reading from trailers, it can be useful to
combine counts from multiple trailers, or between trailers and authors.
This can be done manually by post-processing the output from multiple
runs, but it's non-trivial to make sure that each name/commit pair is
counted only once.

This patch teaches shortlog to accept multiple --group options on the
command line, and pull data from all of them. That makes it possible to
run:

  git shortlog -ns --group=author --group=trailer:co-authored-by

to get a shortlog that counts authors and co-authors equally.

The implementation is mostly straightforward. The "group" enum becomes a
bitfield, and the trailer key becomes a list. I didn't bother
implementing the multi-group semantics for reading from stdin. It would
be possible to do, but the existing matching code makes it awkward, and
I doubt anybody cares.

The duplicate suppression we used for trailers now covers authors and
committers as well (though in non-trailer single-group mode we can skip
the hash insertion and lookup, since we only see one value per commit).

There is one subtlety: we now care about the case when no group bit is
set (in which case we default to showing the author). The caller in
builtin/log.c needs to be adapted to ask explicitly for authors, rather
than relying on shortlog_init(). It would be possible with some
gymnastics to make this keep working as-is, but it's not worth it for a
single caller.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-shortlog.txt |  5 +++
 builtin/log.c                  |  1 +
 builtin/shortlog.c             | 69 ++++++++++++++++++++-----------
 shortlog.h                     | 10 ++---
 t/t4201-shortlog.sh            | 74 ++++++++++++++++++++++++++++++++++
 5 files changed, 130 insertions(+), 29 deletions(-)

Comments

Eric Sunshine Sept. 25, 2020, 8:23 p.m. UTC | #1
On Fri, Sep 25, 2020 at 3:05 AM Jeff King <peff@peff.net> wrote:
> This patch teaches shortlog to accept multiple --group options on the
> command line, and pull data from all of them. That makes it possible to
> run:
> [...]
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
> @@ -69,6 +69,11 @@ Shortlog will attempt to parse each trailer value as a `name <email>`
>  identity. If successful, the mailmap is applied and the email is omitted
>  unless the `--email` option is specified. If the value cannot be parsed
>  as an identity, it will be taken literally and completely.
> ++
> +If `--group` is specified multiple times, commits are counted under each
> +value (but again, only once per unique value in that commit). For
> +example, `git shortlog --group=author --group=trailer:co-authored-by`
> +counts both authors and co-authors.

Intuitively, I understand (or hope) that the first use of --group
overrides what is otherwise collected by default (authors), however,
would there be value in stating this explicitly?

At this point, the documentation for --committer still says:

    --committer::
        Collect and show committer identities instead of authors.
        This is an alias for `--group=committer`.

which stops feeling accurate now that --group can be specified more
than once. The "instead of authors" bit is particularly jarring. I
wonder if the first sentence can simply be dropped.
Martin Ågren Sept. 26, 2020, 12:48 p.m. UTC | #2
On Fri, 25 Sep 2020 at 09:09, Jeff King <peff@peff.net> wrote:
> +static inline int has_multi_bits(unsigned n)
> +{
> +       return (n & (n - 1)) != 0;
> +}

There's a HAS_MULTI_BITS macro in git-compat-util.h. I like how you came
up with the exact same name. It even makes me wonder if you are actively
avoiding it in favor of a re-implementation, but I can't see why.

>         enum {
> -               SHORTLOG_GROUP_AUTHOR = 0,
> -               SHORTLOG_GROUP_COMMITTER,
> -               SHORTLOG_GROUP_TRAILER
> -       } group;
> +               SHORTLOG_GROUP_AUTHOR = (1 << 0),
> +               SHORTLOG_GROUP_COMMITTER = (1 << 1),
> +               SHORTLOG_GROUP_TRAILER = (1 << 2)
> +       } groups;

Coming back to the comment on 2/8, adding a comma at the end wouldn't
reduce patch noise here and now, but whenever someone touches this place
the next time.

Of my comments in the last few mails, probably the only actionable one
is the truncated sentence in the trailer iterator header file.

Martin
Jeff King Sept. 27, 2020, 8:06 a.m. UTC | #3
On Fri, Sep 25, 2020 at 04:23:39PM -0400, Eric Sunshine wrote:

> > diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
> > @@ -69,6 +69,11 @@ Shortlog will attempt to parse each trailer value as a `name <email>`
> >  identity. If successful, the mailmap is applied and the email is omitted
> >  unless the `--email` option is specified. If the value cannot be parsed
> >  as an identity, it will be taken literally and completely.
> > ++
> > +If `--group` is specified multiple times, commits are counted under each
> > +value (but again, only once per unique value in that commit). For
> > +example, `git shortlog --group=author --group=trailer:co-authored-by`
> > +counts both authors and co-authors.
> 
> Intuitively, I understand (or hope) that the first use of --group
> overrides what is otherwise collected by default (authors), however,
> would there be value in stating this explicitly?

Yes. Maybe my "if no --group is specified" suggestion from the other
part of the thread _is_ worth doing then.

> At this point, the documentation for --committer still says:
> 
>     --committer::
>         Collect and show committer identities instead of authors.
>         This is an alias for `--group=committer`.
> 
> which stops feeling accurate now that --group can be specified more
> than once. The "instead of authors" bit is particularly jarring. I
> wonder if the first sentence can simply be dropped.

Hmm, yeah. I actually kept --committer different at first, having it
override any other --group options. But that seemed confusing, too
(since it's now _almost_ an alias for --group=committer). I think your
suggestion works to just explicitly mark it as an alias and nothing.

-Peff
Jeff King Sept. 27, 2020, 8:25 a.m. UTC | #4
On Sat, Sep 26, 2020 at 02:48:44PM +0200, Martin Ågren wrote:

> On Fri, 25 Sep 2020 at 09:09, Jeff King <peff@peff.net> wrote:
> > +static inline int has_multi_bits(unsigned n)
> > +{
> > +       return (n & (n - 1)) != 0;
> > +}
> 
> There's a HAS_MULTI_BITS macro in git-compat-util.h. I like how you came
> up with the exact same name. It even makes me wonder if you are actively
> avoiding it in favor of a re-implementation, but I can't see why.

Heh. I _thought_ we had such a helper, and I even thought I remembered
the name. But when I grepped for it, I couldn't come up with it. Silly
"-i". (So I guess it's no surprise that I came up with the same
implementation, which is the only good one, and the same name, which I
did pull from the back of my mind).

I'll drop this in favor of the macro.

> >         enum {
> > -               SHORTLOG_GROUP_AUTHOR = 0,
> > -               SHORTLOG_GROUP_COMMITTER,
> > -               SHORTLOG_GROUP_TRAILER
> > -       } group;
> > +               SHORTLOG_GROUP_AUTHOR = (1 << 0),
> > +               SHORTLOG_GROUP_COMMITTER = (1 << 1),
> > +               SHORTLOG_GROUP_TRAILER = (1 << 2)
> > +       } groups;
> 
> Coming back to the comment on 2/8, adding a comma at the end wouldn't
> reduce patch noise here and now, but whenever someone touches this place
> the next time.

Agreed, will do.

> Of my comments in the last few mails, probably the only actionable one
> is the truncated sentence in the trailer iterator header file.

I agree none are big, but worth addressing since I was making a re-roll
anyway. Thanks.

-Peff
diff mbox series

Patch

diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index 5f8f918cad..f31770dab5 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -69,6 +69,11 @@  Shortlog will attempt to parse each trailer value as a `name <email>`
 identity. If successful, the mailmap is applied and the email is omitted
 unless the `--email` option is specified. If the value cannot be parsed
 as an identity, it will be taken literally and completely.
++
+If `--group` is specified multiple times, commits are counted under each
+value (but again, only once per unique value in that commit). For
+example, `git shortlog --group=author --group=trailer:co-authored-by`
+counts both authors and co-authors.
 
 -c::
 --committer::
diff --git a/builtin/log.c b/builtin/log.c
index b8824d898f..7f27e9eca1 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1195,6 +1195,7 @@  static void make_cover_letter(struct rev_info *rev, int use_stdout,
 	log.in1 = 2;
 	log.in2 = 4;
 	log.file = rev->diffopt.file;
+	log.groups = SHORTLOG_GROUP_AUTHOR;
 	for (i = 0; i < nr; i++)
 		shortlog_add_commit(&log, list[i]);
 
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 28133aec68..a36a8717b7 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -121,6 +121,11 @@  static int parse_ident(struct shortlog *log,
 	return 0;
 }
 
+static inline int has_multi_bits(unsigned n)
+{
+	return (n & (n - 1)) != 0;
+}
+
 static void read_from_stdin(struct shortlog *log)
 {
 	struct strbuf ident = STRBUF_INIT;
@@ -130,7 +135,10 @@  static void read_from_stdin(struct shortlog *log)
 	static const char *committer_match[2] = { "Commit: ", "committer " };
 	const char **match;
 
-	switch (log->group) {
+	if (has_multi_bits(log->groups))
+		die(_("using multiple --group options with stdin is not supported"));
+
+	switch (log->groups) {
 	case SHORTLOG_GROUP_AUTHOR:
 		match = author_match;
 		break;
@@ -221,13 +229,13 @@  static void strset_clear(struct strset *ss)
 }
 
 static void insert_records_from_trailers(struct shortlog *log,
+					 struct strset *dups,
 					 struct commit *commit,
 					 struct pretty_print_context *ctx,
 					 const char *oneline)
 {
 	struct trailer_iterator iter;
 	const char *commit_buffer, *body;
-	struct strset dups = STRSET_INIT;
 	struct strbuf ident = STRBUF_INIT;
 
 	/*
@@ -243,28 +251,28 @@  static void insert_records_from_trailers(struct shortlog *log,
 	while (trailer_iterator_advance(&iter)) {
 		const char *value = iter.val.buf;
 
-		if (strcasecmp(iter.key.buf, log->trailer))
+		if (!string_list_has_string(&log->trailers, iter.key.buf))
 			continue;
 
 		strbuf_reset(&ident);
 		if (!parse_ident(log, &ident, value))
 			value = ident.buf;
 
-		if (strset_check_and_add(&dups, value))
+		if (strset_check_and_add(dups, value))
 			continue;
 		insert_one_record(log, value, oneline);
 	}
 	trailer_iterator_release(&iter);
 
 	strbuf_release(&ident);
-	strset_clear(&dups);
 	unuse_commit_buffer(commit, commit_buffer);
 }
 
 void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 {
 	struct strbuf ident = STRBUF_INIT;
 	struct strbuf oneline = STRBUF_INIT;
+	struct strset dups = STRSET_INIT;
 	struct pretty_print_context ctx = {0};
 	const char *oneline_str;
 
@@ -282,24 +290,29 @@  void shortlog_add_commit(struct shortlog *log, struct commit *commit)
 	}
 	oneline_str = oneline.len ? oneline.buf : "<none>";
 
-	switch (log->group) {
-	case SHORTLOG_GROUP_AUTHOR:
+	if (log->groups & SHORTLOG_GROUP_AUTHOR) {
+		strbuf_reset(&ident);
 		format_commit_message(commit,
 				      log->email ? "%aN <%aE>" : "%aN",
 				      &ident, &ctx);
-		insert_one_record(log, ident.buf, oneline_str);
-		break;
-	case SHORTLOG_GROUP_COMMITTER:
+		if (!has_multi_bits(log->groups) ||
+		    !strset_check_and_add(&dups, ident.buf))
+			insert_one_record(log, ident.buf, oneline_str);
+	}
+	if (log->groups & SHORTLOG_GROUP_COMMITTER) {
+		strbuf_reset(&ident);
 		format_commit_message(commit,
 				      log->email ? "%cN <%cE>" : "%cN",
 				      &ident, &ctx);
-		insert_one_record(log, ident.buf, oneline_str);
-		break;
-	case SHORTLOG_GROUP_TRAILER:
-		insert_records_from_trailers(log, commit, &ctx, oneline_str);
-		break;
+		if (!has_multi_bits(log->groups) ||
+		    !strset_check_and_add(&dups, ident.buf))
+			insert_one_record(log, ident.buf, oneline_str);
+	}
+	if (log->groups & SHORTLOG_GROUP_TRAILER) {
+		insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str);
 	}
 
+	strset_clear(&dups);
 	strbuf_release(&ident);
 	strbuf_release(&oneline);
 }
@@ -366,14 +379,16 @@  static int parse_group_option(const struct option *opt, const char *arg, int uns
 	struct shortlog *log = opt->value;
 	const char *field;
 
-	if (unset || !strcasecmp(arg, "author"))
-		log->group = SHORTLOG_GROUP_AUTHOR;
+	if (unset) {
+		log->groups = 0;
+		string_list_clear(&log->trailers, 0);
+	} else if (!strcasecmp(arg, "author"))
+		log->groups |= SHORTLOG_GROUP_AUTHOR;
 	else if (!strcasecmp(arg, "committer"))
-		log->group = SHORTLOG_GROUP_COMMITTER;
+		log->groups |= SHORTLOG_GROUP_COMMITTER;
 	else if (skip_prefix(arg, "trailer:", &field)) {
-		log->group = SHORTLOG_GROUP_TRAILER;
-		free(log->trailer);
-		log->trailer = xstrdup(field);
+		log->groups |= SHORTLOG_GROUP_TRAILER;
+		string_list_append(&log->trailers, field);
 	} else
 		return error(_("unknown group type: %s"), arg);
 
@@ -391,6 +406,8 @@  void shortlog_init(struct shortlog *log)
 	log->wrap = DEFAULT_WRAPLEN;
 	log->in1 = DEFAULT_INDENT1;
 	log->in2 = DEFAULT_INDENT2;
+	log->trailers.strdup_strings = 1;
+	log->trailers.cmp = strcasecmp;
 }
 
 int cmd_shortlog(int argc, const char **argv, const char *prefix)
@@ -400,9 +417,9 @@  int cmd_shortlog(int argc, const char **argv, const char *prefix)
 	int nongit = !startup_info->have_repository;
 
 	const struct option options[] = {
-		OPT_SET_INT('c', "committer", &log.group,
-			    N_("Group by committer rather than author"),
-			    SHORTLOG_GROUP_COMMITTER),
+		OPT_BIT('c', "committer", &log.groups,
+			N_("Group by committer rather than author"),
+			SHORTLOG_GROUP_COMMITTER),
 		OPT_BOOL('n', "numbered", &log.sort_by_number,
 			 N_("sort output according to the number of commits per author")),
 		OPT_BOOL('s', "summary", &log.summary,
@@ -454,6 +471,10 @@  int cmd_shortlog(int argc, const char **argv, const char *prefix)
 	log.abbrev = rev.abbrev;
 	log.file = rev.diffopt.file;
 
+	if (!log.groups)
+		log.groups = SHORTLOG_GROUP_AUTHOR;
+	string_list_sort(&log.trailers);
+
 	/* assume HEAD if from a tty */
 	if (!nongit && !rev.pending.nr && isatty(0))
 		add_head_to_pending(&rev);
diff --git a/shortlog.h b/shortlog.h
index 54ce55e9e9..030a0b8490 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -17,11 +17,11 @@  struct shortlog {
 	int abbrev;
 
 	enum {
-		SHORTLOG_GROUP_AUTHOR = 0,
-		SHORTLOG_GROUP_COMMITTER,
-		SHORTLOG_GROUP_TRAILER
-	} group;
-	char *trailer;
+		SHORTLOG_GROUP_AUTHOR = (1 << 0),
+		SHORTLOG_GROUP_COMMITTER = (1 << 1),
+		SHORTLOG_GROUP_TRAILER = (1 << 2)
+	} groups;
+	struct string_list trailers;
 
 	char *common_repo_prefix;
 	int email;
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index a62ee9ed55..3d5c4a2086 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -282,4 +282,78 @@  test_expect_success 'shortlog de-duplicates trailers in a single commit' '
 	test_cmp expect actual
 '
 
+test_expect_success 'shortlog can match multiple groups' '
+	git commit --allow-empty -F - <<-\EOF &&
+	subject one
+
+	this has two trailers that are distinct from the author; it will count
+	3 times in the output
+
+	Some-trailer: User A <a@example.com>
+	Another-trailer: User B <b@example.com>
+	EOF
+
+	git commit --allow-empty -F - <<-\EOF &&
+	subject two
+
+	this one has two trailers, one of which is a duplicate with the author;
+	it will only be counted once for them
+
+	Another-trailer: A U Thor <author@example.com>
+	Some-trailer: User B <b@example.com>
+	EOF
+
+	cat >expect <<-\EOF &&
+	     2	A U Thor
+	     2	User B
+	     1	User A
+	EOF
+	git shortlog -ns \
+		--group=author \
+		--group=trailer:some-trailer \
+		--group=trailer:another-trailer \
+		-2 HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'set up option selection tests' '
+	git commit --allow-empty -F - <<-\EOF
+	subject
+
+	body
+
+	Trailer-one: value-one
+	Trailer-two: value-two
+	EOF
+'
+
+test_expect_success '--no-group resets group list to author' '
+	cat >expect <<-\EOF &&
+	     1	A U Thor
+	EOF
+	git shortlog -ns \
+		--group=committer \
+		--group=trailer:trailer-one \
+		--no-group \
+		-1 HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--no-group resets trailer list' '
+	cat >expect <<-\EOF &&
+	     1	value-two
+	EOF
+	git shortlog -ns \
+		--group=trailer:trailer-one \
+		--no-group \
+		--group=trailer:trailer-two \
+		-1 HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'stdin with multiple groups reports error' '
+	git log >log &&
+	test_must_fail git shortlog --group=author --group=committer <log
+'
+
 test_done