Message ID | 20220922232536.40807-1-jacob@initialcommit.io (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC,v2] shortlog: add group-by options for year and month | expand |
Jacob Stopak <jacob@initialcommit.io> writes: > To justify why the new year/month groups should have shorthand flags > while "--group=trailer:value" does not, I'd say that the fact that > trailer requires a custom value would make the shorthand version clunky, > and it wouldn't fit in well with other shorthand options like -n or -s. > Since year/month have no custom value, the flags make a bit more sense > and would match up with how "-c, --committer" currently works. It is an explanation why it is easier to implement and design --year etc., and does not justify why adding --year etc. is a good idea at all. Let's not add the "aliases" in the same patch. > - `author`, commits are grouped by author > - `committer`, commits are grouped by committer (the same as `-c`) > + - `month`, commits are grouped by month (the same as `-m`) > + - `year`, commits are grouped by year (the same as `-y`) It is unclear what timestamp is used, how a "month" is defined, etc. As "git shortlog --since=2.years" cuts off based on the committer timestamp, I would expect that the committer timestamps are used for this grouping as well? If I make a commit on the first day of the month in my timezone, but that instant happens to be still on the last day of the previous month in your timezone, which month would your invocation of "git shortlog --group=month" would the commit be attributed? My month, or your month? Does it make sense to even say "group by month and year"? I expect that it would mean the same thing as "group by month", and if that is the case, the command probably should error out or at least warn if both are given. An alternative interpretation could be, when told to "group by month", group the commits made in September 2022 into the same group as the group for commits made in September in all other years, but I do not know how useful it would be. Not a suggestion to use a different implementation or add a new feature on top of this --group-by-time-range idea, but I wonder if it is a more flexible and generalizeable approach to say "formulate this value given by the --format=<format> string, apply this regular expression match, and group by the subexpression value". E.g. git shortlog \ --group-by-value="%cI" \ --group-by-regexp="^(\d{4}-\d{2})" would "formulate the 'committer date in ISO format' value, and apply the 'grab leading 4 digits followed by a dash followed by 2 digits' regexp, and group by the matched part". That's a better way to implement "group by month" internally, and allow more flexibility. If a project is well disciplined and its commit titles follow the "<area>: <description>" convention, you probably could do git shortlog --no-merges \ --group-by-value="%s" \ --group-by-regexp="^([^:]+):" and group by <area> each commit touches. Of course, existing --committer and --author can also be internally reimplemented using the same mechanism. > @@ -80,6 +82,14 @@ counts both authors and co-authors. > --committer:: > This is an alias for `--group=committer`. > > +-m:: > +--month:: > + This is an alias for `--group=month`. > + > +-y:: > +--year:: > + This is an alias for `--group=year`. > + Let's not add this in the same patch. I am fairly negative on adding more, outside "--group". Besides, we do not have a good answer to those who want to group by week. -w is already taken.
On Fri, Sep 23, 2022 at 09:17:10AM -0700, Junio C Hamano wrote: > It is unclear what timestamp is used, how a "month" is defined, etc. > As "git shortlog --since=2.years" cuts off based on the committer > timestamp, I would expect that the committer timestamps are used for > this grouping as well? It uses the "commit->date" member from the commit struct in commit.h, which I assumed was the committer timestamp, but I'll confirm that's what actually gets populated in there since I don't see a separate member for the author timestamp... > If I make a commit on the first day of the > month in my timezone, but that instant happens to be still on the > last day of the previous month in your timezone, which month would > your invocation of "git shortlog --group=month" would the commit be > attributed? My month, or your month? I need to look into how Git typically handles these timezone differences and will try to apply similar behavior for these time-based groupings. > Does it make sense to even say "group by month and year"? I expect > that it would mean the same thing as "group by month", and if that > is the case, the command probably should error out or at least warn > if both are given. Yes, "group by month and year" and "group by month" means the same thing the way I implemented. If both groups are supplied, it will just ignore the year group and group by month like you mentioned, by flipping off the YEAR bit as follows: if ((log->groups & SHORTLOG_GROUP_MONTH) && (log->groups & SHORTLOG_GROUP_YEAR)) log->groups ^= SHORTLOG_GROUP_YEAR; I can add a warning message to make this more clear to the user. > An alternative interpretation could be, when > told to "group by month", group the commits made in September 2022 > into the same group as the group for commits made in September in > all other years, but I do not know how useful it would be. In data analytics terms this is usually referred to as "month of year", and personally I see it less useful in Git's shortlog context because I envision more folks would find output useful for single or consecutive time periods. However, adding a "month of year" grouping could be useful to answer questions like "what periods throughout the year are contributors most active?". If we decide to add a "month of year" grouping option as well, it would be trivial to include. > Not a suggestion to use a different implementation or add a new > feature on top of this --group-by-time-range idea, but I wonder if > it is a more flexible and generalizeable approach to say "formulate > this value given by the --format=<format> string, apply this regular > expression match, and group by the subexpression value". E.g. > > git shortlog \ > --group-by-value="%cI" \ > --group-by-regexp="^(\d{4}-\d{2})" > > would "formulate the 'committer date in ISO format' value, and apply > the 'grab leading 4 digits followed by a dash followed by 2 digits' > regexp, and group by the matched part". > > That's a better way to implement "group by month" internally, and > allow more flexibility. If a project is well disciplined and its > commit titles follow the "<area>: <description>" convention, you > probably could do > > git shortlog --no-merges \ > --group-by-value="%s" \ > --group-by-regexp="^([^:]+):" > > and group by <area> each commit touches. Of course, existing > --committer and --author can also be internally reimplemented using > the same mechanism. At first look this sounds very flexible and appealing, and I would be interested in exploring a refactor to this in the future. I think the rub is that supplying custom patterns wouldn't necessarily stack up neatly into good groups, which could lead to confusing results for the user in terms of both grouping and sorting. But like you mentioned it could be really cool if used judiciously for a consistent history like Git's. And the generalized re-implementation of the current shortlog groups would be a nice bonus. > > @@ -80,6 +82,14 @@ counts both authors and co-authors. > > --committer:: > > This is an alias for `--group=committer`. > > > > +-m:: > > +--month:: > > + This is an alias for `--group=month`. > > + > > +-y:: > > +--year:: > > + This is an alias for `--group=year`. > > + > > Let's not add this in the same patch. I am fairly negative on > adding more, outside "--group". Besides, we do not have a good > answer to those who want to group by week. -w is already taken. No worries - I'll remove the shorthand flags for v3.
On Fri, Sep 23, 2022 at 09:17:10AM -0700, Junio C Hamano wrote: > Not a suggestion to use a different implementation or add a new > feature on top of this --group-by-time-range idea, but I wonder if > it is a more flexible and generalizeable approach to say "formulate > this value given by the --format=<format> string, apply this regular > expression match, and group by the subexpression value". E.g. > > git shortlog \ > --group-by-value="%cI" \ > --group-by-regexp="^(\d{4}-\d{2})" Heh, I was about to make the exact same suggestion. The existing "--group=author" could really just be "--group='%an <%ae>'" (or variants depending on the "-e" flag). I don't think you even really need the regexp. If we respect --date, then you should be able to ask for --date=format:%Y-%m. Unfortunately there's no way to specify the format as part of the placeholder. The for-each-ref formatter understands this, like: %(authordate:format:%Y-%m) I wouldn't be opposed to teaching the git-log formatter something similar. > That's a better way to implement "group by month" internally, and > allow more flexibility. If a project is well disciplined and its > commit titles follow the "<area>: <description>" convention, you > probably could do > > git shortlog --no-merges \ > --group-by-value="%s" \ > --group-by-regexp="^([^:]+):" > > and group by <area> each commit touches. Of course, existing > --committer and --author can also be internally reimplemented using > the same mechanism. This example makes the regex feature more interesting, because it's not something we'd likely have a unique placeholder for (or maybe we should?). But there's something else interesting going on in Jack's patch, which is that he's not just introducing the date-sorting, but also that it's used in conjunction with other sorting. So really the intended use is something like: git shortlog --group:author --group:%Y-%m I think we'd want to allow the general form to be a series of groupings. In the output from his patch it looks like: 2022-09 Jeff King some commit message another commit message I.e., the groups are collapsed into a single string, and unique strings become their own groups (and are sorted in the usual way). If you give up the regex thing, then that naturally falls out as (imagining we learn about authordate as a placeholder): git shortlog --group='%(authordate:format=%Y-%n) %an' without having to implement multiple groupings as a specific feature (which is both more code, but also has user-facing confusion about when --group overrides versus appends). That also skips the question of which --group-by-regex applies to which --group-by-value. I do agree the regex thing is more flexible, but if we can't come up with a case more compelling than subsystem matching, I'd just as soon add %(subject:subsystem) or similar. :) -Peff
Jeff King <peff@peff.net> writes: > If you give up the regex thing, then that naturally falls out as > (imagining we learn about authordate as a placeholder): > > git shortlog --group='%(authordate:format=%Y-%n) %an' > > without having to implement multiple groupings as a specific feature > (which is both more code, but also has user-facing confusion about when > --group overrides versus appends). That also skips the question of which > --group-by-regex applies to which --group-by-value. > > I do agree the regex thing is more flexible, but if we can't come up > with a case more compelling than subsystem matching, I'd just as soon > add %(subject:subsystem) or similar. :) ;-) I like that as a general direction.
On Fri, Sep 23, 2022 at 05:58:56PM -0400, Jeff King wrote: > I don't think you even really need the regexp. If we respect --date, > then you should be able to ask for --date=format:%Y-%m. Hmm I tried passing in --date=format:... to my patched shortlog command along with setting some date placeholder like "... %cd ..." in the code, but it's not picking up on the format. Do you know how the date format can be wedged into the format_commit_message(...) "format" argument? > Unfortunately there's no way to specify the format as part of the > placeholder. The for-each-ref formatter understands this, like: > > %(authordate:format:%Y-%m) > > I wouldn't be opposed to teaching the git-log formatter something > similar. Oh that would solve my problem... Would it be a hefty effort to teach this to the git-log formatter? > But there's something else interesting going on in Jack's patch, which > is that he's not just introducing the date-sorting, but also that it's > used in conjunction with other sorting. So really the intended use is > something like: > > git shortlog --group:author --group:%Y-%m Yes I sort of stumbled on this and realized that this way I wouldn't have to touch the actual sorting or grouping functionality at all, which was already working properly. I just needed to reformat the shortlog message to include the year and/or month in a way that kept things consistent. > I think we'd want to allow the general form to be a series of groupings. > In the output from his patch it looks like: > > 2022-09 Jeff King > some commit message > another commit message > > I.e., the groups are collapsed into a single string, and unique strings > become their own groups (and are sorted in the usual way). > > If you give up the regex thing, then that naturally falls out as > (imagining we learn about authordate as a placeholder): > > git shortlog --group='%(authordate:format=%Y-%n) %an' > > without having to implement multiple groupings as a specific feature > (which is both more code, but also has user-facing confusion about when > --group overrides versus appends). That also skips the question of which > --group-by-regex applies to which --group-by-value. > > I do agree the regex thing is more flexible, but if we can't come up > with a case more compelling than subsystem matching, I'd just as soon > add %(subject:subsystem) or similar. :) > > -Peff I like this idea too. Since it requires a larger re-implementation, maybe I can pursue this going forward. I assume if we did this we would keep the existing group options like "--group=author" as shortcuts, and refactor them behind the scenes to use the new method. If so it may be useful to add my originally suggested options of "--group=year" and "--group=month" as well for convenient default time-based groupings. How do you feel about me submitting a v3 patch of my initial suggested implementation of new group options for year and month? Then going forward I can work on generalizing the grouping feature the way Peff suggested. -Jack
On Fri, Sep 23, 2022 at 05:58:56PM -0400, Jeff King wrote: > On Fri, Sep 23, 2022 at 09:17:10AM -0700, Junio C Hamano wrote: > > > Not a suggestion to use a different implementation or add a new > > feature on top of this --group-by-time-range idea, but I wonder if > > it is a more flexible and generalizeable approach to say "formulate > > this value given by the --format=<format> string, apply this regular > > expression match, and group by the subexpression value". E.g. > > > > git shortlog \ > > --group-by-value="%cI" \ > > --group-by-regexp="^(\d{4}-\d{2})" > > Heh, I was about to make the exact same suggestion. The existing > "--group=author" could really just be "--group='%an <%ae>'" (or variants > depending on the "-e" flag). This caught my attention, so I wanted to see how hard it would be to implement. It actually is quite straightforward, and gets us most of the way to being able to get the same functionality as in Jacob's patch (minus being able to do the for-each-ref-style sub-selectors, like `%(authordate:format=%Y-%m)`). Here's the patch: --- >8 --- diff --git a/builtin/shortlog.c b/builtin/shortlog.c index 7a1e1fe7c0..68880e8867 100644 --- a/builtin/shortlog.c +++ b/builtin/shortlog.c @@ -200,6 +200,29 @@ static void insert_records_from_trailers(struct shortlog *log, unuse_commit_buffer(commit, commit_buffer); } +static void insert_record_from_pretty(struct shortlog *log, + struct strset *dups, + struct commit *commit, + struct pretty_print_context *ctx, + const char *oneline) +{ + struct strbuf ident = STRBUF_INIT; + size_t i; + + for (i = 0; i < log->pretty.nr; i++) { + if (i) + strbuf_addch(&ident, ' '); + + format_commit_message(commit, log->pretty.items[i].string, + &ident, ctx); + } + + if (strset_add(dups, ident.buf)) + insert_one_record(log, ident.buf, oneline); + + strbuf_release(&ident); +} + void shortlog_add_commit(struct shortlog *log, struct commit *commit) { struct strbuf ident = STRBUF_INIT; @@ -243,6 +266,8 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit) if (log->groups & SHORTLOG_GROUP_TRAILER) { insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str); } + if (log->groups & SHORTLOG_GROUP_PRETTY) + insert_record_from_pretty(log, &dups, commit, &ctx, oneline_str); strset_clear(&dups); strbuf_release(&ident); @@ -321,8 +346,10 @@ static int parse_group_option(const struct option *opt, const char *arg, int uns else if (skip_prefix(arg, "trailer:", &field)) { log->groups |= SHORTLOG_GROUP_TRAILER; string_list_append(&log->trailers, field); - } else - return error(_("unknown group type: %s"), arg); + } else { + log->groups |= SHORTLOG_GROUP_PRETTY; + string_list_append(&log->pretty, arg); + } return 0; } @@ -340,6 +367,7 @@ void shortlog_init(struct shortlog *log) log->in2 = DEFAULT_INDENT2; log->trailers.strdup_strings = 1; log->trailers.cmp = strcasecmp; + log->pretty.strdup_strings = 1; } int cmd_shortlog(int argc, const char **argv, const char *prefix) diff --git a/shortlog.h b/shortlog.h index 3f7e9aabca..d7caecb76f 100644 --- a/shortlog.h +++ b/shortlog.h @@ -20,8 +20,10 @@ struct shortlog { SHORTLOG_GROUP_AUTHOR = (1 << 0), SHORTLOG_GROUP_COMMITTER = (1 << 1), SHORTLOG_GROUP_TRAILER = (1 << 2), + SHORTLOG_GROUP_PRETTY = (1 << 3), } groups; struct string_list trailers; + struct string_list pretty; int email; struct string_list mailmap; --- 8< --- > I don't think you even really need the regexp. If we respect --date, > then you should be able to ask for --date=format:%Y-%m. Unfortunately > there's no way to specify the format as part of the placeholder. The > for-each-ref formatter understands this, like: > > %(authordate:format:%Y-%m) > > I wouldn't be opposed to teaching the git-log formatter something > similar. Yeah, I think having a similar mechanism there would be useful, and certainly a prerequisite to being able to achieve what Jacob has done here with the more general approach. I think you could also do some cleanup on top, like replacing the SHORTLOG_GROUP_AUTHOR mode with adding either "%aN <%aE>" (or "%aN", without --email) as an entry in the `pretty` string_list. Thanks, Taylor
On Fri, Sep 23, 2022 at 09:38:29PM -0700, Jacob Stopak wrote: > Hmm I tried passing in --date=format:... to my patched shortlog command > along with setting some date placeholder like "... %cd ..." in the code, > but it's not picking up on the format. Do you know how the date format > can be wedged into the format_commit_message(...) "format" argument? It comes to the format code via the pretty_print_context. And we pick up the --date command via setup_revisions(), where it ends up in rev_info.date_mode. In a normal git-log, I think that data gets shuffled across by show_log(). But shortlog has its own traversal. I think something like this: diff --git a/builtin/shortlog.c b/builtin/shortlog.c index 7a1e1fe7c0..53c379a51d 100644 --- a/builtin/shortlog.c +++ b/builtin/shortlog.c @@ -211,7 +211,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit) ctx.fmt = CMIT_FMT_USERFORMAT; ctx.abbrev = log->abbrev; ctx.print_email_subject = 1; - ctx.date_mode.type = DATE_NORMAL; + ctx.date_mode = log->date_mode; ctx.output_encoding = get_log_output_encoding(); if (!log->summary) { @@ -407,6 +407,7 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix) log.user_format = rev.commit_format == CMIT_FMT_USERFORMAT; log.abbrev = rev.abbrev; log.file = rev.diffopt.file; + log.date_mode = rev.date_mode; if (!log.groups) log.groups = SHORTLOG_GROUP_AUTHOR; diff --git a/shortlog.h b/shortlog.h index 3f7e9aabca..ef3a3dbc65 100644 --- a/shortlog.h +++ b/shortlog.h @@ -15,6 +15,7 @@ struct shortlog { int in2; int user_format; int abbrev; + struct date_mode date_mode; enum { SHORTLOG_GROUP_AUTHOR = (1 << 0), is enough. At least it allows: git shortlog --format='%ad %s' --date=format:%Y-%m to work as you'd expect (but of course that's just the output for each commit that we show, not the actual grouping). > > Unfortunately there's no way to specify the format as part of the > > placeholder. The for-each-ref formatter understands this, like: > > > > %(authordate:format:%Y-%m) > > > > I wouldn't be opposed to teaching the git-log formatter something > > similar. > > Oh that would solve my problem... Would it be a hefty effort to teach > this to the git-log formatter? Probably not a huge amount of work. But it puts us in a weird in-between situation where we support _one_ of the more advanced ref-filter placeholders, but not the others. And of course no code is shared. That might be OK, as long as the syntax and semantics are identical to what ref-filter can do. Then in the long run, if we eventually merge the two implementations, there's no compatibility problem. That said, I think it may just be easier to respect --date, as above. It's not quite as flexible, but it's probably flexible enough. -Peff
On Wed, Oct 05, 2022 at 05:43:20PM -0400, Taylor Blau wrote: > > Heh, I was about to make the exact same suggestion. The existing > > "--group=author" could really just be "--group='%an <%ae>'" (or variants > > depending on the "-e" flag). > > This caught my attention, so I wanted to see how hard it would be to > implement. It actually is quite straightforward, and gets us most of the > way to being able to get the same functionality as in Jacob's patch > (minus being able to do the for-each-ref-style sub-selectors, like > `%(authordate:format=%Y-%m)`). Yeah, your patch is about what I'd expect. The date thing I think can be done with --date; I just sent a sketch in another part of the thread. > +static void insert_record_from_pretty(struct shortlog *log, > + struct strset *dups, > + struct commit *commit, > + struct pretty_print_context *ctx, > + const char *oneline) > +{ > + struct strbuf ident = STRBUF_INIT; > + size_t i; > + > + for (i = 0; i < log->pretty.nr; i++) { > + if (i) > + strbuf_addch(&ident, ' '); > + > + format_commit_message(commit, log->pretty.items[i].string, > + &ident, ctx); > + } So here you're allowing multiple pretty options. But really, once we allow the user an arbitrary format, is there any reason for them to do: git shortlog --group=%an --group=%ad versus just: git shortlog --group='%an %ad' ? > void shortlog_add_commit(struct shortlog *log, struct commit *commit) > { > struct strbuf ident = STRBUF_INIT; > @@ -243,6 +266,8 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit) > if (log->groups & SHORTLOG_GROUP_TRAILER) { > insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str); > } > + if (log->groups & SHORTLOG_GROUP_PRETTY) > + insert_record_from_pretty(log, &dups, commit, &ctx, oneline_str); I was puzzled at first that this was a bitwise check. But I forgot that we added support for --group options already, in 63d24fa0b0 (shortlog: allow multiple groups to be specified, 2020-09-27). So a plan like: git shortlog --group=author --group=date (as in the original patch in this thread) doesn't quite work, I think. Because the semantics for multiple --group lines are that the commit is credited individually to each ident. That's what lets you do: git shortlog -ns --group=author --group=trailer:co-authored-by and credit authors and co-authors equally. So likewise, I think multiple group-format options don't really make sense (or at least, do not make sense to concatenate; you'd put each key in its own single format). > @@ -321,8 +346,10 @@ static int parse_group_option(const struct option *opt, const char *arg, int uns > else if (skip_prefix(arg, "trailer:", &field)) { > log->groups |= SHORTLOG_GROUP_TRAILER; > string_list_append(&log->trailers, field); > - } else > - return error(_("unknown group type: %s"), arg); > + } else { > + log->groups |= SHORTLOG_GROUP_PRETTY; > + string_list_append(&log->pretty, arg); > + } We probably want to insist that the format contains a "%" sign, and/or git it a keyword like "format:". Otherwise a typo like: git shortlog --format=autor stops being an error we detect, and just returns nonsense results (every commit has the same ident). I think you'd want to detect SHORTLOG_GROUP_PRETTY in the read_from_stdin() path, too. And probably just die() with "not supported", like we do for trailers. > I think you could also do some cleanup on top, like replacing the > SHORTLOG_GROUP_AUTHOR mode with adding either "%aN <%aE>" (or "%aN", > without --email) as an entry in the `pretty` string_list. Yeah, that would be a nice cleanup. I think might even be a good idea to explain the various options to the users in terms of "--author is equivalent to %aN <%aE>". It may help them understand how the tool works. -Peff
On Wed, Oct 05, 2022 at 06:26:57PM -0400, Jeff King wrote: > On Wed, Oct 05, 2022 at 05:43:20PM -0400, Taylor Blau wrote: > > > This caught my attention, so I wanted to see how hard it would be to > > implement. It actually is quite straightforward, and gets us most of the > > way to being able to get the same functionality as in Jacob's patch > > (minus being able to do the for-each-ref-style sub-selectors, like > > `%(authordate:format=%Y-%m)`). Thanks Taylor!! This looks awesome and helped me understand how the pretty context stuff works. I was able to apply your patch locally and test, and plan to continue working off of this :D. Like Peff mentioned seems to be a few usage details to hammer out. > The date thing I think can be done with --date; I just sent a sketch in > another part of the thread. Peff - I applied your --date sketch onto Taylor's patch and it worked first try. > So here you're allowing multiple pretty options. But really, once we > allow the user an arbitrary format, is there any reason for them to do: > > git shortlog --group=%an --group=%ad > > versus just: > > git shortlog --group='%an %ad' > > ? Yes I can't think of an advantage of having multiple custom-formatted group fields. Also see my note on this below related to your comment on specifying multiple groups. > > void shortlog_add_commit(struct shortlog *log, struct commit *commit) > > { > > struct strbuf ident = STRBUF_INIT; > > @@ -243,6 +266,8 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit) > > if (log->groups & SHORTLOG_GROUP_TRAILER) { > > insert_records_from_trailers(log, &dups, commit, &ctx, oneline_str); > > } > > + if (log->groups & SHORTLOG_GROUP_PRETTY) > > + insert_record_from_pretty(log, &dups, commit, &ctx, oneline_str); > > I was puzzled at first that this was a bitwise check. But I forgot that > we added support for --group options already, in 63d24fa0b0 (shortlog: > allow multiple groups to be specified, 2020-09-27). > > So a plan like: > > git shortlog --group=author --group=date > > (as in the original patch in this thread) doesn't quite work, I think. My first patch addressed this by specifically handling cases where the new grouping options were passed in-tandem with existing options, and making sure only a single shortlog group was generated. But if we're generalizing the custom group format then it might be unecessary to even allow the custom group in tandem with most other options (like 'author' and 'committer'), since those options can be included in the custom group format. The trailer option might be an exception, but that could possibly just be handled as a special case. > We probably want to insist that the format contains a "%" sign, and/or > git it a keyword like "format:". Otherwise a typo like: > > git shortlog --format=autor > > stops being an error we detect, and just returns nonsense results > (every commit has the same ident). Small aside: I like how Taylor re-used the --group option for the custom format. IMO it hammers home that this is a grouping option and not just formatting or filtering which can be confusing to users sometimes when doing data analytics. But your points here all still apply. Maybe detecting a "%" sign can be the way that git identifies the custom format being passed in. In conjuction with the % identifiers, this would still enable users to add in some arbitrary constant label like --group='Year: %cd' --date='%Y', without affecting the grouped/sorted results since all entries would then include the "Years: " prefix (or postfix or however they decide to write it). The one case that should probably be handled is when no % sign is used and no other matching flags like --author or --committer are either, because currently that will just group all commits under 1 nonsensical group name like "autor" as you mentioned. > I think you'd want to detect SHORTLOG_GROUP_PRETTY in the > read_from_stdin() path, too. And probably just die() with "not > supported", like we do for trailers. Glad you said this because I applied this in my original patch with the explicit --year and --month groups. Didn't seem to be an obvious use-case with the stdin even though my original year and month values could possibly be read from the git log output supplied as stdin. But for a generalized group format seems even more far-fetched to try and make it jive with the stdin version of the command. -Jack
On Thu, Oct 06, 2022 at 05:48:06PM -0700, Jacob Stopak wrote: > > The date thing I think can be done with --date; I just sent a sketch in > > another part of the thread. > > Peff - I applied your --date sketch onto Taylor's patch and it worked > first try. Yeah; I think that allowing the `--date` argument (similar to how `git log` treats an option by the same name) is sensible, albeit slightly less flexible than the proposed `%(authordate:format=%Y-%m)`. For what it's worth, I don't have anything against the latter (other than that it is slightly redundant in many cases with `--date`), but it does seem easier and more worthwhile to repurpose `--date` here instead. Thanks, Taylor
On Wed, Oct 05, 2022 at 06:26:57PM -0400, Jeff King wrote: > > +static void insert_record_from_pretty(struct shortlog *log, > > + struct strset *dups, > > + struct commit *commit, > > + struct pretty_print_context *ctx, > > + const char *oneline) > > +{ > > + struct strbuf ident = STRBUF_INIT; > > + size_t i; > > + > > + for (i = 0; i < log->pretty.nr; i++) { > > + if (i) > > + strbuf_addch(&ident, ' '); > > + > > + format_commit_message(commit, log->pretty.items[i].string, > > + &ident, ctx); > > + } > > So here you're allowing multiple pretty options. But really, once we > allow the user an arbitrary format, is there any reason for them to do: > > git shortlog --group=%an --group=%ad > > versus just: > > git shortlog --group='%an %ad' > > ? I think that if you want to unify `--group=author` into the new format group implementation, you would have to allow multiple `--group` options, but each such option would generate its own shortlog identity instead of getting concatenated together. Thanks, Taylor
On Thu, Oct 06, 2022 at 05:48:06PM -0700, Jacob Stopak wrote: > > We probably want to insist that the format contains a "%" sign, and/or > > git it a keyword like "format:". Otherwise a typo like: > > > > git shortlog --format=autor > > > > stops being an error we detect, and just returns nonsense results > > (every commit has the same ident). > > Small aside: I like how Taylor re-used the --group option for the custom format. > IMO it hammers home that this is a grouping option and not just formatting or > filtering which can be confusing to users sometimes when doing data analytics. Yeah, sorry this was just a typo/thinko on my part. It absolutely should be --group, as --format already does something else. -Peff
On Fri, Oct 07, 2022 at 06:24:36PM -0400, Taylor Blau wrote: > > So here you're allowing multiple pretty options. But really, once we > > allow the user an arbitrary format, is there any reason for them to do: > > > > git shortlog --group=%an --group=%ad > > > > versus just: > > > > git shortlog --group='%an %ad' > > > > ? > > I think that if you want to unify `--group=author` into the new format > group implementation, you would have to allow multiple `--group` > options, but each such option would generate its own shortlog identity > instead of getting concatenated together. Exactly, and I think we have to do that anyway to match the existing multiple-option behavior. -Peff
diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt index f64e77047b..ab68b287d8 100644 --- a/Documentation/git-shortlog.txt +++ b/Documentation/git-shortlog.txt @@ -54,6 +54,8 @@ OPTIONS -- - `author`, commits are grouped by author - `committer`, commits are grouped by committer (the same as `-c`) + - `month`, commits are grouped by month (the same as `-m`) + - `year`, commits are grouped by year (the same as `-y`) - `trailer:<field>`, the `<field>` is interpreted as a case-insensitive commit message trailer (see linkgit:git-interpret-trailers[1]). For example, if your project uses `Reviewed-by` trailers, you might want @@ -80,6 +82,14 @@ counts both authors and co-authors. --committer:: This is an alias for `--group=committer`. +-m:: +--month:: + This is an alias for `--group=month`. + +-y:: +--year:: + This is an alias for `--group=year`. + -w[<width>[,<indent1>[,<indent2>]]]:: Linewrap the output by wrapping each line at `width`. The first line of each entry is indented by `indent1` spaces, and the second diff --git a/builtin/shortlog.c b/builtin/shortlog.c index 7a1e1fe7c0..1beba9b91c 100644 --- a/builtin/shortlog.c +++ b/builtin/shortlog.c @@ -133,6 +133,10 @@ static void read_from_stdin(struct shortlog *log) break; case SHORTLOG_GROUP_TRAILER: die(_("using --group=trailer with stdin is not supported")); + case SHORTLOG_GROUP_YEAR: + die(_("using --group=year with stdin is not supported")); + case SHORTLOG_GROUP_MONTH: + die(_("using --group=month with stdin is not supported")); default: BUG("unhandled shortlog group"); } @@ -200,10 +204,28 @@ static void insert_records_from_trailers(struct shortlog *log, unuse_commit_buffer(commit, commit_buffer); } +static void format_commit_date(struct commit *commit, struct strbuf *sb, + char *format, struct shortlog *log) +{ + time_t t = (time_t) commit->date; + struct tm commit_date; + localtime_r(&t, &commit_date); + + if (log->groups & SHORTLOG_GROUP_MONTH) { + strftime(sb->buf, strbuf_avail(sb), "%Y/%m", &commit_date); + snprintf(sb->buf+7, strbuf_avail(sb), "%s", format); + } else if (log->groups & SHORTLOG_GROUP_YEAR) { + strftime(sb->buf, strbuf_avail(sb), "%Y", &commit_date); + snprintf(sb->buf+4, strbuf_avail(sb), "%s", format); + } +} + void shortlog_add_commit(struct shortlog *log, struct commit *commit) { struct strbuf ident = STRBUF_INIT; struct strbuf oneline = STRBUF_INIT; + struct strbuf buffer; + struct strset dups = STRSET_INIT; struct pretty_print_context ctx = {0}; const char *oneline_str; @@ -214,6 +236,8 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit) ctx.date_mode.type = DATE_NORMAL; ctx.output_encoding = get_log_output_encoding(); + strbuf_init(&buffer, 100); + if (!log->summary) { if (log->user_format) pretty_print_commit(&ctx, commit, &oneline); @@ -222,20 +246,47 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit) } oneline_str = oneline.len ? oneline.buf : "<none>"; - if (log->groups & SHORTLOG_GROUP_AUTHOR) { - strbuf_reset(&ident); + if ((log->groups & SHORTLOG_GROUP_MONTH) && (log->groups & SHORTLOG_GROUP_YEAR)) + log->groups ^= SHORTLOG_GROUP_YEAR; + + if (((log->groups & SHORTLOG_GROUP_MONTH) || (log->groups & SHORTLOG_GROUP_YEAR)) + && !HAS_MULTI_BITS(log->groups)) { + format_commit_date(commit, &buffer, "", log); format_commit_message(commit, - log->email ? "%aN <%aE>" : "%aN", + buffer.buf, &ident, &ctx); + + if (strset_add(&dups, ident.buf)) + insert_one_record(log, ident.buf, oneline_str); + } + if (log->groups & SHORTLOG_GROUP_AUTHOR) { + strbuf_reset(&ident); + if ((log->groups & SHORTLOG_GROUP_MONTH) || (log->groups & SHORTLOG_GROUP_YEAR)) { + format_commit_date(commit, &buffer, log->email ? " %aN <%aE>" : " %aN", log); + format_commit_message(commit, + buffer.buf, + &ident, &ctx); + } else { + format_commit_message(commit, + log->email ? "%aN <%aE>" : "%aN", + &ident, &ctx); + } if (!HAS_MULTI_BITS(log->groups) || strset_add(&dups, ident.buf)) insert_one_record(log, ident.buf, oneline_str); } if (log->groups & SHORTLOG_GROUP_COMMITTER) { strbuf_reset(&ident); - format_commit_message(commit, - log->email ? "%cN <%cE>" : "%cN", - &ident, &ctx); + if ((log->groups & SHORTLOG_GROUP_MONTH) || (log->groups & SHORTLOG_GROUP_YEAR)) { + format_commit_date(commit, &buffer, log->email ? " %cN <%cE>" : " %cN", log); + format_commit_message(commit, + buffer.buf, + &ident, &ctx); + } else { + format_commit_message(commit, + log->email ? "%cN <%cE>" : "%cN", + &ident, &ctx); + } if (!HAS_MULTI_BITS(log->groups) || strset_add(&dups, ident.buf)) insert_one_record(log, ident.buf, oneline_str); @@ -247,6 +298,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit) strset_clear(&dups); strbuf_release(&ident); strbuf_release(&oneline); + strbuf_release(&buffer); } static void get_from_rev(struct rev_info *rev, struct shortlog *log) @@ -314,15 +366,20 @@ static int parse_group_option(const struct option *opt, const char *arg, int uns if (unset) { log->groups = 0; string_list_clear(&log->trailers, 0); - } else if (!strcasecmp(arg, "author")) + } else if (!strcasecmp(arg, "author")) { log->groups |= SHORTLOG_GROUP_AUTHOR; - else if (!strcasecmp(arg, "committer")) + } else if (!strcasecmp(arg, "committer")) { log->groups |= SHORTLOG_GROUP_COMMITTER; - else if (skip_prefix(arg, "trailer:", &field)) { + } else if (skip_prefix(arg, "trailer:", &field)) { log->groups |= SHORTLOG_GROUP_TRAILER; string_list_append(&log->trailers, field); - } else + } else if (!strcasecmp(arg, "month")) { + log->groups |= SHORTLOG_GROUP_MONTH; + } else if (!strcasecmp(arg, "year")) { + log->groups |= SHORTLOG_GROUP_YEAR; + } else { return error(_("unknown group type: %s"), arg); + } return 0; } @@ -363,6 +420,12 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix) &parse_wrap_args), OPT_CALLBACK(0, "group", &log, N_("field"), N_("group by field"), parse_group_option), + OPT_BIT('m', "month", &log.groups, + N_("group by month rather than author"), + SHORTLOG_GROUP_MONTH), + OPT_BIT('y', "year", &log.groups, + N_("group by year rather than author"), + SHORTLOG_GROUP_YEAR), OPT_END(), }; diff --git a/shortlog.h b/shortlog.h index 3f7e9aabca..45b5efb6dc 100644 --- a/shortlog.h +++ b/shortlog.h @@ -20,6 +20,8 @@ struct shortlog { SHORTLOG_GROUP_AUTHOR = (1 << 0), SHORTLOG_GROUP_COMMITTER = (1 << 1), SHORTLOG_GROUP_TRAILER = (1 << 2), + SHORTLOG_GROUP_MONTH = (1 << 3), + SHORTLOG_GROUP_YEAR = (1 << 4), } groups; struct string_list trailers; diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh index 3095b1b2ff..981f45f732 100755 --- a/t/t4201-shortlog.sh +++ b/t/t4201-shortlog.sh @@ -359,4 +359,46 @@ test_expect_success 'stdin with multiple groups reports error' ' test_must_fail git shortlog --group=author --group=committer <log ' +test_expect_success '--group=year groups output by year' ' + git commit --allow-empty -m "git shortlog --group=year test" && + cat >expect <<-\EOF && + 1 2005 + EOF + git shortlog -ns \ + --group=year \ + -1 HEAD >actual && + test_cmp expect actual +' + +test_expect_success 'stdin with --group=year reports error' ' + test_must_fail git shortlog --group=year +' + +test_expect_success '--group=month groups output by month' ' + git commit --allow-empty -m "git shortlog --group=month test" && + cat >expect <<-\EOF && + 1 2005/04 + EOF + git shortlog -ns \ + --group=month \ + -1 HEAD >actual && + test_cmp expect actual +' + +test_expect_success 'stdin with --group=month reports error' ' + test_must_fail git shortlog --group=month +' + +test_expect_success '--group=month and --group=year defaults to month' ' + git commit --allow-empty -m "git shortlog --group=month --group=year test" && + cat >expect <<-\EOF && + 1 2005/04 + EOF + git shortlog -ns \ + --group=month \ + --group=year \ + -1 HEAD >actual && + test_cmp expect actual +' + test_done