Message ID | 55a6ef7bc0082818fa51a0915c43002ede5c449f.1665448437.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | shortlog: introduce `--group=<format>` | expand |
On Mon, Oct 10, 2022 at 08:34:15PM -0400, Taylor Blau wrote: > Instead of handling SHORTLOG_GROUP_AUTHOR separately, reimplement it as > a special case of the new `--group=<format>` mode, where the author mode > is a shorthand for `--group='%aN <%aE>'. OK, this should be a nice cleanup. > diff --git a/builtin/log.c b/builtin/log.c > index ee19dc5d45..6b77e520b5 100644 > --- a/builtin/log.c > +++ b/builtin/log.c > @@ -1334,6 +1334,7 @@ static void make_cover_letter(struct rev_info *rev, int use_separate_file, > log.in2 = 4; > log.file = rev->diffopt.file; > log.groups = SHORTLOG_GROUP_AUTHOR; > + shortlog_init_group(&log); > for (i = 0; i < nr; i++) > shortlog_add_commit(&log, list[i]); In another caller you drop the assignment of log.groups, since shortlog_init_group() already does so if log.groups is 0 (which it will be, since shortlog_init() will zero-initialize). Should we do the same here? Or maybe leaving it is more obvious. It would be more obvious still if we made the helper take the type, like: shortlog_init_group(&log, SHORTLOG_GROUP_AUTHOR); But I guess that is not accurate, as we'd eventually use this in shortlog.c to turn _any_ bits we've accumulated in log.group into their correct formats. I think the name of the helper function puzzled me a bit. It is really "finish up any setup for the shortlog struct". We could lazily do it in shortlog_add_commit(), but that's pretty subtle. But could we maybe call it: shortlog_finish_setup(); or something? I dunno. I might be nit-picking, but I actually had to scratch my head quite a bit to understand what was going on here. > diff --git a/builtin/shortlog.c b/builtin/shortlog.c > index f708d96558..aac8c7afa4 100644 > --- a/builtin/shortlog.c > +++ b/builtin/shortlog.c > @@ -245,15 +245,6 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit) > } > oneline_str = oneline.len ? oneline.buf : "<none>"; > > - if (log->groups & SHORTLOG_GROUP_AUTHOR) { > - strbuf_reset(&ident); > - format_commit_message(commit, > - log->email ? "%aN <%aE>" : "%aN", > - &ident, &ctx); > - if (!HAS_MULTI_BITS(log->groups) || > - strset_add(&dups, ident.buf)) > - insert_one_record(log, ident.buf, oneline_str); > - } This loses the HAS_MULTI_BITS() optimization. The idea there is that if you have a single group-by that can't produce multiple outputs, then there's no need to do duplicate detection. The equivalent in an all-formats world is something like: log.format.nr > 1 && !log.trailers.nr (because trailers are special in that one trailer key can produce multiple idents for a single commit). > +void shortlog_init_group(struct shortlog *log) > +{ > + if (!log->groups) > + log->groups = SHORTLOG_GROUP_AUTHOR; > + > + if (log->groups & SHORTLOG_GROUP_AUTHOR) > + string_list_append(&log->format, > + log->email ? "%aN <%aE>" : "%aN"); > +} Regardless of the naming suggestion I made, I think things would be more obvious if this top conditional remained in cmd_shortlog(). And then the explicit assignment of "log.group" in make_cover_letter() would remain. > @@ -439,8 +440,8 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix) > log.file = rev.diffopt.file; > log.date_mode = rev.date_mode; > > - if (!log.groups) > - log.groups = SHORTLOG_GROUP_AUTHOR; > + shortlog_init_group(&log); > + > string_list_sort(&log.trailers); Now that we have a "finish the shortlog init" function, probably this trailer sort should go into it, too. The current caller doesn't need it, but it removes on more gotcha from using the shortlog API. -Peff
On Mon, Oct 10, 2022 at 09:34:42PM -0400, Jeff King wrote: > On Mon, Oct 10, 2022 at 08:34:15PM -0400, Taylor Blau wrote: > > I think the name of the helper function puzzled me a bit. It is really > "finish up any setup for the shortlog struct". We could lazily do it in > shortlog_add_commit(), but that's pretty subtle. But could we maybe call > it: > > shortlog_finish_setup(); > > or something? I dunno. I might be nit-picking, but I actually had to > scratch my head quite a bit to understand what was going on here. I think that shortlog_finish_setup() is probably the most reasonable choice (and I changed it to that locally). I spent a day or so thinking on and off about this while writing the series, and didn't come up with any satisfying names. I think it points to something deeper about the API that doesn't quite sit right with me. But shortlog_finish_setup() is the least-unsatisfying name so far, so let's go with that ;-). > > diff --git a/builtin/shortlog.c b/builtin/shortlog.c > > index f708d96558..aac8c7afa4 100644 > > --- a/builtin/shortlog.c > > +++ b/builtin/shortlog.c > > @@ -245,15 +245,6 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit) > > } > > oneline_str = oneline.len ? oneline.buf : "<none>"; > > > > - if (log->groups & SHORTLOG_GROUP_AUTHOR) { > > - strbuf_reset(&ident); > > - format_commit_message(commit, > > - log->email ? "%aN <%aE>" : "%aN", > > - &ident, &ctx); > > - if (!HAS_MULTI_BITS(log->groups) || > > - strset_add(&dups, ident.buf)) > > - insert_one_record(log, ident.buf, oneline_str); > > - } > > This loses the HAS_MULTI_BITS() optimization. The idea there is that if > you have a single group-by that can't produce multiple outputs, then > there's no need to do duplicate detection. > > The equivalent in an all-formats world is something like: > > log.format.nr > 1 && !log.trailers.nr > > (because trailers are special in that one trailer key can produce > multiple idents for a single commit). Hmm. I suspect that this is going to become more complicated by the time that you read the final patch ;-). Let's wait until then and see what you think. > > +void shortlog_init_group(struct shortlog *log) > > +{ > > + if (!log->groups) > > + log->groups = SHORTLOG_GROUP_AUTHOR; > > + > > + if (log->groups & SHORTLOG_GROUP_AUTHOR) > > + string_list_append(&log->format, > > + log->email ? "%aN <%aE>" : "%aN"); > > +} > > Regardless of the naming suggestion I made, I think things would be more > obvious if this top conditional remained in cmd_shortlog(). And then the > explicit assignment of "log.group" in make_cover_letter() would remain. Yep, works for me. > > @@ -439,8 +440,8 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix) > > log.file = rev.diffopt.file; > > log.date_mode = rev.date_mode; > > > > - if (!log.groups) > > - log.groups = SHORTLOG_GROUP_AUTHOR; > > + shortlog_init_group(&log); > > + > > string_list_sort(&log.trailers); > > Now that we have a "finish the shortlog init" function, probably this > trailer sort should go into it, too. The current caller doesn't need it, > but it removes on more gotcha from using the shortlog API. We'll drop this list by the end of the series, too, so it probably isn't worth moving it into shortlog_finish_setup() in the interim. > -Peff Thanks, Taylor
On Mon, Oct 10, 2022 at 09:48:47PM -0400, Taylor Blau wrote: > I think that shortlog_finish_setup() is probably the most reasonable > choice (and I changed it to that locally). I spent a day or so thinking > on and off about this while writing the series, and didn't come up with > any satisfying names. > > I think it points to something deeper about the API that doesn't quite > sit right with me. But shortlog_finish_setup() is the least-unsatisfying > name so far, so let's go with that ;-). Yeah, touching that block in make_cover_letter() definitely tickled my memory that there was some awkwardness there. That's when I added shortlog_init() at all (which is good, because otherwise you'd have had an uninitialized log.format string-list). I think the general pattern of: foo_init(); foo.option = whatever; foo_finish_setup(); foo_do_the_thing(); is OK. It's a little complicated, but it gives callers the freedom to tweak options with a lot of flexibility (e.g., based on command line config, command line options, etc). We have a similar pattern with diff_setup_done(). The other option is for any option-tweaking to go through methods which maintain invariants (like if GROUP_AUTHOR is set, then the "%an" format has been added). That's usually the "right" object-oriented way to do it. But I think even in this simple case it gets awkward, because the correct format can't be known until you see whether log->email is set. So it really has to be a "finalize" step after both log->email and log->group are set. > > This loses the HAS_MULTI_BITS() optimization. The idea there is that if > > you have a single group-by that can't produce multiple outputs, then > > there's no need to do duplicate detection. > > > > The equivalent in an all-formats world is something like: > > > > log.format.nr > 1 && !log.trailers.nr > > > > (because trailers are special in that one trailer key can produce > > multiple idents for a single commit). > > Hmm. I suspect that this is going to become more complicated by the time > that you read the final patch ;-). Let's wait until then and see what > you think. Yes, it's pretty gross with the trailer util thing. You'd probably want to do something like: static int needs_dedup(const struct string_list *formats) { struct string_list_item *item; if (formats->nr > 1) return 1; for_each_string_list_item(item, formats) { if (item->util) return 1; } return 0; } and perhaps call it only once and cache the result, rather than iterating for each commit/format. But if we leave trailers as is, then the logic is much easier. And implementing the optimization for --group=format should fall out pretty naturally (and that both preserves it for --group=author, and extends it to --group=format, which I should have noticed when reviewing patch 4). > > > @@ -439,8 +440,8 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix) > > > log.file = rev.diffopt.file; > > > log.date_mode = rev.date_mode; > > > > > > - if (!log.groups) > > > - log.groups = SHORTLOG_GROUP_AUTHOR; > > > + shortlog_init_group(&log); > > > + > > > string_list_sort(&log.trailers); > > > > Now that we have a "finish the shortlog init" function, probably this > > trailer sort should go into it, too. The current caller doesn't need it, > > but it removes on more gotcha from using the shortlog API. > > We'll drop this list by the end of the series, too, so it probably isn't > worth moving it into shortlog_finish_setup() in the interim. Ah, right. Well, see my other comments. :) -Peff
On Mon, Oct 10 2022, Taylor Blau wrote: > Instead of handling SHORTLOG_GROUP_AUTHOR separately, reimplement it as > a special case of the new `--group=<format>` mode, where the author mode > is a shorthand for `--group='%aN <%aE>'. > > Note that we still need to keep the SHORTLOG_GROUP_AUTHOR enum since it > has a different meaning in `read_from_stdin()`, where it is still used > for a different purpose. > > Signed-off-by: Taylor Blau <me@ttaylorr.com> > --- > builtin/log.c | 1 + > builtin/shortlog.c | 23 ++++++++++++----------- > shortlog.h | 1 + > 3 files changed, 14 insertions(+), 11 deletions(-) > > diff --git a/builtin/log.c b/builtin/log.c > index ee19dc5d45..6b77e520b5 100644 > --- a/builtin/log.c > +++ b/builtin/log.c > @@ -1334,6 +1334,7 @@ static void make_cover_letter(struct rev_info *rev, int use_separate_file, > log.in2 = 4; > log.file = rev->diffopt.file; > log.groups = SHORTLOG_GROUP_AUTHOR; > + shortlog_init_group(&log); > for (i = 0; i < nr; i++) > shortlog_add_commit(&log, list[i]); > > diff --git a/builtin/shortlog.c b/builtin/shortlog.c > index f708d96558..aac8c7afa4 100644 > --- a/builtin/shortlog.c > +++ b/builtin/shortlog.c > @@ -245,15 +245,6 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit) > } > oneline_str = oneline.len ? oneline.buf : "<none>"; > > - if (log->groups & SHORTLOG_GROUP_AUTHOR) { > - strbuf_reset(&ident); > - format_commit_message(commit, > - log->email ? "%aN <%aE>" : "%aN", > - &ident, &ctx); > - if (!HAS_MULTI_BITS(log->groups) || > - strset_add(&dups, ident.buf)) > - insert_one_record(log, ident.buf, oneline_str); > - } > if (log->groups & SHORTLOG_GROUP_COMMITTER) { > strbuf_reset(&ident); > format_commit_message(commit, > @@ -372,6 +363,16 @@ void shortlog_init(struct shortlog *log) > log->format.strdup_strings = 1; > } > > +void shortlog_init_group(struct shortlog *log) > +{ > + if (!log->groups) > + log->groups = SHORTLOG_GROUP_AUTHOR; > + > + if (log->groups & SHORTLOG_GROUP_AUTHOR) Nit (easier reading): if (!x) ... else if (x & FLAG) ...
On Tue, Oct 11 2022, Ævar Arnfjörð Bjarmason wrote: > On Mon, Oct 10 2022, Taylor Blau wrote: >> +void shortlog_init_group(struct shortlog *log) >> +{ >> + if (!log->groups) >> + log->groups = SHORTLOG_GROUP_AUTHOR; >> + >> + if (log->groups & SHORTLOG_GROUP_AUTHOR) > > Nit (easier reading): > > if (!x) > ... > else if (x & FLAG) > ... Urgh, sorry about the noise. That's an obvious logic error, I don't know what I was thinking...
On Mon, Oct 10, 2022 at 09:48:47PM -0400, Taylor Blau wrote: > > > diff --git a/builtin/shortlog.c b/builtin/shortlog.c > > > index f708d96558..aac8c7afa4 100644 > > > --- a/builtin/shortlog.c > > > +++ b/builtin/shortlog.c > > > @@ -245,15 +245,6 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit) > > > } > > > oneline_str = oneline.len ? oneline.buf : "<none>"; > > > > > > - if (log->groups & SHORTLOG_GROUP_AUTHOR) { > > > - strbuf_reset(&ident); > > > - format_commit_message(commit, > > > - log->email ? "%aN <%aE>" : "%aN", > > > - &ident, &ctx); > > > - if (!HAS_MULTI_BITS(log->groups) || > > > - strset_add(&dups, ident.buf)) > > > - insert_one_record(log, ident.buf, oneline_str); > > > - } > > > > This loses the HAS_MULTI_BITS() optimization. The idea there is that if > > you have a single group-by that can't produce multiple outputs, then > > there's no need to do duplicate detection. > > > > The equivalent in an all-formats world is something like: > > > > log.format.nr > 1 && !log.trailers.nr > > > > (because trailers are special in that one trailer key can produce > > multiple idents for a single commit). Hmm. Shouldn't that "&& !log.trailers.nr" be an "|| log.trailers.nr"? It doesn't seem to make sense to say "there are things that could produce multiple outputs" if there's more than one format _and_ no trailers. The logic should read "there are things that could produce multiple outputs if there is more than one format *or* at least one trailer". So I think the right change would be: --- >8 --- diff --git a/builtin/shortlog.c b/builtin/shortlog.c index 95ceab7649..7e1b56e2aa 100644 --- a/builtin/shortlog.c +++ b/builtin/shortlog.c @@ -216,7 +216,8 @@ static void insert_records_from_format(struct shortlog *log, format_commit_message(commit, item->string, &buf, ctx); - if (strset_add(dups, buf.buf)) + if (!(log->format.nr > 1 || log->trailers.nr) || + strset_add(dups, buf.buf)) insert_one_record(log, buf.buf, oneline); } --- 8< --- Yeah? Thanks, Taylor
On Mon, Oct 10, 2022 at 10:15:32PM -0400, Jeff King wrote: > > > > @@ -439,8 +440,8 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix) > > > > log.file = rev.diffopt.file; > > > > log.date_mode = rev.date_mode; > > > > > > > > - if (!log.groups) > > > > - log.groups = SHORTLOG_GROUP_AUTHOR; > > > > + shortlog_init_group(&log); > > > > + > > > > string_list_sort(&log.trailers); > > > > > > Now that we have a "finish the shortlog init" function, probably this > > > trailer sort should go into it, too. The current caller doesn't need it, > > > but it removes on more gotcha from using the shortlog API. > > > > We'll drop this list by the end of the series, too, so it probably isn't > > worth moving it into shortlog_finish_setup() in the interim. > > Ah, right. Well, see my other comments. :) Seen ;-). Now that this series no longer touches the log.trailers bits, we should move this around. Done locally, will send a reroll shortly or tomorrow. Thanks, Taylor
On Thu, Oct 20, 2022 at 10:02:45PM -0400, Taylor Blau wrote: > > > > - if (log->groups & SHORTLOG_GROUP_AUTHOR) { > > > > - strbuf_reset(&ident); > > > > - format_commit_message(commit, > > > > - log->email ? "%aN <%aE>" : "%aN", > > > > - &ident, &ctx); > > > > - if (!HAS_MULTI_BITS(log->groups) || > > > > - strset_add(&dups, ident.buf)) > > > > - insert_one_record(log, ident.buf, oneline_str); > > > > - } > > > > > > This loses the HAS_MULTI_BITS() optimization. The idea there is that if > > > you have a single group-by that can't produce multiple outputs, then > > > there's no need to do duplicate detection. > > > > > > The equivalent in an all-formats world is something like: > > > > > > log.format.nr > 1 && !log.trailers.nr > > > > > > (because trailers are special in that one trailer key can produce > > > multiple idents for a single commit). > > Hmm. Shouldn't that "&& !log.trailers.nr" be an "|| log.trailers.nr"? It > doesn't seem to make sense to say "there are things that could produce > multiple outputs" if there's more than one format _and_ no trailers. Yeah. I was thinking of it as "is it OK to not de-dup", but of course it is the other way around because of the "!". And regardless of which way you write the conditional, the two sides must agree. ;) > The logic should read "there are things that could produce multiple > outputs if there is more than one format *or* at least one trailer". > > So I think the right change would be: > > --- >8 --- > diff --git a/builtin/shortlog.c b/builtin/shortlog.c > index 95ceab7649..7e1b56e2aa 100644 > --- a/builtin/shortlog.c > +++ b/builtin/shortlog.c > @@ -216,7 +216,8 @@ static void insert_records_from_format(struct shortlog *log, > > format_commit_message(commit, item->string, &buf, ctx); > > - if (strset_add(dups, buf.buf)) > + if (!(log->format.nr > 1 || log->trailers.nr) || > + strset_add(dups, buf.buf)) > insert_one_record(log, buf.buf, oneline); > } > --- 8< --- > > Yeah? Right. I wondered if it might be a little clearer to drop the outer "!", which yields: if ((log->format.nr <= 1 && !log->trailers.nr) || strset_add(dups, buf.buf)) but it is not really any less confusing. If we gave that first part of the conditional a name, like: if (!needs_dedup(log) || strset_add(dups, buf.buf)) maybe that is better. I dunno. Regardless, what you wrote above is correct. Thanks for catching it. -Peff
On Fri, Oct 21, 2022 at 01:03:40AM -0400, Jeff King wrote: > Right. I wondered if it might be a little clearer to drop the outer "!", > which yields: > > if ((log->format.nr <= 1 && !log->trailers.nr) || > strset_add(dups, buf.buf)) > > but it is not really any less confusing. If we gave that first part of > the conditional a name, like: > > if (!needs_dedup(log) || strset_add(dups, buf.buf)) > > maybe that is better. I dunno. Yeah, I think that this is best. I briefly wondered whether it would be better to invert the check _and_ name it, but `if (does_not_need_dedup(log))` is too wordy. > Regardless, what you wrote above is correct. Thanks for catching it. No problem, it was an honest mistake. Thanks, Taylor
diff --git a/builtin/log.c b/builtin/log.c index ee19dc5d45..6b77e520b5 100644 --- a/builtin/log.c +++ b/builtin/log.c @@ -1334,6 +1334,7 @@ static void make_cover_letter(struct rev_info *rev, int use_separate_file, log.in2 = 4; log.file = rev->diffopt.file; log.groups = SHORTLOG_GROUP_AUTHOR; + shortlog_init_group(&log); for (i = 0; i < nr; i++) shortlog_add_commit(&log, list[i]); diff --git a/builtin/shortlog.c b/builtin/shortlog.c index f708d96558..aac8c7afa4 100644 --- a/builtin/shortlog.c +++ b/builtin/shortlog.c @@ -245,15 +245,6 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit) } oneline_str = oneline.len ? oneline.buf : "<none>"; - if (log->groups & SHORTLOG_GROUP_AUTHOR) { - strbuf_reset(&ident); - format_commit_message(commit, - log->email ? "%aN <%aE>" : "%aN", - &ident, &ctx); - if (!HAS_MULTI_BITS(log->groups) || - strset_add(&dups, ident.buf)) - insert_one_record(log, ident.buf, oneline_str); - } if (log->groups & SHORTLOG_GROUP_COMMITTER) { strbuf_reset(&ident); format_commit_message(commit, @@ -372,6 +363,16 @@ void shortlog_init(struct shortlog *log) log->format.strdup_strings = 1; } +void shortlog_init_group(struct shortlog *log) +{ + if (!log->groups) + log->groups = SHORTLOG_GROUP_AUTHOR; + + if (log->groups & SHORTLOG_GROUP_AUTHOR) + string_list_append(&log->format, + log->email ? "%aN <%aE>" : "%aN"); +} + int cmd_shortlog(int argc, const char **argv, const char *prefix) { struct shortlog log = { STRING_LIST_INIT_NODUP }; @@ -439,8 +440,8 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix) log.file = rev.diffopt.file; log.date_mode = rev.date_mode; - if (!log.groups) - log.groups = SHORTLOG_GROUP_AUTHOR; + shortlog_init_group(&log); + string_list_sort(&log.trailers); /* assume HEAD if from a tty */ diff --git a/shortlog.h b/shortlog.h index 4850a8c30f..e52f001fb7 100644 --- a/shortlog.h +++ b/shortlog.h @@ -33,6 +33,7 @@ struct shortlog { }; void shortlog_init(struct shortlog *log); +void shortlog_init_group(struct shortlog *log); void shortlog_add_commit(struct shortlog *log, struct commit *commit);
Instead of handling SHORTLOG_GROUP_AUTHOR separately, reimplement it as a special case of the new `--group=<format>` mode, where the author mode is a shorthand for `--group='%aN <%aE>'. Note that we still need to keep the SHORTLOG_GROUP_AUTHOR enum since it has a different meaning in `read_from_stdin()`, where it is still used for a different purpose. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- builtin/log.c | 1 + builtin/shortlog.c | 23 ++++++++++++----------- shortlog.h | 1 + 3 files changed, 14 insertions(+), 11 deletions(-)