[v2,2/5] pretty: allow showing specific trailers
diff mbox series

Message ID 20181104152232.20671-3-anders@0x63.nu
State New
Headers show
Series
  • %(trailers) improvements in pretty format
Related show

Commit Message

Anders Waldenborg Nov. 4, 2018, 3:22 p.m. UTC
Adds a new "key=X" option to "%(trailers)" which will cause it to only
print trailers lines which matches the specified key.

Signed-off-by: Anders Waldenborg <anders@0x63.nu>
---
 Documentation/pretty-formats.txt | 13 +++++----
 pretty.c                         | 15 ++++++++++-
 t/t4205-log-pretty-formats.sh    | 45 ++++++++++++++++++++++++++++++++
 trailer.c                        |  8 +++---
 trailer.h                        |  1 +
 5 files changed, 73 insertions(+), 9 deletions(-)

Comments

Eric Sunshine Nov. 4, 2018, 6:14 p.m. UTC | #1
On Sun, Nov 4, 2018 at 10:24 AM Anders Waldenborg <anders@0x63.nu> wrote:
> Adds a new "key=X" option to "%(trailers)" which will cause it to only
> print trailers lines which matches the specified key.
>
> Signed-off-by: Anders Waldenborg <anders@0x63.nu>
> ---
> diff --git a/Documentation/pretty-formats.txt b/Documentation/pretty-formats.txt
> @@ -209,11 +209,14 @@ endif::git-rev-list[]
>  - %(trailers[:options]): display the trailers of the body as interpreted
>    by linkgit:git-interpret-trailers[1]. The `trailers` string may be
> +  followed by a colon and zero or more comma-separated options. The
> +  allowed options are `only` which omits non-trailer lines from the
> +  trailer block, `unfold` to make it behave as if interpret-trailer's
> +  `--unfold` option was given, and `key=T` to only show trailers with
> +  specified key (matching is done
> +  case-insensitively).

Does the user have to include the colon when specifying <val> of
'key=<val>'? I can see from peeking at the implementation that the
colon must not be used, but this should be documented. Should the code
tolerate a trailing colon? (Genuine question; it's easy to do and
would be more user-friendly.)

Does 'key=<val>', do a full or partial match on trailers? And, if
partial, is the match anchored at the start or can it match anywhere
in the trailer key? I see from the implementation that it does a full
match, but this behavior should be documented.

What happens if 'key=...' is specified multiple times? Are the
multiple keys conjunctive? Disjunctive? Last-wins? I can see from the
implementation that it is last-wins, but this behavior should be
documented. (I wonder how painful it will be for people who want to
match multiple keys. This doesn't have to be answered yet, as the
behavior can always be loosened later to allow multiple-key matching
since the current syntax does not disallow such expansion.)

Thinking further on the last two points, should <val> be a regular expression?

> +  shows all trailer lines, `%(trailers:key=Reviewed-by,unfold)`
> +  unfolds and shows trailer lines with key `Reviewed-by`.
> diff --git a/pretty.c b/pretty.c
> @@ -1323,7 +1323,19 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
> +                                       opts.filter_key = xstrndup(arg, end - arg);
> @@ -1331,6 +1343,7 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
>                         format_trailers_from_commit(sb, msg + c->subject_off, &opts);
>                 }
> +               free(opts.filter_key);

If I understand correctly, this is making a copy of <val> so that it
will be NUL-terminated since the code added to trailer.c uses a simple
strcasecmp() to match it. Would it make sense to avoid the copy by
adding fields 'opts.filter_key' and 'opts.filter_key_len' and using
strncasecmp() instead? (Genuine question; not necessarily a request
for change.)

> diff --git a/t/t4205-log-pretty-formats.sh b/t/t4205-log-pretty-formats.sh
> @@ -598,6 +598,51 @@ test_expect_success ':only and :unfold work together' '
> +test_expect_success 'pretty format %(trailers:key=foo) shows that trailer' '
> +       git log --no-walk --pretty="%(trailers:key=Acked-by)" >actual &&
> +       {
> +               echo "Acked-by: A U Thor <author@example.com>" &&
> +               echo
> +       } >expect &&
> +       test_cmp expect actual
> +'

I guess these new tests are modeled after one or two existing tests
which use a series of 'echo' statements, but an alternative would be:

    cat <<-\EOF >expect &&
    Acked-by: A U Thor <author@example.com>

    EOF

or, even:

    test_write_lines "Acked-by: A U Thor <author@example.com>" "" &&

though, that's probably less readable.
Junio C Hamano Nov. 5, 2018, 3:48 a.m. UTC | #2
Eric Sunshine <sunshine@sunshineco.com> writes:

> Does the user have to include the colon when specifying <val> of
> 'key=<val>'? I can see from peeking at the implementation that the
> colon must not be used, but this should be documented. Should the code
> tolerate a trailing colon? (Genuine question; it's easy to do and
> would be more user-friendly.)
>
> Does 'key=<val>', do a full or partial match on trailers? And, if
> partial, is the match anchored at the start or can it match anywhere
> in the trailer key? I see from the implementation that it does a full
> match, but this behavior should be documented.
>
> What happens if 'key=...' is specified multiple times? Are the
> multiple keys conjunctive? Disjunctive? Last-wins? I can see from the
> implementation that it is last-wins, but this behavior should be
> documented. (I wonder how painful it will be for people who want to
> match multiple keys. This doesn't have to be answered yet, as the
> behavior can always be loosened later to allow multiple-key matching
> since the current syntax does not disallow such expansion.)
>
> Thinking further on the last two points, should <val> be a regular expression?

Another thing that needs to be clarified in the document would be
case sensitivity.  People sometimes spell "Signed-Off-By:" by
mistake (or is it by malice?).

I do suspect that the parser should just make a list of sought-after
keys, not doing "last-one-wins", as that won't be very difficult to
do and makes what happens when given multiple keys trivially obvious.
Eric Sunshine Nov. 5, 2018, 3:52 a.m. UTC | #3
On Sun, Nov 4, 2018 at 10:48 PM Junio C Hamano <gitster@pobox.com> wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
> > Does the user have to include the colon when specifying <val> of
> > 'key=<val>'?
> > Does 'key=<val>', do a full or partial match on trailers?
> > What happens if 'key=...' is specified multiple times?
> > Thinking further on the last two points, should <val> be a regular expression?
>
> Another thing that needs to be clarified in the document would be
> case sensitivity.  People sometimes spell "Signed-Off-By:" by
> mistake (or is it by malice?).

The documentation does say parenthetically "(matching is done
case-insensitively)", so I think that's already covered. Or did you
have something else in mind?
Junio C Hamano Nov. 5, 2018, 5:14 a.m. UTC | #4
Anders Waldenborg <anders@0x63.nu> writes:

> +				else if (skip_prefix(arg, "key=", &arg)) {
> +					const char *end = arg + strcspn(arg, ",)");
> +
> +					if (opts.filter_key)
> +						free(opts.filter_key);

Call the free() unconditionally, cf. contrib/coccinelle/free.cocci.
Anders Waldenborg Nov. 5, 2018, 8:26 a.m. UTC | #5
Eric Sunshine writes:
> Should the code tolerate a trailing colon? (Genuine question; it's
> easy to do and would be more user-friendly.)

I would make sense to allow the trailing colon, it is easy enough to
just strip that away when reading the argument.

However I'm not sure how that would fit together with the possibility to
later lifting it to a regexp, hard to strip a trailing colon from a
regexp in a generic way.


> What happens if 'key=...' is specified multiple times?

My first thought was to simply disallow that. But that seemed hard to
fit into current model where errors just means don't expand.

I would guess that most useful and intuitive to user would be to handle
multiple key arguments by showing any of those keys.



> Thinking further on the last two points, should <val> be a regular expression?

It probably would make sense. I can see how the regexp '^.*-by$' would
be useful (but glob style matching would suffice in that case).

Also handling multi-matching with an alternation group would be elegant
%(trailers:key="(A|B)"). Except for the fact that the parser would need to
understand some kind of quoting, which seems like an major undertaking.

I guess having it as a regular exception would also mean that it needs
to get some way to cache the re so it is compiled once, and not for each expansion.

>
>> +               free(opts.filter_key);
>
> If I understand correctly, this is making a copy of <val> so that it
> will be NUL-terminated since the code added to trailer.c uses a simple
> strcasecmp() to match it. Would it make sense to avoid the copy by
> adding fields 'opts.filter_key' and 'opts.filter_key_len' and using
> strncasecmp() instead? (Genuine question; not necessarily a request
> for change.)

I'm also not very happy about that copy.

Just using strncasecmp would cause it to be prefix match, no?

But if changing matching semantics to handle multiple key options to
something else I'm thinking opts.filter_key would be replaced with a
opts.filter callback function, and that part would need to be rewritten
anyway.

>
>> diff --git a/t/t4205-log-pretty-formats.sh b/t/t4205-log-pretty-formats.sh
>> @@ -598,6 +598,51 @@ test_expect_success ':only and :unfold work together' '
>> +test_expect_success 'pretty format %(trailers:key=foo) shows that trailer' '
>> +       git log --no-walk --pretty="%(trailers:key=Acked-by)" >actual &&
>> +       {
>> +               echo "Acked-by: A U Thor <author@example.com>" &&
>> +               echo
>> +       } >expect &&
>> +       test_cmp expect actual
>> +'
>
> I guess these new tests are modeled after one or two existing tests
> which use a series of 'echo' statements

I guess I could change it to "--pretty=format:%(trailers:key=Acked-by)"
to get separator semantics and avoid that extra blank line, making it
simpler.
Eric Sunshine Nov. 5, 2018, 9 a.m. UTC | #6
On Mon, Nov 5, 2018 at 3:26 AM Anders Waldenborg <anders@0x63.nu> wrote:
> Eric Sunshine writes:
> > Should the code tolerate a trailing colon? (Genuine question; it's
> > easy to do and would be more user-friendly.)
>
> I would make sense to allow the trailing colon, it is easy enough to
> just strip that away when reading the argument.
>
> However I'm not sure how that would fit together with the possibility to
> later lifting it to a regexp, hard to strip a trailing colon from a
> regexp in a generic way.

Which is a good reason to think about these issues now, before being
set in stone.

> > What happens if 'key=...' is specified multiple times?
>
> My first thought was to simply disallow that. But that seemed hard to
> fit into current model where errors just means don't expand.
>
> I would guess that most useful and intuitive to user would be to handle
> multiple key arguments by showing any of those keys.

Agreed.

> > Thinking further on the last two points, should <val> be a regular expression?
>
> It probably would make sense. I can see how the regexp '^.*-by$' would
> be useful (but glob style matching would suffice in that case).
>
> Also handling multi-matching with an alternation group would be elegant
> %(trailers:key="(A|B)"). Except for the fact that the parser would need to
> understand some kind of quoting, which seems like an major undertaking.

Maybe, maybe not. As long as we're careful not to paint ourselves into
a corner, it might very well be okay to start with the current
implementation of matching the full key as a literal string and
(perhaps much) later introduce regex as an alternate way to specify
the key. For instance, 'key=literal' and 'key=/regex/' can co-exist,
and the extraction of the regex inside /.../ should not be especially
difficult.

> I guess having it as a regular exception would also mean that it needs
> to get some way to cache the re so it is compiled once, and not for each expansion.

Yes, that's something I brought up a few years ago during a GSoC
project; not regex specifically, but that this parsing of the format
is happening repeatedly rather than just once. I had suggested to the
GSoC student that the parsing could be done early, compiling the
format expression into a "machine" which could be applied repeatedly.
It's a larger job, of course, not necessarily something worth tackling
for your current needs.

> > If I understand correctly, this is making a copy of <val> so that it
> > will be NUL-terminated since the code added to trailer.c uses a simple
> > strcasecmp() to match it. Would it make sense to avoid the copy by
> > adding fields 'opts.filter_key' and 'opts.filter_key_len' and using
> > strncasecmp() instead? (Genuine question; not necessarily a request
> > for change.)
>
> I'm also not very happy about that copy.
> Just using strncasecmp would cause it to be prefix match, no?

Well, you could retain full key match by checking for NUL explicitly
with something like this:

    !strncasecmp(tok.buf, opts->filter_key, opts->filter_key_len) &&
        !tok.buf[opts->filter_key_len]

Patch
diff mbox series

diff --git a/Documentation/pretty-formats.txt b/Documentation/pretty-formats.txt
index 417b638cd..8326fc45e 100644
--- a/Documentation/pretty-formats.txt
+++ b/Documentation/pretty-formats.txt
@@ -209,11 +209,14 @@  endif::git-rev-list[]
   respectively, but padding both sides (i.e. the text is centered)
 - %(trailers[:options]): display the trailers of the body as interpreted
   by linkgit:git-interpret-trailers[1]. The `trailers` string may be
-  followed by a colon and zero or more comma-separated options. If the
-  `only` option is given, omit non-trailer lines from the trailer block.
-  If the `unfold` option is given, behave as if interpret-trailer's
-  `--unfold` option was given.  E.g., `%(trailers:only,unfold)` to do
-  both.
+  followed by a colon and zero or more comma-separated options. The
+  allowed options are `only` which omits non-trailer lines from the
+  trailer block, `unfold` to make it behave as if interpret-trailer's
+  `--unfold` option was given, and `key=T` to only show trailers with
+  specified key (matching is done
+  case-insensitively). E.g. `%(trailers:only,unfold)` unfolds and
+  shows all trailer lines, `%(trailers:key=Reviewed-by,unfold)`
+  unfolds and shows trailer lines with key `Reviewed-by`.
 
 NOTE: Some placeholders may depend on other options given to the
 revision traversal engine. For example, the `%g*` reflog options will
diff --git a/pretty.c b/pretty.c
index aa03d5b23..cdca9dce2 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1323,7 +1323,19 @@  static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
 					opts.only_trailers = 1;
 				else if (match_placeholder_arg(arg, "unfold", &arg))
 					opts.unfold = 1;
-				else
+				else if (skip_prefix(arg, "key=", &arg)) {
+					const char *end = arg + strcspn(arg, ",)");
+
+					if (opts.filter_key)
+						free(opts.filter_key);
+
+					opts.filter_key = xstrndup(arg, end - arg);
+					arg = end;
+					if (*arg == ',')
+						arg++;
+
+					opts.only_trailers = 1;
+				} else
 					break;
 			}
 		}
@@ -1331,6 +1343,7 @@  static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
 			format_trailers_from_commit(sb, msg + c->subject_off, &opts);
 			ret = arg - placeholder + 1;
 		}
+		free(opts.filter_key);
 		return ret;
 	}
 
diff --git a/t/t4205-log-pretty-formats.sh b/t/t4205-log-pretty-formats.sh
index 978a8a66f..0f5207242 100755
--- a/t/t4205-log-pretty-formats.sh
+++ b/t/t4205-log-pretty-formats.sh
@@ -598,6 +598,51 @@  test_expect_success ':only and :unfold work together' '
 	test_cmp expect actual
 '
 
+test_expect_success 'pretty format %(trailers:key=foo) shows that trailer' '
+	git log --no-walk --pretty="%(trailers:key=Acked-by)" >actual &&
+	{
+		echo "Acked-by: A U Thor <author@example.com>" &&
+		echo
+	} >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'pretty format %(trailers:key=foo) is case insensitive' '
+	git log --no-walk --pretty="%(trailers:key=AcKed-bY)" >actual &&
+	{
+		echo "Acked-by: A U Thor <author@example.com>" &&
+		echo
+	} >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success '%(trailers:key=nonexistant) becomes empty' '
+	git log --no-walk --pretty="x%(trailers:key=Nacked-by)x" >actual &&
+	{
+		echo "xx"
+	} >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success '%(trailers:key=foo) handles multiple lines even if folded' '
+	git log --no-walk --pretty="%(trailers:key=Signed-Off-by)" >actual &&
+	{
+		grep -v patch.description <trailers | grep -v Acked-by &&
+		echo
+	} >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success '%(trailers:key=foo,unfold) properly unfolds' '
+	git log --no-walk --pretty="%(trailers:key=Signed-Off-by,unfold)" >actual &&
+	{
+		echo "Signed-off-by: A U Thor <author@example.com>" &&
+		echo "Signed-off-by: A U Thor <author@example.com>" &&
+		echo
+	} >expect &&
+	test_cmp expect actual
+'
+
 test_expect_success 'trailer parsing not fooled by --- line' '
 	git commit --allow-empty -F - <<-\EOF &&
 	this is the subject
diff --git a/trailer.c b/trailer.c
index 0796f326b..cbbb553e4 100644
--- a/trailer.c
+++ b/trailer.c
@@ -1147,10 +1147,12 @@  static void format_trailer_info(struct strbuf *out,
 			struct strbuf val = STRBUF_INIT;
 
 			parse_trailer(&tok, &val, NULL, trailer, separator_pos);
-			if (opts->unfold)
-				unfold_value(&val);
+			if (!opts->filter_key || !strcasecmp (tok.buf, opts->filter_key)) {
+				if (opts->unfold)
+					unfold_value(&val);
 
-			strbuf_addf(out, "%s: %s\n", tok.buf, val.buf);
+				strbuf_addf(out, "%s: %s\n", tok.buf, val.buf);
+			}
 			strbuf_release(&tok);
 			strbuf_release(&val);
 
diff --git a/trailer.h b/trailer.h
index b99773964..d052d02ae 100644
--- a/trailer.h
+++ b/trailer.h
@@ -72,6 +72,7 @@  struct process_trailer_options {
 	int only_input;
 	int unfold;
 	int no_divider;
+	char *filter_key;
 };
 
 #define PROCESS_TRAILER_OPTIONS_INIT {0}