
[v6,2/2] pretty: colorize pattern matches in commit messages

Message ID 20210921003050.641393-2-someguy@effective-light.com
State Superseded
Series [v6,1/2] grep: refactor next_match() and match_one_pattern() for external use

Commit Message

Hamza Mahfooz Sept. 21, 2021, 12:30 a.m. UTC
The "git log" command limits its output to the commits that contain strings
matched by a pattern when the "--grep=<pattern>" option is used, but unlike
output from "git grep -e <pattern>", the matches are not highlighted,
making them harder to spot.

Teach the pretty-printer code to highlight matches from the
"--grep=<pattern>", "--author=<pattern>" and "--committer=<pattern>"
options (to view the last one, you may have to ask for --pretty=fuller).

Also, it must be noted that we are effectively grepping the content twice;
however, it only slows down "git log --author=^H" on this repository by
around 1-2% (compared to v2.33.0), so it should be a small enough slowdown
to justify the addition of the feature.

Signed-off-by: Hamza Mahfooz <someguy@effective-light.com>
---
v2: make the commit message whole (add the missing ingredients), rename
    append_matched_line() to append_line_with_color(), use
    colors[GREP_COLOR_MATCH_SELECTED] instead of
    colors[GREP_COLOR_MATCH_CONTEXT], allow the background color to be
    customized, don't copy strings to a buffer when not coloring in
    append_line_with_color(), rename next_match() to grep_next_match(),
    repurpose grep_next_match()/match_one_pattern() for use in
    append_line_with_color() (allowing us to remove duplicated matching
    code in append_line_with_color()), document how to customize the
    feature and modify some of the tests to fit the feature better.

v3: fix a formatting issue with the added documentation.

v4: add strbuf_add_with_color(), use the correct color code scheme in the
    unit tests and add more unit tests.

v5: separate grep changes from pretty changes and add some performance
    analysis in the commit message.

v6: put the documentation in the correct place, clean up pretty.c and
    format the unit tests according to the current convention.
---
 Documentation/config/color.txt |   7 ++-
 pretty.c                       | 107 +++++++++++++++++++++++++++++----
 t/t4202-log.sh                 |  51 ++++++++++++++++
 3 files changed, 151 insertions(+), 14 deletions(-)

Comments

Jeff King Sept. 21, 2021, 1:24 a.m. UTC | #1
On Mon, Sep 20, 2021 at 08:30:50PM -0400, Hamza Mahfooz wrote:

> Teach the pretty-printer code to highlight matches from the
> "--grep=<pattern>", "--author=<pattern>" and "--committer=<pattern>"
> options (to view the last one, you may have to ask for --pretty=fuller).
> 
> Also, it must be noted that we are effectively grepping the content twice;
> however, it only slows down "git log --author=^H" on this repository by
> around 1-2% (compared to v2.33.0), so it should be a small enough slowdown
> to justify the addition of the feature.

This might or might not be related, but one thing I noticed is that your
earlier patch causes us to grep a lot more lines than we mean to (even
if we are looking for "author" lines, it greps every header line). That
might contribute to the slowdown. Likewise, it calls strip_timestamp()
on every line, even if it does not start with "author".
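Here is a minimal sketch of the narrowing suggested above, under stated assumptions. Before a header line is grepped (or passed to strip_timestamp()), the code checks that the line belongs to the requested field. `header_line_is_field()` and `enum hdr_field` are hypothetical names for illustration, not git's grep.c code. The prefixes assume the raw commit-object header format (`author ...`, `committer ...`).

```c
#include <string.h>

/* Hypothetical field identifiers, loosely modeled on grep.h's
 * enum grep_header_field (names are illustrative only). */
enum hdr_field { HDR_AUTHOR, HDR_COMMITTER };

/*
 * Return 1 if the header line [bol, eol) belongs to the requested field,
 * i.e. it starts with "author " or "committer ". Only such lines need to
 * be handed to the pattern matcher and to timestamp stripping; "tree",
 * "parent", etc. can be skipped without any regex work.
 */
static int header_line_is_field(const char *bol, const char *eol,
				enum hdr_field field)
{
	const char *prefix = (field == HDR_AUTHOR) ? "author " : "committer ";
	size_t len = strlen(prefix);

	return (size_t)(eol - bol) >= len && !memcmp(bol, prefix, len);
}
```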

> +static inline void strbuf_add_with_color(struct strbuf *sb, const char *color,
> +					 char *buf, size_t buflen)
> +{
> +	strbuf_addstr(sb, color);
> +	strbuf_add(sb, buf, buflen);
> +	if (*color)
> +		strbuf_addstr(sb, GIT_COLOR_RESET);
> +}

You could take "buf" as a "const char *" here. That doesn't matter too
much for now, but see below.
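As an illustration, a const-correct version of the helper could look like the sketch below. Git's `struct strbuf` is not available standalone, so the sketch uses a hypothetical minimal stand-in (`struct sbuf`, `sbuf_add()`, `sbuf_addstr()`). `COLOR_RESET` assumes the value of git's `GIT_COLOR_RESET`.

```c
#include <stdlib.h>
#include <string.h>

#define COLOR_RESET "\033[m" /* assumed to match git's GIT_COLOR_RESET */

/* Hypothetical minimal stand-in for git's struct strbuf. */
struct sbuf {
	char *buf;
	size_t len, alloc;
};

static void sbuf_add(struct sbuf *sb, const char *data, size_t n)
{
	if (sb->len + n + 1 > sb->alloc) {
		/* grow geometrically so repeated appends are amortized */
		size_t want = (sb->alloc + 16) * 3 / 2;
		char *p;

		if (want < sb->len + n + 1)
			want = sb->len + n + 1;
		p = realloc(sb->buf, want);
		if (!p)
			abort();
		sb->buf = p;
		sb->alloc = want;
	}
	memcpy(sb->buf + sb->len, data, n);
	sb->len += n;
	sb->buf[sb->len] = '\0';
}

static void sbuf_addstr(struct sbuf *sb, const char *s)
{
	sbuf_add(sb, s, strlen(s));
}

/* Same logic as the patch's strbuf_add_with_color(), but "buf" is const:
 * the function only reads it, so no caller has to cast away const. */
static void sbuf_add_with_color(struct sbuf *sb, const char *color,
				const char *buf, size_t buflen)
{
	sbuf_addstr(sb, color);
	sbuf_add(sb, buf, buflen);
	if (*color)
		sbuf_addstr(sb, COLOR_RESET);
}
```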

> +static void append_line_with_color(struct strbuf *sb, struct grep_opt *opt,
> +				   const char *line, size_t linelen,
> +				   int color, enum grep_context ctx,
> +				   enum grep_header_field field)
> +{
> +	char *buf, *eol;
> +	const char *line_color, *match_color;
> +	regmatch_t match;
> +	int eflags = 0;
> +
> +	if (!opt || !want_color(color) || opt->invert) {
> +		strbuf_add(sb, line, linelen);
> +		return;
> +	}
> +
> +	buf = (char *)line;
> +	eol = buf + linelen;

OK, so we got rid of the copy of "line", which is nice. But we are
casting away const-ness, which is a potential red flag (is somebody
going to modify this string, even though we promised our caller we would
not?). We'd probably want a comment to explain why we are doing so, and
why it is OK (e.g., if somebody in the call stack modifies it
temporarily).

More on this in a moment.
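The sketch below shows how the matching loop can work without ever writing to the line. POSIX `regexec()` already takes a `const char *`. A loop of the same shape as append_line_with_color() therefore needs no cast: it highlights each match, then passes `REG_NOTBOL` so `^` cannot match again mid-line. This standalone illustration uses plain POSIX regex and a hypothetical `highlight()` helper that writes into a fixed buffer. Unlike git's grep code, it assumes the line is NUL-terminated rather than bounded by an `eol` pointer.

```c
#include <regex.h>
#include <string.h>

/* Append n bytes of s to the NUL-terminated out (capacity cap > 0);
 * silently truncates if out is too small, which is fine for a demo. */
static void append(char *out, size_t cap, const char *s, size_t n)
{
	size_t len = strlen(out);

	if (len + n >= cap)
		n = cap - len - 1;
	memcpy(out + len, s, n);
	out[len + n] = '\0';
}

/*
 * Wrap every match of re in the NUL-terminated line with open/close
 * markers. The line is only read: regexec() takes a const char *.
 * After the first match REG_NOTBOL is set, so "^" cannot match at the
 * start of the remaining text; an empty match ends the loop, as in the
 * patch.
 */
static void highlight(char *out, size_t cap, const regex_t *re,
		      const char *line, const char *open, const char *close)
{
	const char *p = line;
	regmatch_t m;
	int eflags = 0;

	out[0] = '\0';
	while (!regexec(re, p, 1, &m, eflags)) {
		if (m.rm_so == m.rm_eo)
			break;
		append(out, cap, p, m.rm_so);
		append(out, cap, open, strlen(open));
		append(out, cap, p + m.rm_so, m.rm_eo - m.rm_so);
		append(out, cap, close, strlen(close));
		p += m.rm_eo;
		eflags = REG_NOTBOL;
	}
	append(out, cap, p, strlen(p));
}
```

With the pattern `i` and the line `initial`, this produces `<i>n<i>t<i>al`. That output follows the same structure as the `color.grep.matchSelected` test in the patch.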

> +	while (grep_next_match(opt, buf, eol, ctx, &match, field, eflags)) {
> +		if (match.rm_so == match.rm_eo)
> +			break;
> +
> +		strbuf_grow(sb, strlen(line_color) + strlen(match_color) +
> +			    (2 * strlen(GIT_COLOR_RESET)));
> +		strbuf_add_with_color(sb, line_color, buf, match.rm_so);
> +		strbuf_add_with_color(sb, match_color, buf + match.rm_so,
> +				      match.rm_eo - match.rm_so);

As Eric mentioned, these strbuf_grow() calls can go away. The whole
point of strbuf is that we do not have to clutter the code with manual
size computations, because it will do the right thing automatically.

Sometimes you can get extra performance by pre-sizing the strbuf, but:

  1. I'd be surprised if we did in this case. We're writing into a
     strbuf that will receive the whole per-commit output, so any growth
     cost due to a couple of short strings here would be amortized
     anyway.

  2. The computation here doesn't represent the needed growth anyway.
     When we call strbuf_add_with_color(), it's going to add not just
     the colors but all of the data for the line itself.

So at best it's doing nothing, and at worst it is making the code harder
to understand.
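The sketch below puts numbers on point 2, using hypothetical helper names. It compares what the patch's strbuf_grow() call reserves per match with what the two strbuf_add_with_color() calls then append. With non-empty colors, the difference is exactly `match.rm_eo` bytes, which is the line text itself and which the hint never accounts for. `COLOR_RESET` assumes the value of git's `GIT_COLOR_RESET`.

```c
#include <string.h>

#define COLOR_RESET "\033[m" /* assumed to match git's GIT_COLOR_RESET */

/* Bytes the patch asks strbuf_grow() to reserve for one match. */
static size_t grow_hint(const char *line_color, const char *match_color)
{
	return strlen(line_color) + strlen(match_color) +
	       2 * strlen(COLOR_RESET);
}

/* Bytes the two strbuf_add_with_color() calls actually append for a
 * match at [so, eo): each call writes its color, its slice of text, and
 * a reset only when that color is non-empty. */
static size_t bytes_appended(const char *line_color, const char *match_color,
			     size_t so, size_t eo)
{
	size_t n = 0;

	n += strlen(line_color) + so +
	     (*line_color ? strlen(COLOR_RESET) : 0);
	n += strlen(match_color) + (eo - so) +
	     (*match_color ? strlen(COLOR_RESET) : 0);
	return n;
}
```

For `"ir"` matched at `[2, 4)` in `third`, with a green (5-byte) line color and bold red (7-byte) match color, the hint is 18 bytes but 22 are appended.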

> +	if (eflags) {
> +		strbuf_grow(sb, strlen(line_color) + strlen(GIT_COLOR_RESET));
> +		strbuf_add_with_color(sb, line_color, buf, eol - buf);
> +	} else
> +		strbuf_add(sb, buf, eol - buf);
> +}

Ditto here (we grow for the colors, but also end up adding "eol - buf"
bytes).

-Peff
Jeff King Sept. 21, 2021, 1:39 a.m. UTC | #2
On Mon, Sep 20, 2021 at 09:24:08PM -0400, Jeff King wrote:

> > +	buf = (char *)line;
> > +	eol = buf + linelen;
> 
> OK, so we got rid of the copy of "line", which is nice. But we are
> casting away const-ness, which is a potential red flag (is somebody
> going to modify this string, even though we promised our caller we would
> not?). We'd probably want a comment to explain why we are doing so, and
> why it is OK (e.g., if somebody in the call stack modifies it
> temporarily).
> 
> More on this in a moment.

The root of the issue is that grep_next_match() takes a non-const
buffer, and so on. And indeed, it _does_ eventually get modified,
although only temporarily. I think we can clean that up, though.
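The following sketch shows one way to avoid the temporary modification, under stated assumptions; it is not the actual cleanup patch. The current code writes a NUL after the closing `>` of an ident line and restores the saved byte afterwards. Instead, the sketch only moves the end pointer back. The caller must then match against `[bol, eol)` with a length-aware matcher rather than relying on NUL termination, for example one built on the `REG_STARTEND` extension. `strip_timestamp_const()` is a hypothetical name.

```c
/*
 * Hypothetical const-correct variant of strip_timestamp(): scan back
 * from *eol_p for the last '>' and set *eol_p just past it, so that
 * [bol, *eol_p) covers "Name <email>" without the trailing timestamp.
 * Nothing is written to the buffer. Returns 1 if a '>' was found;
 * otherwise returns 0 and leaves *eol_p unchanged.
 */
static int strip_timestamp_const(const char *bol, const char **eol_p)
{
	const char *eol = *eol_p;

	while (bol < eol) {
		if (eol[-1] == '>') {
			*eol_p = eol;
			return 1;
		}
		eol--;
	}
	return 0;
}
```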

Here are two patches I prepared on top of your series to show what's
possible, though I think we should do one of:

  - put them at the front of your series (with the appropriate
    adjustments) as preparatory cleanup

  - keep them separate. You can put a comment above the cast to mention
    what's going on and why it's OK for now, and then later when they're
    both merged, we can remove that cast.

The second option creates a little extra work for the maintainer (they
both touch match_one_pattern(), so there will be some textual conflicts).
But it does mean we avoid a dependency; the cleanups don't derail your
series, nor does your series hold up the cleanups. So I could go either
way.

  [1/2]: grep: stop modifying buffer in strip_timestamp
  [2/2]: grep: mark "haystack" buffers as const

 grep.c   | 30 ++++++++++++------------------
 grep.h   |  3 ++-
 pretty.c |  6 +++---
 3 files changed, 17 insertions(+), 22 deletions(-)

-Peff
Hamza Mahfooz Sept. 21, 2021, 2:38 a.m. UTC | #3
On Mon, Sep 20 2021 at 09:39:27 PM -0400, Jeff King <peff@peff.net> wrote:
> Here are two patches I prepared on top of your series to show what's
> possible, though I think we should do one of:
> 
>   - put them at the front of your series (with the appropriate
>     adjustments) as preparatory cleanup
> 
>   - keep them separate. You can put a comment above the cast to mention
>     what's going on and why it's OK for now, and then later when they're
>     both merged, we can remove that cast.

Option 1 is preferable from my perspective, in that case.
Jeff King Sept. 21, 2021, 3:15 a.m. UTC | #4
On Mon, Sep 20, 2021 at 10:38:10PM -0400, Hamza Mahfooz wrote:

> 
> On Mon, Sep 20 2021 at 09:39:27 PM -0400, Jeff King <peff@peff.net> wrote:
> > Here are two patches I prepared on top of your series to show what's
> > possible, though I think we should do one of:
> > 
> >   - put them at the front of your series (with the appropriate
> >     adjustments) as preparatory cleanup
> > 
> >   - keep them separate. You can put a comment above the cast to mention
> >     what's going on and why it's OK for now, and then later when they're
> >     both merged, we can remove that cast.
> 
> Option 1 is preferable from my perspective, in that case.

OK. The patches I showed were the minimum to get your series working,
but there's actually a bit more cleanup we can do. I'll post a new
series in a moment, and then you can build on top of that.

-Peff

Patch

diff --git a/Documentation/config/color.txt b/Documentation/config/color.txt
index e05d520a86..91d9a9da32 100644
--- a/Documentation/config/color.txt
+++ b/Documentation/config/color.txt
@@ -104,9 +104,12 @@  color.grep.<slot>::
 `matchContext`;;
 	matching text in context lines
 `matchSelected`;;
-	matching text in selected lines
+	matching text in selected lines. Also, used to customize the following
+	linkgit:git-log[1] subcommands: `--grep`, `--author` and `--committer`.
 `selected`;;
-	non-matching text in selected lines
+	non-matching text in selected lines. Also, used to customize the
+	following linkgit:git-log[1] subcommands: `--grep`, `--author` and
+	`--committer`.
 `separator`;;
 	separators between fields on a line (`:`, `-`, and `=`)
 	and between hunks (`--`)
diff --git a/pretty.c b/pretty.c
index 73b5ead509..943a2d2ee2 100644
--- a/pretty.c
+++ b/pretty.c
@@ -431,6 +431,56 @@  const char *show_ident_date(const struct ident_split *ident,
 	return show_date(date, tz, mode);
 }
 
+static inline void strbuf_add_with_color(struct strbuf *sb, const char *color,
+					 char *buf, size_t buflen)
+{
+	strbuf_addstr(sb, color);
+	strbuf_add(sb, buf, buflen);
+	if (*color)
+		strbuf_addstr(sb, GIT_COLOR_RESET);
+}
+
+static void append_line_with_color(struct strbuf *sb, struct grep_opt *opt,
+				   const char *line, size_t linelen,
+				   int color, enum grep_context ctx,
+				   enum grep_header_field field)
+{
+	char *buf, *eol;
+	const char *line_color, *match_color;
+	regmatch_t match;
+	int eflags = 0;
+
+	if (!opt || !want_color(color) || opt->invert) {
+		strbuf_add(sb, line, linelen);
+		return;
+	}
+
+	buf = (char *)line;
+	eol = buf + linelen;
+
+	line_color = opt->colors[GREP_COLOR_SELECTED];
+	match_color = opt->colors[GREP_COLOR_MATCH_SELECTED];
+
+	while (grep_next_match(opt, buf, eol, ctx, &match, field, eflags)) {
+		if (match.rm_so == match.rm_eo)
+			break;
+
+		strbuf_grow(sb, strlen(line_color) + strlen(match_color) +
+			    (2 * strlen(GIT_COLOR_RESET)));
+		strbuf_add_with_color(sb, line_color, buf, match.rm_so);
+		strbuf_add_with_color(sb, match_color, buf + match.rm_so,
+				      match.rm_eo - match.rm_so);
+		buf += match.rm_eo;
+		eflags = REG_NOTBOL;
+	}
+
+	if (eflags) {
+		strbuf_grow(sb, strlen(line_color) + strlen(GIT_COLOR_RESET));
+		strbuf_add_with_color(sb, line_color, buf, eol - buf);
+	} else
+		strbuf_add(sb, buf, eol - buf);
+}
+
 void pp_user_info(struct pretty_print_context *pp,
 		  const char *what, struct strbuf *sb,
 		  const char *line, const char *encoding)
@@ -496,9 +546,28 @@  void pp_user_info(struct pretty_print_context *pp,
 			strbuf_addch(sb, '\n');
 		strbuf_addf(sb, " <%.*s>\n", (int)maillen, mailbuf);
 	} else {
-		strbuf_addf(sb, "%s: %.*s%.*s <%.*s>\n", what,
-			    (pp->fmt == CMIT_FMT_FULLER) ? 4 : 0, "    ",
-			    (int)namelen, namebuf, (int)maillen, mailbuf);
+		struct strbuf id;
+		enum grep_header_field field = GREP_HEADER_FIELD_MAX;
+		struct grep_opt *opt = pp->rev ? &pp->rev->grep_filter : NULL;
+
+		strbuf_init(&id, namelen + maillen + 4);
+
+		if (!strcmp(what, "Author"))
+			field = GREP_HEADER_AUTHOR;
+		else if (!strcmp(what, "Commit"))
+			field = GREP_HEADER_COMMITTER;
+
+		strbuf_addf(sb, "%s: ", what);
+		if (pp->fmt == CMIT_FMT_FULLER)
+			strbuf_addchars(sb, ' ', 4);
+
+		strbuf_addf(&id, "%.*s <%.*s>", (int)namelen, namebuf,
+			    (int)maillen, mailbuf);
+
+		append_line_with_color(sb, opt, id.buf, id.len, pp->color,
+				       GREP_CONTEXT_HEAD, field);
+		strbuf_addch(sb, '\n');
+		strbuf_release(&id);
 	}
 
 	switch (pp->fmt) {
@@ -1939,8 +2008,9 @@  static int pp_utf8_width(const char *start, const char *end)
 	return width;
 }
 
-static void strbuf_add_tabexpand(struct strbuf *sb, int tabwidth,
-				 const char *line, int linelen)
+static void strbuf_add_tabexpand(struct strbuf *sb, struct grep_opt *opt,
+				 int color, int tabwidth, const char *line,
+				 int linelen)
 {
 	const char *tab;
 
@@ -1957,7 +2027,9 @@  static void strbuf_add_tabexpand(struct strbuf *sb, int tabwidth,
 			break;
 
 		/* Output the data .. */
-		strbuf_add(sb, line, tab - line);
+		append_line_with_color(sb, opt, line, tab - line, color,
+				       GREP_CONTEXT_BODY,
+				       GREP_HEADER_FIELD_MAX);
 
 		/* .. and the de-tabified tab */
 		strbuf_addchars(sb, ' ', tabwidth - (width % tabwidth));
@@ -1972,7 +2044,8 @@  static void strbuf_add_tabexpand(struct strbuf *sb, int tabwidth,
 	 * worrying about width - there's nothing more to
 	 * align.
 	 */
-	strbuf_add(sb, line, linelen);
+	append_line_with_color(sb, opt, line, linelen, color, GREP_CONTEXT_BODY,
+			       GREP_HEADER_FIELD_MAX);
 }
 
 /*
@@ -1984,11 +2057,16 @@  static void pp_handle_indent(struct pretty_print_context *pp,
 			     struct strbuf *sb, int indent,
 			     const char *line, int linelen)
 {
+	struct grep_opt *opt = pp->rev ? &pp->rev->grep_filter : NULL;
+
 	strbuf_addchars(sb, ' ', indent);
 	if (pp->expand_tabs_in_log)
-		strbuf_add_tabexpand(sb, pp->expand_tabs_in_log, line, linelen);
+		strbuf_add_tabexpand(sb, opt, pp->color, pp->expand_tabs_in_log,
+				     line, linelen);
 	else
-		strbuf_add(sb, line, linelen);
+		append_line_with_color(sb, opt, line, linelen, pp->color,
+				       GREP_CONTEXT_BODY,
+				       GREP_HEADER_FIELD_MAX);
 }
 
 static int is_mboxrd_from(const char *line, int len)
@@ -2006,7 +2084,9 @@  void pp_remainder(struct pretty_print_context *pp,
 		  struct strbuf *sb,
 		  int indent)
 {
+	struct grep_opt *opt = pp->rev ? &pp->rev->grep_filter : NULL;
 	int first = 1;
+
 	for (;;) {
 		const char *line = *msg_p;
 		int linelen = get_one_line(line);
@@ -2027,14 +2107,17 @@  void pp_remainder(struct pretty_print_context *pp,
 		if (indent)
 			pp_handle_indent(pp, sb, indent, line, linelen);
 		else if (pp->expand_tabs_in_log)
-			strbuf_add_tabexpand(sb, pp->expand_tabs_in_log,
-					     line, linelen);
+			strbuf_add_tabexpand(sb, opt, pp->color,
+					     pp->expand_tabs_in_log, line,
+					     linelen);
 		else {
 			if (pp->fmt == CMIT_FMT_MBOXRD &&
 					is_mboxrd_from(line, linelen))
 				strbuf_addch(sb, '>');
 
-			strbuf_add(sb, line, linelen);
+			append_line_with_color(sb, opt, line, linelen,
+					       pp->color, GREP_CONTEXT_BODY,
+					       GREP_HEADER_FIELD_MAX);
 		}
 		strbuf_addch(sb, '\n');
 	}
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 9dfead936b..3d240bba57 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -449,6 +449,57 @@  test_expect_success !FAIL_PREREQS 'log with various grep.patternType configurati
 	)
 '
 
+test_expect_success 'log --author' '
+	cat >expect <<-\EOF &&
+	Author: <BOLD;RED>A U<RESET> Thor <author@example.com>
+	EOF
+	git log -1 --color=always --author="A U" >log &&
+	grep Author log >actual.raw &&
+	test_decode_color <actual.raw >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'log --committer' '
+	cat >expect <<-\EOF &&
+	Commit:     C O Mitter <committer@<BOLD;RED>example<RESET>.com>
+	EOF
+	git log -1 --color=always --pretty=fuller --committer="example" >log &&
+	grep "Commit:" log >actual.raw &&
+	test_decode_color <actual.raw >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'log -i --grep with color' '
+	cat >expect <<-\EOF &&
+	    <BOLD;RED>Sec<RESET>ond
+	    <BOLD;RED>sec<RESET>ond
+	EOF
+	git log --color=always -i --grep=^sec >log &&
+	grep -i sec log >actual.raw &&
+	test_decode_color <actual.raw >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '-c color.grep.selected log --grep' '
+	cat >expect <<-\EOF &&
+	    <GREEN>th<RESET><BOLD;RED>ir<RESET><GREEN>d<RESET>
+	EOF
+	git -c color.grep.selected="green" log --color=always --grep=ir >log &&
+	grep ir log >actual.raw &&
+	test_decode_color <actual.raw >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '-c color.grep.matchSelected log --grep' '
+	cat >expect <<-\EOF &&
+	    <BLUE>i<RESET>n<BLUE>i<RESET>t<BLUE>i<RESET>al
+	EOF
+	git -c color.grep.matchSelected="blue" log --color=always --grep=i >log &&
+	grep al log >actual.raw &&
+	test_decode_color <actual.raw >actual &&
+	test_cmp expect actual
+'
+
 cat > expect <<EOF
 * Second
 * sixth