column: use utf8_strnwidth() to strip out ANSI color escapes

Message ID 9b3f6960-ea75-c3a7-3a24-0554320bb359@web.de
State New

Commit Message

René Scharfe Oct. 13, 2019, 12:49 p.m. UTC
Make use of utf8_strnwidth()'s feature to skip ANSI escape sequences
instead of open-coding it.  This shortens the code and makes it more
consistent.

This changes the behavior, though: The old code skips all kinds of
Control Sequence Introducer sequences, while utf8_strnwidth() only skips
the Select Graphic Rendition kind, i.e. those ending with "m".  They are
used for specifying color and font attributes like boldness.  The only
other kind of escape sequence we print in Git is Erase in Line, ending
with "K".  That's not used for columnar output, so this difference
actually doesn't matter here.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 column.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

--
2.23.0
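
The behavior change described in the commit message can be illustrated with a small standalone sketch. It is not the code from column.c or utf8.c: the function names width_skip_any_csi() and width_skip_sgr_only() are made up for this example, and both count one column per byte instead of computing UTF-8 display width, so the numbers for unskipped escape bytes are not what utf8_strnwidth() itself would report. The only point is which sequences each rule skips.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch of the old rule: skip "\033[", any run of digits
 * and ';', plus one final byte of any kind ('m', 'K', ...).
 * Simplification: counts bytes, not display columns.
 */
static size_t width_skip_any_csi(const char *s)
{
	size_t width = 0;

	while (*s) {
		if (s[0] == '\033' && s[1] == '[') {
			const char *p = s + 2;

			p += strspn(p, "0123456789;");
			if (*p)
				p++; /* the final "function" byte */
			s = p;
			continue;
		}
		width++;
		s++;
	}
	return width;
}

/*
 * Hypothetical sketch of the new rule: skip a sequence only if its
 * final byte is 'm' (Select Graphic Rendition). Anything else is
 * counted byte by byte, which is a simplification of this sketch.
 */
static size_t width_skip_sgr_only(const char *s)
{
	size_t width = 0;

	while (*s) {
		if (s[0] == '\033' && s[1] == '[') {
			size_t n = strspn(s + 2, "0123456789;");

			if (s[2 + n] == 'm') {
				s += 2 + n + 1;
				continue;
			}
		}
		width++;
		s++;
	}
	return width;
}
```

For a colored item such as "\033[31mred\033[m" both functions agree on a width of 3. They differ only for input like "ab\033[K", where the first skips the Erase in Line sequence and the second does not.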

Comments

Johannes Schindelin Oct. 14, 2019, 11:13 a.m. UTC | #1
Hi René,

On Sun, 13 Oct 2019, René Scharfe wrote:

> Make use of utf8_strnwidth()'s feature to skip ANSI escape sequences
> instead of open-coding it.  This shortens the code and makes it more
> consistent.

Sounds good.

> This changes the behavior, though: The old code skips all kinds of
> Control Sequence Introducer sequences, while utf8_strnwidth() only skips
> the Select Graphic Rendition kind, i.e. those ending with "m".  They are
> used for specifying color and font attributes like boldness.  The only
> other kind of escape sequence we print in Git is Erase in Line, ending
> with "K".  That's not used for columnar output, so this difference
> actually doesn't matter here.

Arguably, the "Erase in Line" thing should re-set the width to 0, no?
But as you say, this is not needed for this patch.

Thanks,
Dscho

>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  column.c | 13 +------------
>  1 file changed, 1 insertion(+), 12 deletions(-)
>
> diff --git a/column.c b/column.c
> index 7a17c14b82..4a38eed322 100644
> --- a/column.c
> +++ b/column.c
> @@ -23,18 +23,7 @@ struct column_data {
>  /* return length of 's' in letters, ANSI escapes stripped */
>  static int item_length(const char *s)
>  {
> -	int len, i = 0;
> -	struct strbuf str = STRBUF_INIT;
> -
> -	strbuf_addstr(&str, s);
> -	while ((s = strstr(str.buf + i, "\033[")) != NULL) {
> -		int len = strspn(s + 2, "0123456789;");
> -		i = s - str.buf;
> -		strbuf_remove(&str, i, len + 3); /* \033[<len><func char> */
> -	}
> -	len = utf8_strwidth(str.buf);
> -	strbuf_release(&str);
> -	return len;
> +	return utf8_strnwidth(s, -1, 1);
>  }
>
>  /*
> --
> 2.23.0
>
René Scharfe Oct. 14, 2019, 2:16 p.m. UTC | #2
Am 14.10.19 um 13:13 schrieb Johannes Schindelin:
> Hi René,
>
> On Sun, 13 Oct 2019, René Scharfe wrote:
>
>> This changes the behavior, though: The old code skips all kinds of
>> Control Sequence Introducer sequences, while utf8_strnwidth() only skips
>> the Select Graphic Rendition kind, i.e. those ending with "m".  They are
>> used for specifying color and font attributes like boldness.  The only
>> other kind of escape sequence we print in Git is Erase in Line, ending
>> with "K".  That's not used for columnar output, so this difference
>> actually doesn't matter here.
>
> Arguably, the "Erase in Line" thing should re-set the width to 0, no?
> But as you say, this is not needed for this patch.

It doesn't move the cursor, just clears the characters to the right, to
the left or both sides, depending on its parameter.  So ignoring it for
width calculation like the old code did would be appropriate -- if we
encountered such an escape sequence in text to be shown in columns.

René
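
The point about Erase in Line can be sketched with a toy model of one terminal row. The function erase_in_line() below is hypothetical and exists only for this illustration; it assumes the usual parameter meanings (0: cursor to end of line, 1: start of line through cursor, 2: whole line) and a fixed row of single-byte cells. It returns the cursor column unchanged, which is why such a sequence would not contribute to an item's width.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model: apply Erase in Line (CSI <mode> K) to a row of
 * 'ncells' single-byte cells with the cursor at column 'cursor'.
 * Erased cells become spaces. Returns the cursor column, which is
 * the same value that was passed in.
 */
static size_t erase_in_line(char *cells, size_t ncells, size_t cursor, int mode)
{
	size_t from = 0, to = ncells; /* mode 2: the whole line */

	if (mode == 0)
		from = cursor;        /* cursor to end of line */
	else if (mode == 1)
		to = cursor + 1;      /* start of line through cursor */

	if (to > ncells)
		to = ncells;
	if (from < to)
		memset(cells + from, ' ', to - from);
	return cursor;
}
```

With the row "abcdef" and the cursor at column 2, mode 0 leaves "ab" followed by four blanks, mode 1 leaves three blanks followed by "def", and in both cases the cursor stays at column 2.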
Johannes Schindelin Oct. 14, 2019, 7:33 p.m. UTC | #3
Hi René,

On Mon, 14 Oct 2019, René Scharfe wrote:

> Am 14.10.19 um 13:13 schrieb Johannes Schindelin:
>
> > On Sun, 13 Oct 2019, René Scharfe wrote:
> >
> >> This changes the behavior, though: The old code skips all kinds of
> >> Control Sequence Introducer sequences, while utf8_strnwidth() only skips
> >> the Select Graphic Rendition kind, i.e. those ending with "m".  They are
> >> used for specifying color and font attributes like boldness.  The only
> >> other kind of escape sequence we print in Git is Erase in Line, ending
> >> with "K".  That's not used for columnar output, so this difference
> >> actually doesn't matter here.
> >
> > Arguably, the "Erase in Line" thing should re-set the width to 0, no?
> > But as you say, this is not needed for this patch.
>
> It doesn't move the cursor, just clears the characters to the right, to
> the left or both sides, depending on its parameter.  So ignoring it for
> width calculation like the old code did would be appropriate -- if we
> encountered such an escape sequence in text to be shown in columns.

Whoops, you're right. I brainfarted, mistaking it for `\r`... My bad!

Ciao,
Dscho

Patch

diff --git a/column.c b/column.c
index 7a17c14b82..4a38eed322 100644
--- a/column.c
+++ b/column.c
@@ -23,18 +23,7 @@  struct column_data {
 /* return length of 's' in letters, ANSI escapes stripped */
 static int item_length(const char *s)
 {
-	int len, i = 0;
-	struct strbuf str = STRBUF_INIT;
-
-	strbuf_addstr(&str, s);
-	while ((s = strstr(str.buf + i, "\033[")) != NULL) {
-		int len = strspn(s + 2, "0123456789;");
-		i = s - str.buf;
-		strbuf_remove(&str, i, len + 3); /* \033[<len><func char> */
-	}
-	len = utf8_strwidth(str.buf);
-	strbuf_release(&str);
-	return len;
+	return utf8_strnwidth(s, -1, 1);
 }

 /*