diff mbox series

[v3,2/5] strbuf: refactor strbuf_trim_trailing_ch()

Message ID 20241206124248.160494-3-christian.couder@gmail.com (mailing list archive)
State New
Headers show
Series Introduce a "promisor-remote" capability | expand

Commit Message

Christian Couder Dec. 6, 2024, 12:42 p.m. UTC
We often have to split strings at some specified terminator character.
The strbuf_split*() functions, that we can use for this purpose,
return substrings that include the terminator character, so we often
need to remove that character.

When it is a whitespace, newline or directory separator, the
terminator character can easily be removed using an existing triming
function like strbuf_rtrim(), strbuf_trim_trailing_newline() or
strbuf_trim_trailing_dir_sep(). There is no function to remove that
character when it's not one of those characters though.

Let's introduce a new strbuf_trim_trailing_ch() function that can be
used to remove any trailing character, and let's refactor existing code
that manually removed trailing characters using this new function.

We are also going to use this new function in a following commit.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 strbuf.c         |  7 +++++++
 strbuf.h         |  3 +++
 trace2/tr2_cfg.c | 10 ++--------
 3 files changed, 12 insertions(+), 8 deletions(-)

Comments

Junio C Hamano Dec. 7, 2024, 6:35 a.m. UTC | #1
Christian Couder <christian.couder@gmail.com> writes:

> We often have to split strings at some specified terminator character.
> The strbuf_split*() functions, that we can use for this purpose,
> return substrings that include the terminator character, so we often
> need to remove that character.
>
> When it is a whitespace, newline or directory separator, the
> terminator character can easily be removed using an existing triming
> function like strbuf_rtrim(), strbuf_trim_trailing_newline() or
> strbuf_trim_trailing_dir_sep(). There is no function to remove that
> character when it's not one of those characters though.

Heh, totally uninteresting (alternative being open coding this one).
If we pass, instead of a single character 'c', an array of characters
to be stripped from the right (like strspn() allows you to skip from
the left), I may have been a bit more receptive, though ;-)

> +void strbuf_trim_trailing_ch(struct strbuf *sb, int c)
> +{
> +	while (sb->len > 0 && sb->buf[sb->len - 1] == c)
> +		sb->len--;
> +	sb->buf[sb->len] = '\0';
> +}

So, trim_trailing will leave "foo" when "foo,,," is fed with c set
to ','.

> diff --git a/trace2/tr2_cfg.c b/trace2/tr2_cfg.c
> index 22a99a0682..9da1f8466c 100644
> --- a/trace2/tr2_cfg.c
> +++ b/trace2/tr2_cfg.c
> @@ -35,10 +35,7 @@ static int tr2_cfg_load_patterns(void)
>  
>  	tr2_cfg_patterns = strbuf_split_buf(envvar, strlen(envvar), ',', -1);
>  	for (s = tr2_cfg_patterns; *s; s++) {
> -		struct strbuf *buf = *s;
> -
> -		if (buf->len && buf->buf[buf->len - 1] == ',')
> -			strbuf_setlen(buf, buf->len - 1);
> +		strbuf_trim_trailing_ch(*s, ',');

And the only thing that prevents this rewrite from being buggy is
the use of misdesigned strbuf_split_buf() function (which by now we
should have deprecated!).  Because it splits at ',', we won't have
more than one ',' trailing, but we still split that one trailing
comma because the misdesigned strbuf_split_buf() leaves the
separator at the end of each element.

This does not look like a very convincing example to demonstrate why
the new helper function is useful, at least to me.  

If somebody would touch this area of code, I think a lot nicer
clean-up would be to rewrite the thing into a helper function that
is called from here, and the other one in the next hunk in a single
patch, and then clean up the refactored helper function not to use
the strbuf_split_buf().  Looking at the way tr2_cfg_patterns and
tr2_cfg_env_vars are used, they have *NO* valid reason why they have
to be a strbuf.  Once populated, they are only used for a constant
string pointed at by their .buf member.  A string_list constructed
by appending (i.e. not sorted) would be a lot more suitable data
structure.

>  		strbuf_trim_trailing_newline(*s);
>  		strbuf_trim(*s);
>  	}
> @@ -74,10 +71,7 @@ static int tr2_load_env_vars(void)
>  
>  	tr2_cfg_env_vars = strbuf_split_buf(varlist, strlen(varlist), ',', -1);
>  	for (s = tr2_cfg_env_vars; *s; s++) {
> -		struct strbuf *buf = *s;
> -
> -		if (buf->len && buf->buf[buf->len - 1] == ',')
> -			strbuf_setlen(buf, buf->len - 1);
> +		strbuf_trim_trailing_ch(*s, ',');
>  		strbuf_trim_trailing_newline(*s);
>  		strbuf_trim(*s);
>  	}
Karthik Nayak Dec. 16, 2024, 11:47 a.m. UTC | #2
Christian Couder <christian.couder@gmail.com> writes:

> We often have to split strings at some specified terminator character.
> The strbuf_split*() functions, that we can use for this purpose,
> return substrings that include the terminator character, so we often
> need to remove that character.
>
> When it is a whitespace, newline or directory separator, the
> terminator character can easily be removed using an existing triming

Nit: s/triming/trimming

> function like strbuf_rtrim(), strbuf_trim_trailing_newline() or
> strbuf_trim_trailing_dir_sep(). There is no function to remove that
> character when it's not one of those characters though.
>
> Let's introduce a new strbuf_trim_trailing_ch() function that can be
> used to remove any trailing character, and let's refactor existing code
> that manually removed trailing characters using this new function.
>
> We are also going to use this new function in a following commit.
>
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---
>  strbuf.c         |  7 +++++++
>  strbuf.h         |  3 +++
>  trace2/tr2_cfg.c | 10 ++--------
>  3 files changed, 12 insertions(+), 8 deletions(-)
>

Shouldn't this patch also add unit tests? We already have some in
't/unit-tests/t-strbuf.c'. This applies to the previous patch too.

[snip]
diff mbox series

Patch

diff --git a/strbuf.c b/strbuf.c
index cccfdec0e3..c986ec28f4 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -134,6 +134,13 @@  void strbuf_trim_trailing_dir_sep(struct strbuf *sb)
 	sb->buf[sb->len] = '\0';
 }
 
+void strbuf_trim_trailing_ch(struct strbuf *sb, int c)
+{
+	while (sb->len > 0 && sb->buf[sb->len - 1] == c)
+		sb->len--;
+	sb->buf[sb->len] = '\0';
+}
+
 void strbuf_trim_trailing_newline(struct strbuf *sb)
 {
 	if (sb->len > 0 && sb->buf[sb->len - 1] == '\n') {
diff --git a/strbuf.h b/strbuf.h
index 884157873e..5e389ab065 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -197,6 +197,9 @@  void strbuf_trim_trailing_dir_sep(struct strbuf *sb);
 /* Strip trailing LF or CR/LF */
 void strbuf_trim_trailing_newline(struct strbuf *sb);
 
+/* Strip trailing character c */
+void strbuf_trim_trailing_ch(struct strbuf *sb, int c);
+
 /**
  * Replace the contents of the strbuf with a reencoded form.  Returns -1
  * on error, 0 on success.
diff --git a/trace2/tr2_cfg.c b/trace2/tr2_cfg.c
index 22a99a0682..9da1f8466c 100644
--- a/trace2/tr2_cfg.c
+++ b/trace2/tr2_cfg.c
@@ -35,10 +35,7 @@  static int tr2_cfg_load_patterns(void)
 
 	tr2_cfg_patterns = strbuf_split_buf(envvar, strlen(envvar), ',', -1);
 	for (s = tr2_cfg_patterns; *s; s++) {
-		struct strbuf *buf = *s;
-
-		if (buf->len && buf->buf[buf->len - 1] == ',')
-			strbuf_setlen(buf, buf->len - 1);
+		strbuf_trim_trailing_ch(*s, ',');
 		strbuf_trim_trailing_newline(*s);
 		strbuf_trim(*s);
 	}
@@ -74,10 +71,7 @@  static int tr2_load_env_vars(void)
 
 	tr2_cfg_env_vars = strbuf_split_buf(varlist, strlen(varlist), ',', -1);
 	for (s = tr2_cfg_env_vars; *s; s++) {
-		struct strbuf *buf = *s;
-
-		if (buf->len && buf->buf[buf->len - 1] == ',')
-			strbuf_setlen(buf, buf->len - 1);
+		strbuf_trim_trailing_ch(*s, ',');
 		strbuf_trim_trailing_newline(*s);
 		strbuf_trim(*s);
 	}