diff mbox series

[2/4] strbuf: refactor strbuf_trim_trailing_ch()

Message ID 20240731134014.2299361-3-christian.couder@gmail.com (mailing list archive)
State New, archived
Headers show
Series Introduce a "promisor-remote" capability | expand

Commit Message

Christian Couder July 31, 2024, 1:40 p.m. UTC
We often have to split strings at some specified terminator character.
The strbuf_split*() functions, that we can use for this purpose,
return substrings that include the terminator character, so we often
need to remove that character.

When it is a whitespace, newline or directory separator, the
terminator character can easily be removed using an existing triming
function like strbuf_rtrim(), strbuf_trim_trailing_newline() or
strbuf_trim_trailing_dir_sep(). There is no function to remove that
character when it's not one of those characters though.

Let's introduce a new strbuf_trim_trailing_ch() function that can be
used to remove any trailing character, and let's refactor existing code
that manually removed trailing characters using this new function.

We are also going to use this new function in a following commit.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 strbuf.c         |  7 +++++++
 strbuf.h         |  3 +++
 trace2/tr2_cfg.c | 10 ++--------
 3 files changed, 12 insertions(+), 8 deletions(-)

Comments

Junio C Hamano July 31, 2024, 5:29 p.m. UTC | #1
Christian Couder <christian.couder@gmail.com> writes:

> We often have to split strings at some specified terminator character.
> The strbuf_split*() functions, that we can use for this purpose,
> return substrings that include the terminator character, so we often
> need to remove that character.
>
> When it is a whitespace, newline or directory separator, the
> terminator character can easily be removed using an existing triming
> function like strbuf_rtrim(), strbuf_trim_trailing_newline() or
> strbuf_trim_trailing_dir_sep(). There is no function to remove that
> character when it's not one of those characters though.

OK.

> Let's introduce a new strbuf_trim_trailing_ch() function that can be
> used to remove any trailing character, and let's refactor existing code
> that manually removed trailing characters using this new function.

It is disappointing that this new one is not adequate to rewrite any
of the existing strbuf_trim* functions in terms of it, but that's
probably OK.  At least this one we have two existing callers, but
makes me wonder if these callers are doing sensible things in the
first place.  After trimming trailing commas, there may be trailing
newlines to be trimmed, and then again whitespaces around the whole
thing may need to be trimmed---what kind of input is that?  The
value has to be " junk \n\n,,,", but " junk, \n\n, " will only
become "junk, \n\n," without further cleaned up, and it is very
dubious how that is useful.

But that is not an issue this patch introduces ;-)

> diff --git a/trace2/tr2_cfg.c b/trace2/tr2_cfg.c
> index d96d908bb9..356fcd38f4 100644
> --- a/trace2/tr2_cfg.c
> +++ b/trace2/tr2_cfg.c
> @@ -33,10 +33,7 @@ static int tr2_cfg_load_patterns(void)
>  
>  	tr2_cfg_patterns = strbuf_split_buf(envvar, strlen(envvar), ',', -1);
>  	for (s = tr2_cfg_patterns; *s; s++) {
> -		struct strbuf *buf = *s;
> -
> -		if (buf->len && buf->buf[buf->len - 1] == ',')
> -			strbuf_setlen(buf, buf->len - 1);
> +		strbuf_trim_trailing_ch(*s, ',');
>  		strbuf_trim_trailing_newline(*s);
>  		strbuf_trim(*s);
>  	}
> @@ -72,10 +69,7 @@ static int tr2_load_env_vars(void)
>  
>  	tr2_cfg_env_vars = strbuf_split_buf(varlist, strlen(varlist), ',', -1);
>  	for (s = tr2_cfg_env_vars; *s; s++) {
> -		struct strbuf *buf = *s;
> -
> -		if (buf->len && buf->buf[buf->len - 1] == ',')
> -			strbuf_setlen(buf, buf->len - 1);
> +		strbuf_trim_trailing_ch(*s, ',');
>  		strbuf_trim_trailing_newline(*s);
>  		strbuf_trim(*s);
>  	}
Taylor Blau July 31, 2024, 9:49 p.m. UTC | #2
On Wed, Jul 31, 2024 at 10:29:00AM -0700, Junio C Hamano wrote:
> > Let's introduce a new strbuf_trim_trailing_ch() function that can be
> > used to remove any trailing character, and let's refactor existing code
> > that manually removed trailing characters using this new function.
>
> It is disappointing that this new one is not adequate to rewrite any
> of the existing strbuf_trim* functions in terms of it, but that's
> probably OK.

I don't think it's possible without some awkwardness. strbuf_[lr]trim()
both trim characters for which isspace(c) is true, and this new function
only trims a single character (also from the right-hand side of the
string, so strbuf_ltrim() would not be a candidate[^1]).

Likewise for strbuf_trim_trailing_dir_sep(), which uses the
platform-dependent is_dir_sep(). strbuf_trim_trailing_newline() is also
complicated because it only removes '\n' or '\r\n' from the end of a
buffer, but not a lone '\r' character.

Thanks,
Taylor

[^1]: Unless you had a function to swap the order of the underlying
  buffer, then call the trim function on the right-hand side, before
  swapping it back. But that's obviously disgusting and clearly a bad
  idea.
Christian Couder Aug. 20, 2024, 11:29 a.m. UTC | #3
On Wed, Jul 31, 2024 at 7:29 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> > We often have to split strings at some specified terminator character.
> > The strbuf_split*() functions, that we can use for this purpose,
> > return substrings that include the terminator character, so we often
> > need to remove that character.
> >
> > When it is a whitespace, newline or directory separator, the
> > terminator character can easily be removed using an existing triming
> > function like strbuf_rtrim(), strbuf_trim_trailing_newline() or
> > strbuf_trim_trailing_dir_sep(). There is no function to remove that
> > character when it's not one of those characters though.
>
> OK.
>
> > Let's introduce a new strbuf_trim_trailing_ch() function that can be
> > used to remove any trailing character, and let's refactor existing code
> > that manually removed trailing characters using this new function.
>
> It is disappointing that this new one is not adequate to rewrite any
> of the existing strbuf_trim* functions in terms of it, but that's
> probably OK.

Yeah, I took a look at that but thought it wasn't worth trying to
unify the trim functions as they each have quite specific code and
requirements.


> At least this one we have two existing callers, but
> makes me wonder if these callers are doing sensible things in the
> first place.  After trimming trailing commas, there may be trailing
> newlines to be trimmed, and then again whitespaces around the whole
> thing may need to be trimmed---what kind of input is that?  The
> value has to be " junk \n\n,,,", but " junk, \n\n, " will only
> become "junk, \n\n," without further cleaned up, and it is very
> dubious how that is useful.
>
> But that is not an issue this patch introduces ;-)
Christian Couder Aug. 20, 2024, 11:29 a.m. UTC | #4
On Wed, Jul 31, 2024 at 11:49 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Wed, Jul 31, 2024 at 10:29:00AM -0700, Junio C Hamano wrote:
> > > Let's introduce a new strbuf_trim_trailing_ch() function that can be
> > > used to remove any trailing character, and let's refactor existing code
> > > that manually removed trailing characters using this new function.
> >
> > It is disappointing that this new one is not adequate to rewrite any
> > of the existing strbuf_trim* functions in terms of it, but that's
> > probably OK.
>
> I don't think it's possible without some awkwardness. strbuf_[lr]trim()
> both trim characters for which isspace(c) is true, and this new function
> only trims a single character (also from the right-hand side of the
> string, so strbuf_ltrim() would not be a candidate[^1]).
>
> Likewise for strbuf_trim_trailing_dir_sep(), which uses the
> platform-dependent is_dir_sep(). strbuf_trim_trailing_newline() is also
> complicated because it only removes '\n' or '\r\n' from the end of a
> buffer, but not a lone '\r' character.

Yeah, I agree with that analysis.
diff mbox series

Patch

diff --git a/strbuf.c b/strbuf.c
index cccfdec0e3..c986ec28f4 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -134,6 +134,13 @@  void strbuf_trim_trailing_dir_sep(struct strbuf *sb)
 	sb->buf[sb->len] = '\0';
 }
 
+void strbuf_trim_trailing_ch(struct strbuf *sb, int c)
+{
+	while (sb->len > 0 && sb->buf[sb->len - 1] == c)
+		sb->len--;
+	sb->buf[sb->len] = '\0';
+}
+
 void strbuf_trim_trailing_newline(struct strbuf *sb)
 {
 	if (sb->len > 0 && sb->buf[sb->len - 1] == '\n') {
diff --git a/strbuf.h b/strbuf.h
index 884157873e..5e389ab065 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -197,6 +197,9 @@  void strbuf_trim_trailing_dir_sep(struct strbuf *sb);
 /* Strip trailing LF or CR/LF */
 void strbuf_trim_trailing_newline(struct strbuf *sb);
 
+/* Strip trailing character c */
+void strbuf_trim_trailing_ch(struct strbuf *sb, int c);
+
 /**
  * Replace the contents of the strbuf with a reencoded form.  Returns -1
  * on error, 0 on success.
diff --git a/trace2/tr2_cfg.c b/trace2/tr2_cfg.c
index d96d908bb9..356fcd38f4 100644
--- a/trace2/tr2_cfg.c
+++ b/trace2/tr2_cfg.c
@@ -33,10 +33,7 @@  static int tr2_cfg_load_patterns(void)
 
 	tr2_cfg_patterns = strbuf_split_buf(envvar, strlen(envvar), ',', -1);
 	for (s = tr2_cfg_patterns; *s; s++) {
-		struct strbuf *buf = *s;
-
-		if (buf->len && buf->buf[buf->len - 1] == ',')
-			strbuf_setlen(buf, buf->len - 1);
+		strbuf_trim_trailing_ch(*s, ',');
 		strbuf_trim_trailing_newline(*s);
 		strbuf_trim(*s);
 	}
@@ -72,10 +69,7 @@  static int tr2_load_env_vars(void)
 
 	tr2_cfg_env_vars = strbuf_split_buf(varlist, strlen(varlist), ',', -1);
 	for (s = tr2_cfg_env_vars; *s; s++) {
-		struct strbuf *buf = *s;
-
-		if (buf->len && buf->buf[buf->len - 1] == ',')
-			strbuf_setlen(buf, buf->len - 1);
+		strbuf_trim_trailing_ch(*s, ',');
 		strbuf_trim_trailing_newline(*s);
 		strbuf_trim(*s);
 	}