Message ID | 20240731134014.2299361-3-christian.couder@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Introduce a "promisor-remote" capability | expand |
Christian Couder <christian.couder@gmail.com> writes: > We often have to split strings at some specified terminator character. > The strbuf_split*() functions, that we can use for this purpose, > return substrings that include the terminator character, so we often > need to remove that character. > > When it is a whitespace, newline or directory separator, the > terminator character can easily be removed using an existing triming > function like strbuf_rtrim(), strbuf_trim_trailing_newline() or > strbuf_trim_trailing_dir_sep(). There is no function to remove that > character when it's not one of those characters though. OK. > Let's introduce a new strbuf_trim_trailing_ch() function that can be > used to remove any trailing character, and let's refactor existing code > that manually removed trailing characters using this new function. It is disappointing that this new one is not adequate to rewrite any of the existing strbuf_trim* functions in terms of it, but that's probably OK. At least this one we have two existing callers, but makes me wonder if these callers are doing sensible things in the first place. After trimming trailing commas, there may be trailing newlines to be trimmed, and then again whitespaces around the whole thing may need to be trimmed---what kind of input is that? The value has to be " junk \n\n,,,", but " junk, \n\n, " will only become "junk, \n\n," without further cleaned up, and it is very dubious how that is useful. But that is not an issue this patch introduces ;-) > diff --git a/trace2/tr2_cfg.c b/trace2/tr2_cfg.c > index d96d908bb9..356fcd38f4 100644 > --- a/trace2/tr2_cfg.c > +++ b/trace2/tr2_cfg.c > @@ -33,10 +33,7 @@ static int tr2_cfg_load_patterns(void) > > tr2_cfg_patterns = strbuf_split_buf(envvar, strlen(envvar), ',', -1); > for (s = tr2_cfg_patterns; *s; s++) { > - struct strbuf *buf = *s; > - > - if (buf->len && buf->buf[buf->len - 1] == ',') > - strbuf_setlen(buf, buf->len - 1); > + strbuf_trim_trailing_ch(*s, ','); > strbuf_trim_trailing_newline(*s); > strbuf_trim(*s); > } > @@ -72,10 +69,7 @@ static int tr2_load_env_vars(void) > > tr2_cfg_env_vars = strbuf_split_buf(varlist, strlen(varlist), ',', -1); > for (s = tr2_cfg_env_vars; *s; s++) { > - struct strbuf *buf = *s; > - > - if (buf->len && buf->buf[buf->len - 1] == ',') > - strbuf_setlen(buf, buf->len - 1); > + strbuf_trim_trailing_ch(*s, ','); > strbuf_trim_trailing_newline(*s); > strbuf_trim(*s); > }
On Wed, Jul 31, 2024 at 10:29:00AM -0700, Junio C Hamano wrote: > > Let's introduce a new strbuf_trim_trailing_ch() function that can be > > used to remove any trailing character, and let's refactor existing code > > that manually removed trailing characters using this new function. > > It is disappointing that this new one is not adequate to rewrite any > of the existing strbuf_trim* functions in terms of it, but that's > probably OK. I don't think it's possible without some awkwardness. strbuf_[lr]trim() both trim characters for which isspace(c) is true, and this new function only trims a single character (also from the right-hand side of the string, so strbuf_ltrim() would not be a candidate[^1]). Likewise for strbuf_trim_trailing_dir_sep(), which uses the platform-dependent is_dir_sep(). strbuf_trim_trailing_newline() is also complicated because it only removes '\n' or '\r\n' from the end of a buffer, but not a lone '\r' character. Thanks, Taylor [^1]: Unless you had a function to swap the order of the underlying buffer, then call the trim function on the right-hand side, before swapping it back. But that's obviously disgusting and clearly a bad idea.
On Wed, Jul 31, 2024 at 7:29 PM Junio C Hamano <gitster@pobox.com> wrote: > > Christian Couder <christian.couder@gmail.com> writes: > > > We often have to split strings at some specified terminator character. > > The strbuf_split*() functions, that we can use for this purpose, > > return substrings that include the terminator character, so we often > > need to remove that character. > > > > When it is a whitespace, newline or directory separator, the > > terminator character can easily be removed using an existing triming > > function like strbuf_rtrim(), strbuf_trim_trailing_newline() or > > strbuf_trim_trailing_dir_sep(). There is no function to remove that > > character when it's not one of those characters though. > > OK. > > > Let's introduce a new strbuf_trim_trailing_ch() function that can be > > used to remove any trailing character, and let's refactor existing code > > that manually removed trailing characters using this new function. > > It is disappointing that this new one is not adequate to rewrite any > of the existing strbuf_trim* functions in terms of it, but that's > probably OK. Yeah, I took a look at that but thought it wasn't worth trying to unify the trim functions as they each have quite specific code and requirements. > At least this one we have two existing callers, but > makes me wonder if these callers are doing sensible things in the > first place. After trimming trailing commas, there may be trailing > newlines to be trimmed, and then again whitespaces around the whole > thing may need to be trimmed---what kind of input is that? The > value has to be " junk \n\n,,,", but " junk, \n\n, " will only > become "junk, \n\n," without further cleaned up, and it is very > dubious how that is useful. > > But that is not an issue this patch introduces ;-)
On Wed, Jul 31, 2024 at 11:49 PM Taylor Blau <me@ttaylorr.com> wrote: > > On Wed, Jul 31, 2024 at 10:29:00AM -0700, Junio C Hamano wrote: > > > Let's introduce a new strbuf_trim_trailing_ch() function that can be > > > used to remove any trailing character, and let's refactor existing code > > > that manually removed trailing characters using this new function. > > > > It is disappointing that this new one is not adequate to rewrite any > > of the existing strbuf_trim* functions in terms of it, but that's > > probably OK. > > I don't think it's possible without some awkwardness. strbuf_[lr]trim() > both trim characters for which isspace(c) is true, and this new function > only trims a single character (also from the right-hand side of the > string, so strbuf_ltrim() would not be a candidate[^1]). > > Likewise for strbuf_trim_trailing_dir_sep(), which uses the > platform-dependent is_dir_sep(). strbuf_trim_trailing_newline() is also > complicated because it only removes '\n' or '\r\n' from the end of a > buffer, but not a lone '\r' character. Yeah, I agree with that analysis.
diff --git a/strbuf.c b/strbuf.c index cccfdec0e3..c986ec28f4 100644 --- a/strbuf.c +++ b/strbuf.c @@ -134,6 +134,13 @@ void strbuf_trim_trailing_dir_sep(struct strbuf *sb) sb->buf[sb->len] = '\0'; } +void strbuf_trim_trailing_ch(struct strbuf *sb, int c) +{ + while (sb->len > 0 && sb->buf[sb->len - 1] == c) + sb->len--; + sb->buf[sb->len] = '\0'; +} + void strbuf_trim_trailing_newline(struct strbuf *sb) { if (sb->len > 0 && sb->buf[sb->len - 1] == '\n') { diff --git a/strbuf.h b/strbuf.h index 884157873e..5e389ab065 100644 --- a/strbuf.h +++ b/strbuf.h @@ -197,6 +197,9 @@ void strbuf_trim_trailing_dir_sep(struct strbuf *sb); /* Strip trailing LF or CR/LF */ void strbuf_trim_trailing_newline(struct strbuf *sb); +/* Strip trailing character c */ +void strbuf_trim_trailing_ch(struct strbuf *sb, int c); + /** * Replace the contents of the strbuf with a reencoded form. Returns -1 * on error, 0 on success. diff --git a/trace2/tr2_cfg.c b/trace2/tr2_cfg.c index d96d908bb9..356fcd38f4 100644 --- a/trace2/tr2_cfg.c +++ b/trace2/tr2_cfg.c @@ -33,10 +33,7 @@ static int tr2_cfg_load_patterns(void) tr2_cfg_patterns = strbuf_split_buf(envvar, strlen(envvar), ',', -1); for (s = tr2_cfg_patterns; *s; s++) { - struct strbuf *buf = *s; - - if (buf->len && buf->buf[buf->len - 1] == ',') - strbuf_setlen(buf, buf->len - 1); + strbuf_trim_trailing_ch(*s, ','); strbuf_trim_trailing_newline(*s); strbuf_trim(*s); } @@ -72,10 +69,7 @@ static int tr2_load_env_vars(void) tr2_cfg_env_vars = strbuf_split_buf(varlist, strlen(varlist), ',', -1); for (s = tr2_cfg_env_vars; *s; s++) { - struct strbuf *buf = *s; - - if (buf->len && buf->buf[buf->len - 1] == ',') - strbuf_setlen(buf, buf->len - 1); + strbuf_trim_trailing_ch(*s, ','); strbuf_trim_trailing_newline(*s); strbuf_trim(*s); }
We often have to split strings at some specified terminator character. The strbuf_split*() functions, that we can use for this purpose, return substrings that include the terminator character, so we often need to remove that character. When it is a whitespace, newline or directory separator, the terminator character can easily be removed using an existing triming function like strbuf_rtrim(), strbuf_trim_trailing_newline() or strbuf_trim_trailing_dir_sep(). There is no function to remove that character when it's not one of those characters though. Let's introduce a new strbuf_trim_trailing_ch() function that can be used to remove any trailing character, and let's refactor existing code that manually removed trailing characters using this new function. We are also going to use this new function in a following commit. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> --- strbuf.c | 7 +++++++ strbuf.h | 3 +++ trace2/tr2_cfg.c | 10 ++-------- 3 files changed, 12 insertions(+), 8 deletions(-)