Message ID | dda218c8c1fdc0ca2e4352b820f3432565a74a23.1681428696.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | banned: mark `strok()`, `strtok_r()` as banned | expand |
On Thu, Apr 13, 2023 at 07:31:43PM -0400, Taylor Blau wrote: > Instead of using `strchr(2)` to locate the first occurrence of the given > delimiter character, `string_list_split_in_place_multi()` uses > `strpbrk(2)` to find the first occurrence of *any* character in the given > delimiter string. > > Since the `_multi` variant is a generalization of the original > implementation, reimplement `string_list_split_in_place()` in terms of > the more general function by providing a single-character string for the > list of accepted delimiters. I'd imagine that strpbrk() is potentially a lot slower than strchr(). But I kind of doubt that splitting is such a hot path that it will matter. If we want to care, I think we could do something like: end = delim[1] ? strpbrk(p, delim) : strchr(p, delim[0]); It's entirely possible that a half-decent strpbrk() implementation does this already. So I mention it mostly in case we need to revisit this later. I think it's OK to ignore for now. -Peff
On Tue, Apr 18, 2023 at 06:10:58AM -0400, Jeff King wrote: > On Thu, Apr 13, 2023 at 07:31:43PM -0400, Taylor Blau wrote: > > > Instead of using `strchr(2)` to locate the first occurrence of the given > > delimiter character, `string_list_split_in_place_multi()` uses > > `strpbrk(2)` to find the first occurrence of *any* character in the given > > delimiter string. > > > > Since the `_multi` variant is a generalization of the original > > implementation, reimplement `string_list_split_in_place()` in terms of > > the more general function by providing a single-character string for the > > list of accepted delimiters. > > I'd imagine that strpbrk() is potentially a lot slower than strchr(). > But I kind of doubt that splitting is such a hot path that it will > matter. If we want to care, I think we could do something like: > > end = delim[1] ? strpbrk(p, delim) : strchr(p, delim[0]); > > It's entirely possible that a half-decent strpbrk() implementation does > this already. I did a little research here, and it looks like both glibc[1] and musl[2] already do this. Those links are for their implementations of strcspn(), but which both implement[^1] strpbrk() as "p += strcspn(p, d)". > So I mention it mostly in case we need to revisit this later. I think > it's OK to ignore for now. In addition to being OK from a performance standpoint I think solving the semantics concern you noted lower down in the thread[3] is fairly straightforward. When in "_multi" mode, I think this boils down to moving past any number of characters in `delim` before each iteration of the loop. And that's doable with "p += strspn(p, delim)". That avoids any behavior change, and more closely matches what strtok() does, so I think it's a better path. Thanks, Taylor [1]: https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strcspn.c;hb=glibc-2.37#l35 [2]: https://git.musl-libc.org/cgit/musl/tree/src/string/strcspn.c?h=v1.2.3#n11 [3]: https://lore.kernel.org/git/20230418102320.GB508219@coredump.intra.peff.net/ [^1]: Not entirely. If strcspn() takes you to the end of the string, strpbrk() will return NULL, instead of a pointer to the end of the string.
diff --git a/string-list.c b/string-list.c index db473f273e1..67f9ff18904 100644 --- a/string-list.c +++ b/string-list.c @@ -300,8 +300,8 @@ int string_list_split(struct string_list *list, const char *string, } } -int string_list_split_in_place(struct string_list *list, char *string, - int delim, int maxsplit) +int string_list_split_in_place_multi(struct string_list *list, char *string, + const char *delim, int maxsplit) { int count = 0; char *p = string, *end; @@ -315,7 +315,7 @@ int string_list_split_in_place(struct string_list *list, char *string, string_list_append(list, p); return count; } - end = strchr(p, delim); + end = strpbrk(p, delim); if (end) { *end = '\0'; string_list_append(list, p); @@ -326,3 +326,12 @@ int string_list_split_in_place(struct string_list *list, char *string, } } } + +int string_list_split_in_place(struct string_list *list, char *string, + int delim, int maxsplit) +{ + char delim_s[2] = { delim, 0 }; + + return string_list_split_in_place_multi(list, string, delim_s, + maxsplit); +} diff --git a/string-list.h b/string-list.h index c7b0d5d0008..670d4fc8fb7 100644 --- a/string-list.h +++ b/string-list.h @@ -268,7 +268,13 @@ int string_list_split(struct string_list *list, const char *string, * new string_list_items point into string (which therefore must not * be modified or freed while the string_list is in use). * list->strdup_strings must *not* be set. + * + * The "_multi" variant splits the given string on any character + * appearing in "delim", and the non-"_multi" variant splits only on the + * given character. */ +int string_list_split_in_place_multi(struct string_list *list, char *string, + const char *delim, int maxsplit); int string_list_split_in_place(struct string_list *list, char *string, int delim, int maxsplit); #endif /* STRING_LIST_H */
Introduce a variant of the `string_list_split_in_place()` function that takes a string of accepted delimiters. By contrast to its cousin `string_list_split_in_place()` which splits the given string at every instance of the single character `delim`, the `_multi` variant splits the given string any any character appearing in the string `delim`. Instead of using `strchr(2)` to locate the first occurrence of the given delimiter character, `string_list_split_in_place_multi()` uses `strpbrk(2)` to find the first occurrence of *any* character in the given delimiter string. Since the `_multi` variant is a generalization of the original implementation, reimplement `string_list_split_in_place()` in terms of the more general function by providing a single-character string for the list of accepted delimiters. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- string-list.c | 15 ++++++++++++--- string-list.h | 6 ++++++ 2 files changed, 18 insertions(+), 3 deletions(-)