diff mbox series

[1/2] lib/string: Improve strstarts() performance

Message ID 20250407-imp_str_perf-v1-1-ed95d52964a4@quicinc.com (mailing list archive)
State Rejected
Headers show
Series lib/string: Improve performance for both strstarts() and str_has_prefix() | expand

Commit Message

Zijun Hu April 7, 2025, 1:15 p.m. UTC
From: Zijun Hu <quic_zijuhu@quicinc.com>

strstarts() is frequently invoked to test if a string has another string
as prefix, but its performance is degraded by the strlen() loop contained.

Improve its performance by eliminating the strlen() loop.

Link: https://lore.kernel.org/all/20250113234643.GA3631169-robh@kernel.org
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Cc: Rob Herring (Arm) <robh@kernel.org>
---
 include/linux/string.h | 10 +---------
 lib/string.c           | 22 ++++++++++++++++++++++
 2 files changed, 23 insertions(+), 9 deletions(-)

Comments

Andy Shevchenko April 7, 2025, 1:51 p.m. UTC | #1
On Mon, Apr 07, 2025 at 09:15:04PM +0800, Zijun Hu wrote:
> From: Zijun Hu <quic_zijuhu@quicinc.com>
> 
> strstarts() is frequently invoked to test if a string has another string
> as prefix, but its performance is degraded by the strlen() loop contained.
> 
> Improve its performance by eliminating the strlen() loop.

NAK.

First of all, this function is supposed to be run against constant string literals.
Second, this commit message has zero proofs to tell if there is actual performance
downgrage even in the case when prefix is not a constant string literal.
Zijun Hu April 7, 2025, 2:33 p.m. UTC | #2
On 2025/4/7 21:51, Andy Shevchenko wrote:
> First of all, this function is supposed to be run against constant string literals.

for strstarts(s, "prefix"), strlen("prefix") should *NOT* be compile
time constant. it is a loop and unavoidable to have strlen("prefix")
iterations.

> Second, this commit message has zero proofs to tell if there is actual performance
> downgrage even in the case when prefix is not a constant string literal.

for either constant string or non constant string. this patch
eliminating a loop which have strlen()iterations. i feel it is obvious
it can improve performance.
Andy Shevchenko April 7, 2025, 2:39 p.m. UTC | #3
On Mon, Apr 07, 2025 at 10:33:34PM +0800, Zijun Hu wrote:
> On 2025/4/7 21:51, Andy Shevchenko wrote:
> > First of all, this function is supposed to be run against constant string literals.
> 
> for strstarts(s, "prefix"), strlen("prefix") should *NOT* be compile
> time constant. it is a loop and unavoidable to have strlen("prefix")
> iterations.

What do you mean by that? Compiler uses __builtin_strlen() and it *IS*
a compile-time constant. Just check it.

> > Second, this commit message has zero proofs to tell if there is actual performance
> > downgrage even in the case when prefix is not a constant string literal.
> 
> for either constant string or non constant string. this patch
> eliminating a loop which have strlen()iterations. i feel it is obvious
> it can improve performance.

No, it's not.
Zijun Hu April 7, 2025, 2:53 p.m. UTC | #4
On 2025/4/7 22:39, Andy Shevchenko wrote:
>> for strstarts(s, "prefix"), strlen("prefix") should *NOT* be compile
>> time constant. it is a loop and unavoidable to have strlen("prefix")
>> iterations.
> What do you mean by that? Compiler uses __builtin_strlen() and it *IS*
> a compile-time constant. Just check it.

thank you Andy for this hints. let me do more investigation about
strlen() within strstarts(s, "prefix").
Kees Cook April 7, 2025, 4:17 p.m. UTC | #5
On Mon, Apr 07, 2025 at 09:15:04PM +0800, Zijun Hu wrote:
> From: Zijun Hu <quic_zijuhu@quicinc.com>
> 
> strstarts() is frequently invoked to test if a string has another string
> as prefix, but its performance is degraded by the strlen() loop contained.
> 
> Improve its performance by eliminating the strlen() loop.

So, as Andy already said: no, this is very unlikely to be a performance
improvement, and if it is, you'll need to show the numbers (and likely
the reason _why_, in the form of assembly output, etc).

The reason this isn't going to be an improvement is because
strlen($string_constant) is optimized by the compiler into a integral
constant value. So you'd be replacing a potentially inline constant with
an explicit function call. That will be much more expensive.

With almost 300 users:
$ git grep 'strstarts' | wc -l
198

Only 38 are _not_ using a string constant:
$ git grep 'strstarts' | grep -v '"' | wc -l
38

Additionally, there is no "loop". strlen() of a runtime string would be
evaluated once.

-Kees
diff mbox series

Patch

diff --git a/include/linux/string.h b/include/linux/string.h
index 01621ad0f598ccdfae819b1a5058607e79d8a751..e5f7defa277572719e1dbfdd264f3de6ef8544f1 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -345,15 +345,7 @@  extern ssize_t memory_read_from_buffer(void *to, size_t count, loff_t *ppos,
 
 int ptr_to_hashval(const void *ptr, unsigned long *hashval_out);
 
-/**
- * strstarts - does @str start with @prefix?
- * @str: string to examine
- * @prefix: prefix to look for.
- */
-static inline bool strstarts(const char *str, const char *prefix)
-{
-	return strncmp(str, prefix, strlen(prefix)) == 0;
-}
+bool strstarts(const char *str, const char *prefix);
 
 size_t memweight(const void *ptr, size_t bytes);
 
diff --git a/lib/string.c b/lib/string.c
index eb4486ed40d259e43dfb63d76c2bd57ef74e2fd1..ea52c8509328358e436766b1186a82419d45089d 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -285,6 +285,28 @@  int strcmp(const char *cs, const char *ct)
 EXPORT_SYMBOL(strcmp);
 #endif
 
+/**
+ * strstarts - does @str start with @prefix?
+ * @str: string to examine
+ * @prefix: prefix to look for.
+ */
+bool strstarts(const char *str, const char *prefix)
+{
+	unsigned char c1, c2;
+
+	do {
+		c1 = *str++;
+		c2 = *prefix++;
+
+		if (c1 != c2)
+			return c2 == '\0';
+
+	} while (c2 != '\0');
+
+	return true;
+}
+EXPORT_SYMBOL(strstarts);
+
 #ifndef __HAVE_ARCH_STRNCMP
 /**
  * strncmp - Compare two length-limited strings