diff mbox series

[v4,4/8] kallsyms: Improve the performance of kallsyms_lookup_name()

Message ID 20220920071317.1787-5-thunder.leizhen@huawei.com (mailing list archive)
State New, archived
Headers show
Series kallsyms: Optimizes the performance of lookup symbols | expand

Commit Message

Zhen Lei Sept. 20, 2022, 7:13 a.m. UTC
Currently, to search for a symbol, we need to expand the symbols in
'kallsyms_names' one by one, and then use the expanded string for
comparison. This process can be optimized.

And now scripts/kallsyms no longer compresses the symbol types, each
symbol type always occupies one byte. So we can first compress the
searched symbol and then make a quick comparison based on the compressed
length and content. In this way, for entries with mismatched lengths,
there is no need to expand and compare strings. And for those matching
lengths, there's no need to expand the symbol. This saves a lot of time.
According to my test results, the average performance of
kallsyms_lookup_name() can be improved by 20 to 30 times.

The pseudo code of the test case is as follows:
static int stat_find_name(...)
{
	start = sched_clock();
	(void)kallsyms_lookup_name(name);
	end = sched_clock();
	//Update min, max, cnt, sum
}

/*
 * Traverse all symbols in sequence and collect statistics on the time
 * taken by kallsyms_lookup_name() to lookup each symbol.
 */
kallsyms_on_each_symbol(stat_find_name, NULL);

The test results are as follows (twice):
After : min=5250, max=  726560, avg= 302132
After : min=5320, max=  726850, avg= 301978
Before: min=170,  max=15949190, avg=7553906
Before: min=160,  max=15877280, avg=7517784

The average time consumed is only 4.01% and the maximum time consumed is
only 4.57% of the time consumed before optimization.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 kernel/kallsyms.c | 79 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 76 insertions(+), 3 deletions(-)

Comments

Petr Mladek Sept. 21, 2022, 3:25 p.m. UTC | #1
On Tue 2022-09-20 15:13:13, Zhen Lei wrote:
> Currently, to search for a symbol, we need to expand the symbols in
> 'kallsyms_names' one by one, and then use the expanded string for
> comparison. This process can be optimized.
> 
> And now scripts/kallsyms no longer compresses the symbol types, each
> symbol type always occupies one byte. So we can first compress the
> searched symbol and then make a quick comparison based on the compressed
> length and content. In this way, for entries with mismatched lengths,
> there is no need to expand and compare strings. And for those matching
> lengths, there's no need to expand the symbol. This saves a lot of time.
> According to my test results, the average performance of
> kallsyms_lookup_name() can be improved by 20 to 30 times.
> 
> The pseudo code of the test case is as follows:
> static int stat_find_name(...)
> {
> 	start = sched_clock();
> 	(void)kallsyms_lookup_name(name);
> 	end = sched_clock();
> 	//Update min, max, cnt, sum
> }
> 
> /*
>  * Traverse all symbols in sequence and collect statistics on the time
>  * taken by kallsyms_lookup_name() to lookup each symbol.
>  */
> kallsyms_on_each_symbol(stat_find_name, NULL);
> 
> The test results are as follows (twice):
> After : min=5250, max=  726560, avg= 302132
> After : min=5320, max=  726850, avg= 301978
> Before: min=170,  max=15949190, avg=7553906
> Before: min=160,  max=15877280, avg=7517784
> 
> The average time consumed is only 4.01% and the maximum time consumed is
> only 4.57% of the time consumed before optimization.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>  kernel/kallsyms.c | 79 +++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 76 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
> index 3e7e2c2ad2f75ef..2d76196cfe89f34 100644
> --- a/kernel/kallsyms.c
> +++ b/kernel/kallsyms.c
> @@ -87,6 +87,71 @@ static unsigned int kallsyms_expand_symbol(unsigned int off,
>  	return off;
>  }
>  
> +static int kallsyms_name_to_tokens(const char *name, char *buf)

This is not safe API. It is always needed to pass the size of the
buffer.

Also it should be called "compress". "token" is just an implementation
detail.

I would do:

static int kallsyms_compress_symbol_name(const char *name,
					 char *buf, size_t size)


> +{
> +	int i, j, k, n;
> +	int len, token_len;
> +	const char *token;
> +	unsigned char token_idx[KSYM_NAME_LEN];
> +	unsigned char token_bak[KSYM_NAME_LEN];

Why do we need two buffers? It should be possible to compress the name
in the same buffer as it is done in compress_symbols() in scripts/callsyms.c.

> +
> +	/*
> +	 * n, number of tokens in the string name.
> +	 * token_idx[i], the start index of the ith token.
> +	 * token_idx[n] is used to calculate the length of the last token.
> +	 */
> +	n = strlen(name);
> +	if (n >= KSYM_NAME_LEN) {
> +		buf[0] = 0;
> +		return 0;
> +	}
> +	for (i = 0; i <= n; i++)
> +		token_idx[i] = (unsigned char)i;
> +
> +	/*
> +	 * For tokens whose token_len >= 2, a larger index value indicates
> +	 * a higher occurrence frequency. See scripts/kallsyms.c
> +	 */
> +	for (i = 255; i >= 0; i--) {
> +		token = &kallsyms_token_table[kallsyms_token_index[i]];
> +		token_len = strlen(token);
> +		if (token_len <= 1)
> +			continue;
> +
> +		/*
> +		 * Find and merge two tokens into one.
> +		 *
> +		 *                |<-- new_token -->|
> +		 *                | token1 | token2 |
> +		 * token_idx[]:   j       j+1      j+2
> +		 */
> +		for (j = 0; j < n - 1; j++) {
> +			len = token_idx[j + 2] - token_idx[j];
> +			if (len == token_len &&
> +			    !strncmp(name + token_idx[j], token, len)) {
> +				token_bak[token_idx[j]] = (unsigned char)i;
> +				for (k = j + 1; k < n; k++)
> +					token_idx[k] = token_idx[k + 1];
> +				n--;
> +			}
> +		}
> +	}
> +
> +	for (j = 0; j < n; j++) {
> +		len = token_idx[j + 1] - token_idx[j];
> +		if (len <= 1) {
> +			buf[j] = name[token_idx[j]];
> +			continue;
> +		}
> +
> +		buf[j] = token_bak[token_idx[j]];

Maybe, I do not understand the compression format correctly but
this code looks too complicated. Honestly, I even did not try to
understand it.

My understanding is the we just need to find all tokens and
replace them with index.

It should be even easier than compress_symbols() in scripts/callsyms.c.
The token_table already exists and we do not need to handle the token_profit...

The following looks more strigtforward (not even compile tested):

	ssize_t len, size;

	len = strscpy(buf, symname, size);
	if (WARN_ON_ONCE(len < 0))
		return -EINVAL;

	/* the tokens with higher index are used first */
	for (idx = 255; idx >= 0; idx--) {
		token =	&kallsyms_token_table[kallsyms_token_index[i]];
		token_len = strlen(token);

		p1 = buf;
		/* length of the remaining symname including the trailing '\0' */
		remaining = len + 1;

		/* find the token in the symbol name */
		p2 = strstr(token, p1);

		while (p2) {
			/* replace token with index */
			*p2 = idx;
			remaining -= ((p2 - p1) + token_len);
			memmove(p2 + 1, p2 + token_len, remaining);
			len -= (token_len - 1);
			p1 = p2;

			/* find the token in the rest of the symbol name */
			p2 = strstr(token, p1);
		}
	}

	return len;

> +	}
> +	buf[n] = 0;
> +
> +	return n;
> +}
> +
>  /*
>   * Get symbol type information. This is encoded as a single char at the
>   * beginning of the symbol name.
> @@ -192,20 +257,28 @@ unsigned long kallsyms_lookup_name(const char *name)
>  	char namebuf[KSYM_NAME_LEN];
>  	unsigned long i;
>  	unsigned int off;
> +	int len;
>  
>  	/* Skip the search for empty string. */
>  	if (!*name)
>  		return 0;
>  
> +	len = kallsyms_name_to_tokens(name, namebuf);
> +	for (i = 0, off = 0; len && i < kallsyms_num_syms; i++) {
> +		if (kallsyms_names[off] == len + 1 &&
> +		    !memcmp(&kallsyms_names[off + 2], namebuf, len))
> +			return kallsyms_sym_address(i);
> +
> +		off += kallsyms_names[off] + 1;

These complicated checks are hard to review. The following looks much
more readable to me:

	for (i = 0, off = 0; len && i < kallsyms_num_syms; i++) {
		/* the stored symbol name is prefixed by symbol type */
		name_len = kallsyms_names[off] - 1;
		name = &kallsyms_names[off + 2];
		off += name_len + 2;

		if (name_len != len)
			continue;

		if (!memcmp(name, namebuf, len))
			return kallsyms_sym_address(i);
	}


> +	}
> +
>  	for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
>  		off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
>  
> -		if (strcmp(namebuf, name) == 0)
> -			return kallsyms_sym_address(i);
> -
>  		if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0)
>  			return kallsyms_sym_address(i);

Hmm, it means that the speedup is not usable when kernel is compiled LLVM?
It might actually slow down the search because we would need to use
both fast and slow search?

Best Regards,
Petr
Zhen Lei Sept. 22, 2022, 2:15 a.m. UTC | #2
On 2022/9/21 23:25, Petr Mladek wrote:
> On Tue 2022-09-20 15:13:13, Zhen Lei wrote:
>> Currently, to search for a symbol, we need to expand the symbols in
>> 'kallsyms_names' one by one, and then use the expanded string for
>> comparison. This process can be optimized.
>>
>> And now scripts/kallsyms no longer compresses the symbol types, each
>> symbol type always occupies one byte. So we can first compress the
>> searched symbol and then make a quick comparison based on the compressed
>> length and content. In this way, for entries with mismatched lengths,
>> there is no need to expand and compare strings. And for those matching
>> lengths, there's no need to expand the symbol. This saves a lot of time.
>> According to my test results, the average performance of
>> kallsyms_lookup_name() can be improved by 20 to 30 times.
>>
>> The pseudo code of the test case is as follows:
>> static int stat_find_name(...)
>> {
>> 	start = sched_clock();
>> 	(void)kallsyms_lookup_name(name);
>> 	end = sched_clock();
>> 	//Update min, max, cnt, sum
>> }
>>
>> /*
>>  * Traverse all symbols in sequence and collect statistics on the time
>>  * taken by kallsyms_lookup_name() to lookup each symbol.
>>  */
>> kallsyms_on_each_symbol(stat_find_name, NULL);
>>
>> The test results are as follows (twice):
>> After : min=5250, max=  726560, avg= 302132
>> After : min=5320, max=  726850, avg= 301978
>> Before: min=170,  max=15949190, avg=7553906
>> Before: min=160,  max=15877280, avg=7517784
>>
>> The average time consumed is only 4.01% and the maximum time consumed is
>> only 4.57% of the time consumed before optimization.
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> ---
>>  kernel/kallsyms.c | 79 +++++++++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 76 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
>> index 3e7e2c2ad2f75ef..2d76196cfe89f34 100644
>> --- a/kernel/kallsyms.c
>> +++ b/kernel/kallsyms.c
>> @@ -87,6 +87,71 @@ static unsigned int kallsyms_expand_symbol(unsigned int off,
>>  	return off;
>>  }
>>  
>> +static int kallsyms_name_to_tokens(const char *name, char *buf)
> 
> This is not safe API. It is always needed to pass the size of the
> buffer.

OK

> 
> Also it should be called "compress". "token" is just an implementation
> detail.
> 
> I would do:
> 
> static int kallsyms_compress_symbol_name(const char *name,
> 					 char *buf, size_t size)

This's a wonderful name. Thanks.

> 
> 
>> +{
>> +	int i, j, k, n;
>> +	int len, token_len;
>> +	const char *token;
>> +	unsigned char token_idx[KSYM_NAME_LEN];
>> +	unsigned char token_bak[KSYM_NAME_LEN];
> 
> Why do we need two buffers? It should be possible to compress the name
> in the same buffer as it is done in compress_symbols() in scripts/callsyms.c.

Because the performance would be a little better. Now this function takes
just over a microsecond. Currently, it takes about 250 microseconds on
average to lookup a symbol, so adding a little more time to this function
doesn't affect the overall picture. I'll modify and test it as you suggest
below.

> 
>> +
>> +	/*
>> +	 * n, number of tokens in the string name.
>> +	 * token_idx[i], the start index of the ith token.
>> +	 * token_idx[n] is used to calculate the length of the last token.
>> +	 */
>> +	n = strlen(name);
>> +	if (n >= KSYM_NAME_LEN) {
>> +		buf[0] = 0;
>> +		return 0;
>> +	}
>> +	for (i = 0; i <= n; i++)
>> +		token_idx[i] = (unsigned char)i;
>> +
>> +	/*
>> +	 * For tokens whose token_len >= 2, a larger index value indicates
>> +	 * a higher occurrence frequency. See scripts/kallsyms.c
>> +	 */
>> +	for (i = 255; i >= 0; i--) {
>> +		token = &kallsyms_token_table[kallsyms_token_index[i]];
>> +		token_len = strlen(token);
>> +		if (token_len <= 1)
>> +			continue;
>> +
>> +		/*
>> +		 * Find and merge two tokens into one.
>> +		 *
>> +		 *                |<-- new_token -->|
>> +		 *                | token1 | token2 |
>> +		 * token_idx[]:   j       j+1      j+2
>> +		 */
>> +		for (j = 0; j < n - 1; j++) {
>> +			len = token_idx[j + 2] - token_idx[j];
>> +			if (len == token_len &&
>> +			    !strncmp(name + token_idx[j], token, len)) {
>> +				token_bak[token_idx[j]] = (unsigned char)i;
>> +				for (k = j + 1; k < n; k++)
>> +					token_idx[k] = token_idx[k + 1];
>> +				n--;
>> +			}
>> +		}
>> +	}
>> +
>> +	for (j = 0; j < n; j++) {
>> +		len = token_idx[j + 1] - token_idx[j];
>> +		if (len <= 1) {
>> +			buf[j] = name[token_idx[j]];
>> +			continue;
>> +		}
>> +
>> +		buf[j] = token_bak[token_idx[j]];
> 
> Maybe, I do not understand the compression format correctly but
> this code looks too complicated. Honestly, I even did not try to
> understand it.
> 
> My understanding is the we just need to find all tokens and
> replace them with index.
> 
> It should be even easier than compress_symbols() in scripts/callsyms.c.
> The token_table already exists and we do not need to handle the token_profit...
> 
> The following looks more strigtforward (not even compile tested):

OK, I will try this one. Or refer to compress_symbols() in scripts/callsyms.c.

> 
> 	ssize_t len, size;
> 
> 	len = strscpy(buf, symname, size);
> 	if (WARN_ON_ONCE(len < 0))
> 		return -EINVAL;
> 
> 	/* the tokens with higher index are used first */
> 	for (idx = 255; idx >= 0; idx--) {
> 		token =	&kallsyms_token_table[kallsyms_token_index[i]];
> 		token_len = strlen(token);
> 
> 		p1 = buf;
> 		/* length of the remaining symname including the trailing '\0' */
> 		remaining = len + 1;
> 
> 		/* find the token in the symbol name */
> 		p2 = strstr(token, p1);
> 
> 		while (p2) {
> 			/* replace token with index */
> 			*p2 = idx;
> 			remaining -= ((p2 - p1) + token_len);
> 			memmove(p2 + 1, p2 + token_len, remaining);
> 			len -= (token_len - 1);
> 			p1 = p2;
> 
> 			/* find the token in the rest of the symbol name */
> 			p2 = strstr(token, p1);
> 		}
> 	}
> 
> 	return len;
> 
>> +	}
>> +	buf[n] = 0;
>> +
>> +	return n;
>> +}
>> +
>>  /*
>>   * Get symbol type information. This is encoded as a single char at the
>>   * beginning of the symbol name.
>> @@ -192,20 +257,28 @@ unsigned long kallsyms_lookup_name(const char *name)
>>  	char namebuf[KSYM_NAME_LEN];
>>  	unsigned long i;
>>  	unsigned int off;
>> +	int len;
>>  
>>  	/* Skip the search for empty string. */
>>  	if (!*name)
>>  		return 0;
>>  
>> +	len = kallsyms_name_to_tokens(name, namebuf);
>> +	for (i = 0, off = 0; len && i < kallsyms_num_syms; i++) {
>> +		if (kallsyms_names[off] == len + 1 &&
>> +		    !memcmp(&kallsyms_names[off + 2], namebuf, len))
>> +			return kallsyms_sym_address(i);
>> +
>> +		off += kallsyms_names[off] + 1;
> 
> These complicated checks are hard to review. The following looks much
> more readable to me:

Yes, it looks well.

> 
> 	for (i = 0, off = 0; len && i < kallsyms_num_syms; i++) {
> 		/* the stored symbol name is prefixed by symbol type */
> 		name_len = kallsyms_names[off] - 1;
> 		name = &kallsyms_names[off + 2];
> 		off += name_len + 2;
> 
> 		if (name_len != len)
> 			continue;
> 
> 		if (!memcmp(name, namebuf, len))
> 			return kallsyms_sym_address(i);
> 	}
> 
> 
>> +	}
>> +
>>  	for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
>>  		off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
>>  
>> -		if (strcmp(namebuf, name) == 0)
>> -			return kallsyms_sym_address(i);
>> -
>>  		if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0)
>>  			return kallsyms_sym_address(i);
> 
> Hmm, it means that the speedup is not usable when kernel is compiled LLVM?
> It might actually slow down the search because we would need to use
> both fast and slow search?

Theoretically, I don't think so. A string comparison was removed from the
slow search. "if (name_len != len)" is faster than
"if (strcmp(namebuf, name) == 0)". Even if they're equal,
kallsyms_compress_symbol_name() only takes 1-2us, it doesn't affect the
overall picture. The average lookup time before optimization is
millisecond-level. To allay your concerns, I can run a test.

Before: min=170,  max=15949190, avg=7553906

> 
> Best Regards,
> Petr
> .
>
Petr Mladek Sept. 22, 2022, 7:02 a.m. UTC | #3
On Thu 2022-09-22 10:15:22, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/9/21 23:25, Petr Mladek wrote:
> > On Tue 2022-09-20 15:13:13, Zhen Lei wrote:
> >> Currently, to search for a symbol, we need to expand the symbols in
> >> 'kallsyms_names' one by one, and then use the expanded string for
> >> comparison. This process can be optimized.
> >>
> >> And now scripts/kallsyms no longer compresses the symbol types, each
> >> symbol type always occupies one byte. So we can first compress the
> >> searched symbol and then make a quick comparison based on the compressed
> >> length and content. In this way, for entries with mismatched lengths,
> >> there is no need to expand and compare strings. And for those matching
> >> lengths, there's no need to expand the symbol. This saves a lot of time.
> >> According to my test results, the average performance of
> >> kallsyms_lookup_name() can be improved by 20 to 30 times.
> >>
> >> The pseudo code of the test case is as follows:
> >> static int stat_find_name(...)
> >> {
> >> 	start = sched_clock();
> >> 	(void)kallsyms_lookup_name(name);
> >> 	end = sched_clock();
> >> 	//Update min, max, cnt, sum
> >> }
> >>
> >> /*
> >>  * Traverse all symbols in sequence and collect statistics on the time
> >>  * taken by kallsyms_lookup_name() to lookup each symbol.
> >>  */
> >> kallsyms_on_each_symbol(stat_find_name, NULL);
> >>
> >> The test results are as follows (twice):
> >> After : min=5250, max=  726560, avg= 302132
> >> After : min=5320, max=  726850, avg= 301978
> >> Before: min=170,  max=15949190, avg=7553906
> >> Before: min=160,  max=15877280, avg=7517784
> >>
> >> The average time consumed is only 4.01% and the maximum time consumed is
> >> only 4.57% of the time consumed before optimization.
> >>
> >> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> >> ---
> >>  kernel/kallsyms.c | 79 +++++++++++++++++++++++++++++++++++++++++++++--
> >>  1 file changed, 76 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
> >> index 3e7e2c2ad2f75ef..2d76196cfe89f34 100644
> >> --- a/kernel/kallsyms.c
> >> +++ b/kernel/kallsyms.c
> >> @@ -87,6 +87,71 @@ static unsigned int kallsyms_expand_symbol(unsigned int off,
> >> +{
> >> +	int i, j, k, n;
> >> +	int len, token_len;
> >> +	const char *token;
> >> +	unsigned char token_idx[KSYM_NAME_LEN];
> >> +	unsigned char token_bak[KSYM_NAME_LEN];
> > 
> > Why do we need two buffers? It should be possible to compress the name
> > in the same buffer as it is done in compress_symbols() in scripts/callsyms.c.
> 
> Because the performance would be a little better. Now this function takes
> just over a microsecond. Currently, it takes about 250 microseconds on
> average to lookup a symbol, so adding a little more time to this function
> doesn't affect the overall picture. I'll modify and test it as you suggest
> below.

We need to be careful about a stack overflow. I have seen that
KSYM_NAME_LEN might need to be increased to 512 because of
Rust support, see
https://lore.kernel.org/r/20220805154231.31257-6-ojeda@kernel.org

> >> @@ -192,20 +257,28 @@ unsigned long kallsyms_lookup_name(const char *name)
> >>  	for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
> >>  		off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
> >>  
> >> -		if (strcmp(namebuf, name) == 0)
> >> -			return kallsyms_sym_address(i);
> >> -
> >>  		if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0)
> >>  			return kallsyms_sym_address(i);
> > 
> > Hmm, it means that the speedup is not usable when kernel is compiled LLVM?
> > It might actually slow down the search because we would need to use
> > both fast and slow search?
> 
> Theoretically, I don't think so. A string comparison was removed from the
> slow search. "if (name_len != len)" is faster than
> "if (strcmp(namebuf, name) == 0)". Even if they're equal,
> kallsyms_compress_symbol_name() only takes 1-2us, it doesn't affect the
> overall picture. The average lookup time before optimization is
> millisecond-level.
>
> Before: min=170,  max=15949190, avg=7553906

Good point! I agree that the potential extra overhead is negligible
when using the old code as a fallback.

Best Regards,
Petr
Zhen Lei Sept. 22, 2022, 7:14 a.m. UTC | #4
On 2022/9/22 10:15, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/9/21 23:25, Petr Mladek wrote:
>> On Tue 2022-09-20 15:13:13, Zhen Lei wrote:
>>> Currently, to search for a symbol, we need to expand the symbols in
>>> 'kallsyms_names' one by one, and then use the expanded string for
>>> comparison. This process can be optimized.
>>>
>>> And now scripts/kallsyms no longer compresses the symbol types, each
>>> symbol type always occupies one byte. So we can first compress the
>>> searched symbol and then make a quick comparison based on the compressed
>>> length and content. In this way, for entries with mismatched lengths,
>>> there is no need to expand and compare strings. And for those matching
>>> lengths, there's no need to expand the symbol. This saves a lot of time.
>>> According to my test results, the average performance of
>>> kallsyms_lookup_name() can be improved by 20 to 30 times.
>>>
>>> The pseudo code of the test case is as follows:
>>> static int stat_find_name(...)
>>> {
>>> 	start = sched_clock();
>>> 	(void)kallsyms_lookup_name(name);
>>> 	end = sched_clock();
>>> 	//Update min, max, cnt, sum
>>> }
>>>
>>> /*
>>>  * Traverse all symbols in sequence and collect statistics on the time
>>>  * taken by kallsyms_lookup_name() to lookup each symbol.
>>>  */
>>> kallsyms_on_each_symbol(stat_find_name, NULL);
>>>
>>> The test results are as follows (twice):
>>> After : min=5250, max=  726560, avg= 302132
>>> After : min=5320, max=  726850, avg= 301978
>>> Before: min=170,  max=15949190, avg=7553906
>>> Before: min=160,  max=15877280, avg=7517784
>>>
>>> The average time consumed is only 4.01% and the maximum time consumed is
>>> only 4.57% of the time consumed before optimization.
>>>
>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>>> ---
>>>  kernel/kallsyms.c | 79 +++++++++++++++++++++++++++++++++++++++++++++--
>>>  1 file changed, 76 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
>>> index 3e7e2c2ad2f75ef..2d76196cfe89f34 100644
>>> --- a/kernel/kallsyms.c
>>> +++ b/kernel/kallsyms.c
>>> @@ -87,6 +87,71 @@ static unsigned int kallsyms_expand_symbol(unsigned int off,
>>>  	return off;
>>>  }
>>>  
>>> +static int kallsyms_name_to_tokens(const char *name, char *buf)
>>
>> This is not safe API. It is always needed to pass the size of the
>> buffer.
> 
> OK
> 
>>
>> Also it should be called "compress". "token" is just an implementation
>> detail.
>>
>> I would do:
>>
>> static int kallsyms_compress_symbol_name(const char *name,
>> 					 char *buf, size_t size)
> 
> This's a wonderful name. Thanks.
> 
>>
>>
>>> +{
>>> +	int i, j, k, n;
>>> +	int len, token_len;
>>> +	const char *token;
>>> +	unsigned char token_idx[KSYM_NAME_LEN];
>>> +	unsigned char token_bak[KSYM_NAME_LEN];
>>
>> Why do we need two buffers? It should be possible to compress the name
>> in the same buffer as it is done in compress_symbols() in scripts/callsyms.c.
> 
> Because the performance would be a little better. Now this function takes
> just over a microsecond. Currently, it takes about 250 microseconds on
> average to lookup a symbol, so adding a little more time to this function
> doesn't affect the overall picture. I'll modify and test it as you suggest
> below.
> 
>>
>>> +
>>> +	/*
>>> +	 * n, number of tokens in the string name.
>>> +	 * token_idx[i], the start index of the ith token.
>>> +	 * token_idx[n] is used to calculate the length of the last token.
>>> +	 */
>>> +	n = strlen(name);
>>> +	if (n >= KSYM_NAME_LEN) {
>>> +		buf[0] = 0;
>>> +		return 0;
>>> +	}
>>> +	for (i = 0; i <= n; i++)
>>> +		token_idx[i] = (unsigned char)i;
>>> +
>>> +	/*
>>> +	 * For tokens whose token_len >= 2, a larger index value indicates
>>> +	 * a higher occurrence frequency. See scripts/kallsyms.c
>>> +	 */
>>> +	for (i = 255; i >= 0; i--) {
>>> +		token = &kallsyms_token_table[kallsyms_token_index[i]];
>>> +		token_len = strlen(token);
>>> +		if (token_len <= 1)
>>> +			continue;
>>> +
>>> +		/*
>>> +		 * Find and merge two tokens into one.
>>> +		 *
>>> +		 *                |<-- new_token -->|
>>> +		 *                | token1 | token2 |
>>> +		 * token_idx[]:   j       j+1      j+2
>>> +		 */
>>> +		for (j = 0; j < n - 1; j++) {
>>> +			len = token_idx[j + 2] - token_idx[j];
>>> +			if (len == token_len &&
>>> +			    !strncmp(name + token_idx[j], token, len)) {
>>> +				token_bak[token_idx[j]] = (unsigned char)i;
>>> +				for (k = j + 1; k < n; k++)
>>> +					token_idx[k] = token_idx[k + 1];
>>> +				n--;
>>> +			}
>>> +		}
>>> +	}
>>> +
>>> +	for (j = 0; j < n; j++) {
>>> +		len = token_idx[j + 1] - token_idx[j];
>>> +		if (len <= 1) {
>>> +			buf[j] = name[token_idx[j]];
>>> +			continue;
>>> +		}
>>> +
>>> +		buf[j] = token_bak[token_idx[j]];
>>
>> Maybe, I do not understand the compression format correctly but
>> this code looks too complicated. Honestly, I even did not try to
>> understand it.
>>
>> My understanding is the we just need to find all tokens and
>> replace them with index.
>>
>> It should be even easier than compress_symbols() in scripts/callsyms.c.
>> The token_table already exists and we do not need to handle the token_profit...
>>
>> The following looks more strigtforward (not even compile tested):
> 
> OK, I will try this one. Or refer to compress_symbols() in scripts/callsyms.c.

This method won't work. Because the tokens in kallsyms_token_table[] have
been expanded. For example: name = "nfs_fs_proc_net_init"
kallsyms_token_table[0xf3] = "s_",  raw token = 73 5f
kallsyms_token_table[0x9f] = "fs_", raw token = 66 f3

After "s_" have been replaced with f3, there is no "fs_" substring in namebuf.
That's why I wrote a new piece of code. Due to I didn't want to add a variable
like kallsyms_token_table[].

Now, I will add kallsyms_best_token_table[], kallsyms_best_token_table_len;
kallsyms_best_token_table[] does not store tokens that contain only one
character. And index=0 is the token with the highest frequency.


> 
>>
>> 	ssize_t len, size;
>>
>> 	len = strscpy(buf, symname, size);
>> 	if (WARN_ON_ONCE(len < 0))
>> 		return -EINVAL;
>>
>> 	/* the tokens with higher index are used first */
>> 	for (idx = 255; idx >= 0; idx--) {
>> 		token =	&kallsyms_token_table[kallsyms_token_index[i]];
>> 		token_len = strlen(token);
>>
>> 		p1 = buf;
>> 		/* length of the remaining symname including the trailing '\0' */
>> 		remaining = len + 1;
>>
>> 		/* find the token in the symbol name */
>> 		p2 = strstr(token, p1);
>>
>> 		while (p2) {
>> 			/* replace token with index */
>> 			*p2 = idx;
>> 			remaining -= ((p2 - p1) + token_len);
>> 			memmove(p2 + 1, p2 + token_len, remaining);
>> 			len -= (token_len - 1);
>> 			p1 = p2;
>>
>> 			/* find the token in the rest of the symbol name */
>> 			p2 = strstr(token, p1);
>> 		}
>> 	}
>>
>> 	return len;
>>
>>> +	}
>>> +	buf[n] = 0;
>>> +
>>> +	return n;
>>> +}
>>> +
>>>  /*
>>>   * Get symbol type information. This is encoded as a single char at the
>>>   * beginning of the symbol name.
>>> @@ -192,20 +257,28 @@ unsigned long kallsyms_lookup_name(const char *name)
>>>  	char namebuf[KSYM_NAME_LEN];
>>>  	unsigned long i;
>>>  	unsigned int off;
>>> +	int len;
>>>  
>>>  	/* Skip the search for empty string. */
>>>  	if (!*name)
>>>  		return 0;
>>>  
>>> +	len = kallsyms_name_to_tokens(name, namebuf);
>>> +	for (i = 0, off = 0; len && i < kallsyms_num_syms; i++) {
>>> +		if (kallsyms_names[off] == len + 1 &&
>>> +		    !memcmp(&kallsyms_names[off + 2], namebuf, len))
>>> +			return kallsyms_sym_address(i);
>>> +
>>> +		off += kallsyms_names[off] + 1;
>>
>> These complicated checks are hard to review. The following looks much
>> more readable to me:
> 
> Yes, it looks well.
> 
>>
>> 	for (i = 0, off = 0; len && i < kallsyms_num_syms; i++) {
>> 		/* the stored symbol name is prefixed by symbol type */
>> 		name_len = kallsyms_names[off] - 1;
>> 		name = &kallsyms_names[off + 2];
>> 		off += name_len + 2;
>>
>> 		if (name_len != len)
>> 			continue;
>>
>> 		if (!memcmp(name, namebuf, len))
>> 			return kallsyms_sym_address(i);
>> 	}
>>
>>
>>> +	}
>>> +
>>>  	for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
>>>  		off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
>>>  
>>> -		if (strcmp(namebuf, name) == 0)
>>> -			return kallsyms_sym_address(i);
>>> -
>>>  		if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0)
>>>  			return kallsyms_sym_address(i);
>>
>> Hmm, it means that the speedup is not usable when kernel is compiled LLVM?
>> It might actually slow down the search because we would need to use
>> both fast and slow search?
> 
> Theoretically, I don't think so. A string comparison was removed from the
> slow search. "if (name_len != len)" is faster than
> "if (strcmp(namebuf, name) == 0)". Even if they're equal,
> kallsyms_compress_symbol_name() only takes 1-2us, it doesn't affect the
> overall picture. The average lookup time before optimization is
> millisecond-level. To allay your concerns, I can run a test.
> 
> Before: min=170,  max=15949190, avg=7553906
> 
>>
>> Best Regards,
>> Petr
>> .
>>
>
Zhen Lei Sept. 22, 2022, 7:21 a.m. UTC | #5
On 2022/9/22 15:02, Petr Mladek wrote:
> On Thu 2022-09-22 10:15:22, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2022/9/21 23:25, Petr Mladek wrote:
>>> On Tue 2022-09-20 15:13:13, Zhen Lei wrote:
>>>> Currently, to search for a symbol, we need to expand the symbols in
>>>> 'kallsyms_names' one by one, and then use the expanded string for
>>>> comparison. This process can be optimized.
>>>>
>>>> And now scripts/kallsyms no longer compresses the symbol types, each
>>>> symbol type always occupies one byte. So we can first compress the
>>>> searched symbol and then make a quick comparison based on the compressed
>>>> length and content. In this way, for entries with mismatched lengths,
>>>> there is no need to expand and compare strings. And for those matching
>>>> lengths, there's no need to expand the symbol. This saves a lot of time.
>>>> According to my test results, the average performance of
>>>> kallsyms_lookup_name() can be improved by 20 to 30 times.
>>>>
>>>> The pseudo code of the test case is as follows:
>>>> static int stat_find_name(...)
>>>> {
>>>> 	start = sched_clock();
>>>> 	(void)kallsyms_lookup_name(name);
>>>> 	end = sched_clock();
>>>> 	//Update min, max, cnt, sum
>>>> }
>>>>
>>>> /*
>>>>  * Traverse all symbols in sequence and collect statistics on the time
>>>>  * taken by kallsyms_lookup_name() to lookup each symbol.
>>>>  */
>>>> kallsyms_on_each_symbol(stat_find_name, NULL);
>>>>
>>>> The test results are as follows (twice):
>>>> After : min=5250, max=  726560, avg= 302132
>>>> After : min=5320, max=  726850, avg= 301978
>>>> Before: min=170,  max=15949190, avg=7553906
>>>> Before: min=160,  max=15877280, avg=7517784
>>>>
>>>> The average time consumed is only 4.01% and the maximum time consumed is
>>>> only 4.57% of the time consumed before optimization.
>>>>
>>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>>>> ---
>>>>  kernel/kallsyms.c | 79 +++++++++++++++++++++++++++++++++++++++++++++--
>>>>  1 file changed, 76 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
>>>> index 3e7e2c2ad2f75ef..2d76196cfe89f34 100644
>>>> --- a/kernel/kallsyms.c
>>>> +++ b/kernel/kallsyms.c
>>>> @@ -87,6 +87,71 @@ static unsigned int kallsyms_expand_symbol(unsigned int off,
>>>> +{
>>>> +	int i, j, k, n;
>>>> +	int len, token_len;
>>>> +	const char *token;
>>>> +	unsigned char token_idx[KSYM_NAME_LEN];
>>>> +	unsigned char token_bak[KSYM_NAME_LEN];
>>>
>>> Why do we need two buffers? It should be possible to compress the name
>>> in the same buffer as it is done in compress_symbols() in scripts/callsyms.c.
>>
>> Because the performance would be a little better. Now this function takes
>> just over a microsecond. Currently, it takes about 250 microseconds on
>> average to lookup a symbol, so adding a little more time to this function
>> doesn't affect the overall picture. I'll modify and test it as you suggest
>> below.
> 
> We need to be careful about a stack overflow. I have seen that
> KSYM_NAME_LEN might need to be increased to 512 because of
> Rust support, see
> https://lore.kernel.org/r/20220805154231.31257-6-ojeda@kernel.org

OK. Thanks for your information. I decided to add kallsyms_best_token_table[],
kallsyms_best_token_table_len, so that we only need one namebuf[], like
kallsyms_expand_symbol().

> 
>>>> @@ -192,20 +257,28 @@ unsigned long kallsyms_lookup_name(const char *name)
>>>>  	for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
>>>>  		off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
>>>>  
>>>> -		if (strcmp(namebuf, name) == 0)
>>>> -			return kallsyms_sym_address(i);
>>>> -
>>>>  		if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0)
>>>>  			return kallsyms_sym_address(i);
>>>
>>> Hmm, it means that the speedup is not usable when kernel is compiled LLVM?
>>> It might actually slow down the search because we would need to use
>>> both fast and slow search?
>>
>> Theoretically, I don't think so. A string comparison was removed from the
>> slow search. "if (name_len != len)" is faster than
>> "if (strcmp(namebuf, name) == 0)". Even if they're equal,
>> kallsyms_compress_symbol_name() only takes 1-2us, it doesn't affect the
>> overall picture. The average lookup time before optimization is
>> millisecond-level.
>>
>> Before: min=170,  max=15949190, avg=7553906
> 
> Good point! I agree that the potential extra overhead is negligible
> when using the old code as a fallback.
> 
> Best Regards,
> Petr
> .
>
Petr Mladek Sept. 22, 2022, 1:17 p.m. UTC | #6
On Thu 2022-09-22 15:21:57, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/9/22 15:02, Petr Mladek wrote:
> > On Thu 2022-09-22 10:15:22, Leizhen (ThunderTown) wrote:
> >>
> >>
> >> On 2022/9/21 23:25, Petr Mladek wrote:
> >>> On Tue 2022-09-20 15:13:13, Zhen Lei wrote:
> >>>> Currently, to search for a symbol, we need to expand the symbols in
> >>>> 'kallsyms_names' one by one, and then use the expanded string for
> >>>> comparison. This process can be optimized.
> >>>>
> >>>> And now scripts/kallsyms no longer compresses the symbol types, each
> >>>> symbol type always occupies one byte. So we can first compress the
> >>>> searched symbol and then make a quick comparison based on the compressed
> >>>> length and content. In this way, for entries with mismatched lengths,
> >>>> there is no need to expand and compare strings. And for those matching
> >>>> lengths, there's no need to expand the symbol. This saves a lot of time.
> >>>> According to my test results, the average performance of
> >>>> kallsyms_lookup_name() can be improved by 20 to 30 times.
> >>>>
> >>>> The pseudo code of the test case is as follows:
> >>>> static int stat_find_name(...)
> >>>> {
> >>>> 	start = sched_clock();
> >>>> 	(void)kallsyms_lookup_name(name);
> >>>> 	end = sched_clock();
> >>>> 	//Update min, max, cnt, sum
> >>>> }
> >>>>
> >>>> /*
> >>>>  * Traverse all symbols in sequence and collect statistics on the time
> >>>>  * taken by kallsyms_lookup_name() to lookup each symbol.
> >>>>  */
> >>>> kallsyms_on_each_symbol(stat_find_name, NULL);
> >>>>
> >>>> The test results are as follows (twice):
> >>>> After : min=5250, max=  726560, avg= 302132
> >>>> After : min=5320, max=  726850, avg= 301978
> >>>> Before: min=170,  max=15949190, avg=7553906
> >>>> Before: min=160,  max=15877280, avg=7517784
> >>>>
> >>>> The average time consumed is only 4.01% and the maximum time consumed is
> >>>> only 4.57% of the time consumed before optimization.
> >>>>
> >>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> >>>> ---
> >>>>  kernel/kallsyms.c | 79 +++++++++++++++++++++++++++++++++++++++++++++--
> >>>>  1 file changed, 76 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
> >>>> index 3e7e2c2ad2f75ef..2d76196cfe89f34 100644
> >>>> --- a/kernel/kallsyms.c
> >>>> +++ b/kernel/kallsyms.c
> >>>> @@ -87,6 +87,71 @@ static unsigned int kallsyms_expand_symbol(unsigned int off,
> >>>> +{
> >>>> +	int i, j, k, n;
> >>>> +	int len, token_len;
> >>>> +	const char *token;
> >>>> +	unsigned char token_idx[KSYM_NAME_LEN];
> >>>> +	unsigned char token_bak[KSYM_NAME_LEN];
> >>>
> >>> Why do we need two buffers? It should be possible to compress the name
> >>> in the same buffer as it is done in compress_symbols() in scripts/callsyms.c.
> >>
> >> Because the performance would be a little better. Now this function takes
> >> just over a microsecond. Currently, it takes about 250 microseconds on
> >> average to lookup a symbol, so adding a little more time to this function
> >> doesn't affect the overall picture. I'll modify and test it as you suggest
> >> below.
> > 
> > We need to be careful about a stack overflow. I have seen that
> > KSYM_NAME_LEN might need to be increased to 512 because of
> > Rust support, see
> > https://lore.kernel.org/r/20220805154231.31257-6-ojeda@kernel.org
> 
> OK. Thanks for your information. I decided to add kallsyms_best_token_table[],
> kallsyms_best_token_table_len, so that we only need one namebuf[], like
> kallsyms_expand_symbol().

Thanks for the effort. Adding kallsyms_best_token_table[] sounds like
the right solution.

Best Regards,
Petr
Zhen Lei Sept. 28, 2022, 1:35 a.m. UTC | #7
On 2022/9/22 15:02, Petr Mladek wrote:
> On Thu 2022-09-22 10:15:22, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2022/9/21 23:25, Petr Mladek wrote:
>>> On Tue 2022-09-20 15:13:13, Zhen Lei wrote:
>>>> Currently, to search for a symbol, we need to expand the symbols in
>>>> 'kallsyms_names' one by one, and then use the expanded string for
>>>> comparison. This process can be optimized.
>>>>
>>>> And now scripts/kallsyms no longer compresses the symbol types, each
>>>> symbol type always occupies one byte. So we can first compress the
>>>> searched symbol and then make a quick comparison based on the compressed
>>>> length and content. In this way, for entries with mismatched lengths,
>>>> there is no need to expand and compare strings. And for those matching
>>>> lengths, there's no need to expand the symbol. This saves a lot of time.
>>>> According to my test results, the average performance of
>>>> kallsyms_lookup_name() can be improved by 20 to 30 times.
>>>>
>>>> The pseudo code of the test case is as follows:
>>>> static int stat_find_name(...)
>>>> {
>>>> 	start = sched_clock();
>>>> 	(void)kallsyms_lookup_name(name);
>>>> 	end = sched_clock();
>>>> 	//Update min, max, cnt, sum
>>>> }
>>>>
>>>> /*
>>>>  * Traverse all symbols in sequence and collect statistics on the time
>>>>  * taken by kallsyms_lookup_name() to lookup each symbol.
>>>>  */
>>>> kallsyms_on_each_symbol(stat_find_name, NULL);
>>>>
>>>> The test results are as follows (twice):
>>>> After : min=5250, max=  726560, avg= 302132
>>>> After : min=5320, max=  726850, avg= 301978
>>>> Before: min=170,  max=15949190, avg=7553906
>>>> Before: min=160,  max=15877280, avg=7517784
>>>>
>>>> The average time consumed is only 4.01% and the maximum time consumed is
>>>> only 4.57% of the time consumed before optimization.
>>>>
>>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>>>> ---
>>>>  kernel/kallsyms.c | 79 +++++++++++++++++++++++++++++++++++++++++++++--
>>>>  1 file changed, 76 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
>>>> index 3e7e2c2ad2f75ef..2d76196cfe89f34 100644
>>>> --- a/kernel/kallsyms.c
>>>> +++ b/kernel/kallsyms.c
>>>> @@ -87,6 +87,71 @@ static unsigned int kallsyms_expand_symbol(unsigned int off,
>>>> +{
>>>> +	int i, j, k, n;
>>>> +	int len, token_len;
>>>> +	const char *token;
>>>> +	unsigned char token_idx[KSYM_NAME_LEN];
>>>> +	unsigned char token_bak[KSYM_NAME_LEN];
>>>
>>> Why do we need two buffers? It should be possible to compress the name
>>> in the same buffer as it is done in compress_symbols() in scripts/callsyms.c.
>>
>> Because the performance would be a little better. Now this function takes
>> just over a microsecond. Currently, it takes about 250 microseconds on
>> average to lookup a symbol, so adding a little more time to this function
>> doesn't affect the overall picture. I'll modify and test it as you suggest
>> below.
> 
> We need to be careful about a stack overflow. I have seen that
> KSYM_NAME_LEN might need to be increased to 512 because of
> Rust support, see
> https://lore.kernel.org/r/20220805154231.31257-6-ojeda@kernel.org
> 
>>>> @@ -192,20 +257,28 @@ unsigned long kallsyms_lookup_name(const char *name)
>>>>  	for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
>>>>  		off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
>>>>  
>>>> -		if (strcmp(namebuf, name) == 0)
>>>> -			return kallsyms_sym_address(i);
>>>> -
>>>>  		if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0)
>>>>  			return kallsyms_sym_address(i);
>>>
>>> Hmm, it means that the speedup is not usable when kernel is compiled LLVM?
>>> It might actually slow down the search because we would need to use
>>> both fast and slow search?
>>
>> Theoretically, I don't think so. A string comparison was removed from the
>> slow search. "if (name_len != len)" is faster than
>> "if (strcmp(namebuf, name) == 0)". Even if they're equal,
>> kallsyms_compress_symbol_name() only takes 1-2us, it doesn't affect the
>> overall picture. The average lookup time before optimization is
>> millisecond-level.
>>
>> Before: min=170,  max=15949190, avg=7553906
> 
> Good point! I agree that the potential extra overhead is negligible
> when using the old code as a fallback.

These days sleep better. When I got up this morning, my subconscious told me that
compiled LLVM could also be optimized. In fact, the method is simple, that is,
check whether the next token starts with '.' or '$' after being expanded.

I will post v7 before the holidays.

> 
> Best Regards,
> Petr
> .
>
Zhen Lei Sept. 30, 2022, 11:37 a.m. UTC | #8
On 2022/9/28 9:35, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/9/22 15:02, Petr Mladek wrote:
>> On Thu 2022-09-22 10:15:22, Leizhen (ThunderTown) wrote:
>>>
>>>
>>> On 2022/9/21 23:25, Petr Mladek wrote:
>>>> On Tue 2022-09-20 15:13:13, Zhen Lei wrote:
>>>>> Currently, to search for a symbol, we need to expand the symbols in
>>>>> 'kallsyms_names' one by one, and then use the expanded string for
>>>>> comparison. This process can be optimized.
>>>>>
>>>>> And now scripts/kallsyms no longer compresses the symbol types, each
>>>>> symbol type always occupies one byte. So we can first compress the
>>>>> searched symbol and then make a quick comparison based on the compressed
>>>>> length and content. In this way, for entries with mismatched lengths,
>>>>> there is no need to expand and compare strings. And for those matching
>>>>> lengths, there's no need to expand the symbol. This saves a lot of time.
>>>>> According to my test results, the average performance of
>>>>> kallsyms_lookup_name() can be improved by 20 to 30 times.
>>>>>
>>>>> The pseudo code of the test case is as follows:
>>>>> static int stat_find_name(...)
>>>>> {
>>>>> 	start = sched_clock();
>>>>> 	(void)kallsyms_lookup_name(name);
>>>>> 	end = sched_clock();
>>>>> 	//Update min, max, cnt, sum
>>>>> }
>>>>>
>>>>> /*
>>>>>  * Traverse all symbols in sequence and collect statistics on the time
>>>>>  * taken by kallsyms_lookup_name() to lookup each symbol.
>>>>>  */
>>>>> kallsyms_on_each_symbol(stat_find_name, NULL);
>>>>>
>>>>> The test results are as follows (twice):
>>>>> After : min=5250, max=  726560, avg= 302132
>>>>> After : min=5320, max=  726850, avg= 301978
>>>>> Before: min=170,  max=15949190, avg=7553906
>>>>> Before: min=160,  max=15877280, avg=7517784
>>>>>
>>>>> The average time consumed is only 4.01% and the maximum time consumed is
>>>>> only 4.57% of the time consumed before optimization.
>>>>>
>>>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>>>>> ---
>>>>>  kernel/kallsyms.c | 79 +++++++++++++++++++++++++++++++++++++++++++++--
>>>>>  1 file changed, 76 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
>>>>> index 3e7e2c2ad2f75ef..2d76196cfe89f34 100644
>>>>> --- a/kernel/kallsyms.c
>>>>> +++ b/kernel/kallsyms.c
>>>>> @@ -87,6 +87,71 @@ static unsigned int kallsyms_expand_symbol(unsigned int off,
>>>>> +{
>>>>> +	int i, j, k, n;
>>>>> +	int len, token_len;
>>>>> +	const char *token;
>>>>> +	unsigned char token_idx[KSYM_NAME_LEN];
>>>>> +	unsigned char token_bak[KSYM_NAME_LEN];
>>>>
>>>> Why do we need two buffers? It should be possible to compress the name
>>>> in the same buffer as it is done in compress_symbols() in scripts/callsyms.c.
>>>
>>> Because the performance would be a little better. Now this function takes
>>> just over a microsecond. Currently, it takes about 250 microseconds on
>>> average to lookup a symbol, so adding a little more time to this function
>>> doesn't affect the overall picture. I'll modify and test it as you suggest
>>> below.
>>
>> We need to be careful about a stack overflow. I have seen that
>> KSYM_NAME_LEN might need to be increased to 512 because of
>> Rust support, see
>> https://lore.kernel.org/r/20220805154231.31257-6-ojeda@kernel.org
>>
>>>>> @@ -192,20 +257,28 @@ unsigned long kallsyms_lookup_name(const char *name)
>>>>>  	for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
>>>>>  		off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
>>>>>  
>>>>> -		if (strcmp(namebuf, name) == 0)
>>>>> -			return kallsyms_sym_address(i);
>>>>> -
>>>>>  		if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0)
>>>>>  			return kallsyms_sym_address(i);
>>>>
>>>> Hmm, it means that the speedup is not usable when kernel is compiled LLVM?
>>>> It might actually slow down the search because we would need to use
>>>> both fast and slow search?
>>>
>>> Theoretically, I don't think so. A string comparison was removed from the
>>> slow search. "if (name_len != len)" is faster than
>>> "if (strcmp(namebuf, name) == 0)". Even if they're equal,
>>> kallsyms_compress_symbol_name() only takes 1-2us, it doesn't affect the
>>> overall picture. The average lookup time before optimization is
>>> millisecond-level.
>>>
>>> Before: min=170,  max=15949190, avg=7553906
>>
>> Good point! I agree that the potential extra overhead is negligible
>> when using the old code as a fallback.
> 
> These days sleep better. When I got up this morning, my subconscious told me that
> compiled LLVM could also be optimized. In fact, the method is simple, that is,
> check whether the next token starts with '.' or '$' after being expanded.
> 
> I will post v7 before the holidays.

Sorry, I'm going to break my promise. A lot of code needs to be modified on
the tool side, to make sure that the first '.' or '$' will not be in the middle
or end of the expanded substring. The lab is powered off, so I can only post v7
after the holidays (one week).

> 
>>
>> Best Regards,
>> Petr
>> .
>>
>
diff mbox series

Patch

diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 3e7e2c2ad2f75ef..2d76196cfe89f34 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -87,6 +87,71 @@  static unsigned int kallsyms_expand_symbol(unsigned int off,
 	return off;
 }
 
+static int kallsyms_name_to_tokens(const char *name, char *buf)
+{
+	int i, j, k, n;
+	int len, token_len;
+	const char *token;
+	unsigned char token_idx[KSYM_NAME_LEN];
+	unsigned char token_bak[KSYM_NAME_LEN];
+
+	/*
+	 * n, number of tokens in the string name.
+	 * token_idx[i], the start index of the ith token.
+	 * token_idx[n] is used to calculate the length of the last token.
+	 */
+	n = strlen(name);
+	if (n >= KSYM_NAME_LEN) {
+		buf[0] = 0;
+		return 0;
+	}
+	for (i = 0; i <= n; i++)
+		token_idx[i] = (unsigned char)i;
+
+	/*
+	 * For tokens whose token_len >= 2, a larger index value indicates
+	 * a higher occurrence frequency. See scripts/kallsyms.c
+	 */
+	for (i = 255; i >= 0; i--) {
+		token = &kallsyms_token_table[kallsyms_token_index[i]];
+		token_len = strlen(token);
+		if (token_len <= 1)
+			continue;
+
+		/*
+		 * Find and merge two tokens into one.
+		 *
+		 *                |<-- new_token -->|
+		 *                | token1 | token2 |
+		 * token_idx[]:   j       j+1      j+2
+		 *
+		 */
+		for (j = 0; j < n - 1; j++) {
+			len = token_idx[j + 2] - token_idx[j];
+			if (len == token_len &&
+			    !strncmp(name + token_idx[j], token, len)) {
+				token_bak[token_idx[j]] = (unsigned char)i;
+				for (k = j + 1; k < n; k++)
+					token_idx[k] = token_idx[k + 1];
+				n--;
+			}
+		}
+	}
+
+	for (j = 0; j < n; j++) {
+		len = token_idx[j + 1] - token_idx[j];
+		if (len <= 1) {
+			buf[j] = name[token_idx[j]];
+			continue;
+		}
+
+		buf[j] = token_bak[token_idx[j]];
+	}
+	buf[n] = 0;
+
+	return n;
+}
+
 /*
  * Get symbol type information. This is encoded as a single char at the
  * beginning of the symbol name.
@@ -192,20 +257,28 @@  unsigned long kallsyms_lookup_name(const char *name)
 	char namebuf[KSYM_NAME_LEN];
 	unsigned long i;
 	unsigned int off;
+	int len;
 
 	/* Skip the search for empty string. */
 	if (!*name)
 		return 0;
 
+	len = kallsyms_name_to_tokens(name, namebuf);
+	for (i = 0, off = 0; len && i < kallsyms_num_syms; i++) {
+		if (kallsyms_names[off] == len + 1 &&
+		    !memcmp(&kallsyms_names[off + 2], namebuf, len))
+			return kallsyms_sym_address(i);
+
+		off += kallsyms_names[off] + 1;
+	}
+
 	for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
 		off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
 
-		if (strcmp(namebuf, name) == 0)
-			return kallsyms_sym_address(i);
-
 		if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0)
 			return kallsyms_sym_address(i);
 	}
+
 	return module_kallsyms_lookup_name(name);
 }