
[v4,3/8] scripts/kallsyms: don't compress symbol types

Message ID 20220920071317.1787-4-thunder.leizhen@huawei.com (mailing list archive)
State New, archived
Headers show
Series kallsyms: Optimizes the performance of lookup symbols | expand

Commit Message

Leizhen (ThunderTown) Sept. 20, 2022, 7:13 a.m. UTC
Currently, to search for a symbol, we need to expand the symbols in
'kallsyms_names' one by one and then use the expanded string for
comparison, because we do not know the symbol type and the symbol type
may be combined with the following characters to form a token.

If we don't compress the symbol type, we can first compress the
searched symbol and then make a quick comparison based on the
compressed length and content. In this way, entries with mismatched
lengths need no expansion or string comparison at all, and entries
with matching lengths can be compared without expanding the symbol.
This saves a lot of time. According to my test results, the average
performance of kallsyms_lookup_name() is improved by a factor of
20 to 30.
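
A minimal sketch of that fast path (kallsyms_compress_name() and the
entry layout <length byte> <type byte> <compressed name bytes> are
illustrative assumptions here, not the code added by this series):

/*
 * Illustrative sketch only: kallsyms_compress_name() is a hypothetical
 * helper that applies the same token table used by scripts/kallsyms.
 */
extern const unsigned char kallsyms_names[];
extern const unsigned int kallsyms_num_syms;

static int lookup_compressed(const char *name)
{
	unsigned char buf[KSYM_NAME_LEN];
	int len = kallsyms_compress_name(name, buf);	/* compress once */
	unsigned int i, off = 0;

	for (i = 0; i < kallsyms_num_syms; i++) {
		unsigned int n = kallsyms_names[off++];	/* entry length */

		/*
		 * n - 1 is the compressed name length, because the type
		 * byte is stored uncompressed right after the length.
		 * A mismatched length rejects the entry with no expansion.
		 */
		if (n - 1 == len &&
		    !memcmp(&kallsyms_names[off + 1], buf, len))
			return i;
		off += n;
	}
	return -1;
}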

Of course, because the symbol type is forcibly not compressed, the
compression rate also decreases. Here are the test results with
defconfig:

arm64: <<<<<<
        ---------------------------------------------------------------
       | ALL | nr_symbols | compressed size | original size | ratio(%) |
        -----|---------------------------------------------------------|
Before |  Y  |     174094 |       1884938   |      3750653  |  50.25   |
After  |  Y  |     174099 |       1960154   |      3750756  |  52.26   |
Before |  N  |      61744 |        725507   |      1222737  |  59.33   |
After  |  N  |      61747 |        745733   |      1222801  |  60.98   |
        ---------------------------------------------------------------
The memory overhead is increased by:
  73.5KiB and 4.0% if CONFIG_KALLSYMS_ALL=y.
  19.8KiB and 2.8% if CONFIG_KALLSYMS_ALL=n.
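(E.g. for CONFIG_KALLSYMS_ALL=y: 1960154 - 1884938 = 75216 bytes ≈ 73.5 KiB,
and 75216 / 1884938 ≈ 4.0%; the other figures are derived the same way.)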

x86: <<<<<<<<
       ---------------------------------------------------------------
       | ALL | nr_symbols | compressed size | original size | ratio(%) |
        -----|---------------------------------------------------------|
Before |  Y  |     131415 |       1697542   |      3161216  |  53.69   |
After  |  Y  |     131540 |       1747769   |      3163933  |  55.24   |
Before |  N  |      60695 |        737627   |      1283046  |  57.49   |
After  |  N  |      60699 |        754797   |      1283149  |  58.82   |
        ---------------------------------------------------------------
The memory overhead is increased by:
  49.0KiB and 3.0% if CONFIG_KALLSYMS_ALL=y.
  16.8KiB and 2.3% if CONFIG_KALLSYMS_ALL=n.

This additional memory overhead is worth it compared to the performance
improvement, I think.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 scripts/kallsyms.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

Comments

Petr Mladek Sept. 21, 2022, 9 a.m. UTC | #1
On Tue 2022-09-20 15:13:12, Zhen Lei wrote:
> Currently, to search for a symbol, we need to expand the symbols in
> 'kallsyms_names' one by one and then use the expanded string for
> comparison, because we do not know the symbol type and the symbol type
> may be combined with the following characters to form a token.
>
> If we don't compress the symbol type, we can first compress the
> searched symbol and then make a quick comparison based on the
> compressed length and content. In this way, entries with mismatched
> lengths need no expansion or string comparison at all, and entries
> with matching lengths can be compared without expanding the symbol.
> This saves a lot of time. According to my test results, the average
> performance of kallsyms_lookup_name() is improved by a factor of
> 20 to 30.
>
> Of course, because the symbol type is forcibly not compressed, the
> compression rate also decreases. Here are the test results with
> defconfig:
> 
> arm64: <<<<<<
>         ---------------------------------------------------------------
>        | ALL | nr_symbols | compressed size | original size | ratio(%) |
>         -----|---------------------------------------------------------|
> Before |  Y  |     174094 |       1884938   |      3750653  |  50.25   |
> After  |  Y  |     174099 |       1960154   |      3750756  |  52.26   |
> Before |  N  |      61744 |        725507   |      1222737  |  59.33   |
> After  |  N  |      61747 |        745733   |      1222801  |  60.98   |
>         ---------------------------------------------------------------
> The memory overhead is increased by:
>   73.5KiB and 4.0% if CONFIG_KALLSYMS_ALL=y.
>   19.8KiB and 2.8% if CONFIG_KALLSYMS_ALL=n.
> 
> x86: <<<<<<<<
>        ---------------------------------------------------------------
>        | ALL | nr_symbols | compressed size | original size | ratio(%) |
>         -----|---------------------------------------------------------|
> Before |  Y  |     131415 |       1697542   |      3161216  |  53.69   |
> After  |  Y  |     131540 |       1747769   |      3163933  |  55.24   |
> Before |  N  |      60695 |        737627   |      1283046  |  57.49   |
> After  |  N  |      60699 |        754797   |      1283149  |  58.82   |
>         ---------------------------------------------------------------
> The memory overhead is increased by:
>   49.0KiB and 3.0% if CONFIG_KALLSYMS_ALL=y.
>   16.8KiB and 2.3% if CONFIG_KALLSYMS_ALL=n.
>
> This additional memory overhead is worth it compared to the performance
> improvement, I think.

I agree. The speedup mentioned in the follow-up patches looks big.
I just suggest doing this change in a cleaner way, see below.

> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>  scripts/kallsyms.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
> index 3319d9f38d7a5f2..1ae9ce773d2a31d 100644
> --- a/scripts/kallsyms.c
> +++ b/scripts/kallsyms.c
> @@ -61,6 +61,15 @@ static int all_symbols;
>  static int absolute_percpu;
>  static int base_relative;
>  
> +/*
> + * Each entry in the symbol table consists of the symbol type and the symbol
> + * itself. To optimize the performance of finding or traversing symbols in
> + * kernel, do not compress the symbol type. In this way, when looking for a
> + * symbol of unknown type, we can first compress the searched symbol and then
> + * make a quick comparison based on the compressed length and content.
> + */
> +static int sym_start_idx = 1;
> +
>  static int token_profit[0x10000];
>  
>  /* the table that holds the result of the compression */
> @@ -511,7 +520,7 @@ static void learn_symbol(const unsigned char *symbol, int len)
>  {
>  	int i;
>  
> -	for (i = 0; i < len - 1; i++)
> +	for (i = sym_start_idx; i < len - 1; i++)

It creates yet another twist in scripts/kallsyms.c. read_symbol()
explicitly adds the type as the first character so that it can be
compressed, and this patch adds a hack to skip it.

Let's do it a clean way and store the type separately:

struct sym_entry {
	unsigned long long addr;
	unsigned int len;
	unsigned int start_pos;
	unsigned int percpu_absolute;
	unsigned char type;
	unsigned char name[];
};

static struct sym_entry *read_symbol(FILE *in)
{
[...]
	name_len = strlen(name);

	sym = malloc(sizeof(*sym) + name_len + 1);
	if (!sym) {
		fprintf(stderr, "kallsyms failure: "
			"unable to allocate required amount of memory\n");
		exit(EXIT_FAILURE);
	}
	sym->addr = addr;
	sym->len = name_len;
	sym->type = type;
	strcpy(sym->name, name);
	sym->percpu_absolute = 0;
}

It would allow removing the tricky:

	static char *sym_name(const struct sym_entry *s)
	{
		return (char *)s->sym + 1;
	}

and access s->name directly.

OK, the problem is how to store the type. The clean way would be
to put it into a separate section, for example:

static void write_src(void)
{
[...]
	output_label("kallsyms_types");
	off = 0;
	for (i = 0; i < table_cnt; i++) {
		printf("\t.byte 0x%02x\n", table[i]->type);
	}
	printf("\n");
[...]
}

It would probably increase the size even more. Another problem
is that it would need changes in the crash dump tools, see:

static int __init crash_save_vmcoreinfo_init(void)
{
[...]
	VMCOREINFO_SYMBOL(kallsyms_names);
[...]
}

A solution would be to store it the old way:

static void write_src(void)
{
[...]
	output_label("kallsyms_names");
	off = 0;
	for (i = 0; i < table_cnt; i++) {
		if ((i & 0xFF) == 0)
			markers[i >> 8] = off;

		/*
		 * Store the symbol type together with the symbol name.
		 * It helps to reduce the size.
		 */
		printf("\t.byte 0x%02x", table[i]->len + 1);
		printf(", 0x%02x", table[i]->type);
		for (k = 0; k < table[i]->len; k++)
			printf(", 0x%02x", table[i]->sym[k]);
		printf("\n");

		/* length byte + type + symbol name */
		off += table[i]->len + 2;
	}
	printf("\n");
[...]
}

The result would be the same as with your patch. But the code would be
even cleaner than before.
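
For reference, a rough sketch of how the kernel side could then decode
one such <length> <type> <compressed name> entry (only
kallsyms_token_table[] and kallsyms_token_index[] are the existing
in-kernel arrays; the rest of this snippet is an illustration, not a
proposed implementation):

extern const char kallsyms_token_table[];
extern const unsigned short kallsyms_token_index[];

/* Returns the number of bytes consumed from kallsyms_names. */
static unsigned int expand_entry(const unsigned char *data, char *type,
				 char *name, size_t maxlen)
{
	unsigned int len = *data++;	/* type byte + compressed name bytes */
	unsigned int skipped = len + 1;	/* also count the length byte itself */
	size_t out = 0;

	*type = *data++;		/* the type is stored uncompressed */
	len--;

	while (len--) {
		/* Each remaining byte indexes a NUL-terminated token string. */
		const char *tok = &kallsyms_token_table[kallsyms_token_index[*data++]];

		while (*tok && out < maxlen - 1)
			name[out++] = *tok++;
	}
	name[out] = '\0';

	return skipped;
}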

Best Regards,
Petr
Leizhen (ThunderTown) Sept. 21, 2022, 1:13 p.m. UTC | #2
On 2022/9/21 17:00, Petr Mladek wrote:
> On Tue 2022-09-20 15:13:12, Zhen Lei wrote:
>> Currently, to search for a symbol, we need to expand the symbols in
>> 'kallsyms_names' one by one and then use the expanded string for
>> comparison, because we do not know the symbol type and the symbol type
>> may be combined with the following characters to form a token.
>>
>> If we don't compress the symbol type, we can first compress the
>> searched symbol and then make a quick comparison based on the
>> compressed length and content. In this way, entries with mismatched
>> lengths need no expansion or string comparison at all, and entries
>> with matching lengths can be compared without expanding the symbol.
>> This saves a lot of time. According to my test results, the average
>> performance of kallsyms_lookup_name() is improved by a factor of
>> 20 to 30.
>>
>> Of course, because the symbol type is forcibly not compressed, the
>> compression rate also decreases. Here are the test results with
>> defconfig:
>>
>> arm64: <<<<<<
>>         ---------------------------------------------------------------
>>        | ALL | nr_symbols | compressed size | original size | ratio(%) |
>>         -----|---------------------------------------------------------|
>> Before |  Y  |     174094 |       1884938   |      3750653  |  50.25   |
>> After  |  Y  |     174099 |       1960154   |      3750756  |  52.26   |
>> Before |  N  |      61744 |        725507   |      1222737  |  59.33   |
>> After  |  N  |      61747 |        745733   |      1222801  |  60.98   |
>>         ---------------------------------------------------------------
>> The memory overhead is increased by:
>>   73.5KiB and 4.0% if CONFIG_KALLSYMS_ALL=y.
>>   19.8KiB and 2.8% if CONFIG_KALLSYMS_ALL=n.
>>
>> x86: <<<<<<<<
>>        ---------------------------------------------------------------
>>        | ALL | nr_symbols | compressed size | original size | ratio(%) |
>>         -----|---------------------------------------------------------|
>> Before |  Y  |     131415 |       1697542   |      3161216  |  53.69   |
>> After  |  Y  |     131540 |       1747769   |      3163933  |  55.24   |
>> Before |  N  |      60695 |        737627   |      1283046  |  57.49   |
>> After  |  N  |      60699 |        754797   |      1283149  |  58.82   |
>>         ---------------------------------------------------------------
>> The memory overhead is increased by:
>>   49.0KiB and 3.0% if CONFIG_KALLSYMS_ALL=y.
>>   16.8KiB and 2.3% if CONFIG_KALLSYMS_ALL=n.
>>
>> This additional memory overhead is worth it compared to the performance
>> improvement, I think.
> 
> I agree. The speedup mentioned in the follow-up patches looks big.
> I just suggest doing this change in a cleaner way, see below.
> 
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> ---
>>  scripts/kallsyms.c | 15 ++++++++++++---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
>> index 3319d9f38d7a5f2..1ae9ce773d2a31d 100644
>> --- a/scripts/kallsyms.c
>> +++ b/scripts/kallsyms.c
>> @@ -61,6 +61,15 @@ static int all_symbols;
>>  static int absolute_percpu;
>>  static int base_relative;
>>  
>> +/*
>> + * Each entry in the symbol table consists of the symbol type and the symbol
>> + * itself. To optimize the performance of finding or traversing symbols in
>> + * kernel, do not compress the symbol type. In this way, when looking for a
>> + * symbol of unknown type, we can first compress the searched symbol and then
>> + * make a quick comparison based on the compressed length and content.
>> + */
>> +static int sym_start_idx = 1;
>> +
>>  static int token_profit[0x10000];
>>  
>>  /* the table that holds the result of the compression */
>> @@ -511,7 +520,7 @@ static void learn_symbol(const unsigned char *symbol, int len)
>>  {
>>  	int i;
>>  
>> -	for (i = 0; i < len - 1; i++)
>> +	for (i = sym_start_idx; i < len - 1; i++)
> 
> It creates yet another twist in scripts/kallsyms.c. read_symbol()
> explicitly adds the type as the first character so that it can be
> compressed, and this patch adds a hack to skip it.
>
> Let's do it a clean way and store the type separately:
> 
> struct sym_entry {
> 	unsigned long long addr;
> 	unsigned int len;
> 	unsigned int start_pos;
> 	unsigned int percpu_absolute;
> 	unsigned char type;

Yes, it's very necessary. Thanks.

> 	unsigned char name[];

Yes, using "name[]" will be clearer than using "sym[]".

> };
> 
> static struct sym_entry *read_symbol(FILE *in)
> {
> [...]
> 	name_len = strlen(name);
> 
> 	sym = malloc(sizeof(*sym) + name_len + 1);
> 	if (!sym) {
> 		fprintf(stderr, "kallsyms failure: "
> 			"unable to allocate required amount of memory\n");
> 		exit(EXIT_FAILURE);
> 	}
> 	sym->addr = addr;
> 	sym->len = name_len;
> 	sym->type = type;
> 	strcpy(sym->name, name);
> 	sym->percpu_absolute = 0;
> }
> 
> It would allow removing the tricky:
> 
> 	static char *sym_name(const struct sym_entry *s)
> 	{
> 		return (char *)s->sym + 1;
> 	}
> 
> and access s->name directly.

OK

> 
> OK, the problem is how to store the type. The clean way would be
> to put it into a separate section, for example:
> 
> static void write_src(void)
> {
> [...]
> 	output_label("kallsyms_types");
> 	off = 0;
> 	for (i = 0; i < table_cnt; i++) {
> 		printf("\t.byte 0x%02x\n", table[i]->type);
> 	}
> 	printf("\n");
> [...]
> }
> 
> It would probably increase the size even more. Another problem
> is that it would need changes in the crash dump tools, see:
> 
> static int __init crash_save_vmcoreinfo_init(void)
> {
> [...]
> 	VMCOREINFO_SYMBOL(kallsyms_names);
> [...]
> }
> 
> A solution would be to store it the old way:

I'll use this compatibility mode first, because I'm going to compress
the type later.

> 
> static void write_src(void)
> {
> [...]
> 	output_label("kallsyms_names");
> 	off = 0;
> 	for (i = 0; i < table_cnt; i++) {
> 		if ((i & 0xFF) == 0)
> 			markers[i >> 8] = off;
> 
> 		/*
> 		 * Store the symbol type together with the symbol name.
> 		 * It helps to reduce the size.
> 		 */
> 		printf("\t.byte 0x%02x", table[i]->len + 1);
> 		printf(", 0x%02x", table[i]->type);
> 		for (k = 0; k < table[i]->len; k++)
> 			printf(", 0x%02x", table[i]->sym[k]);
> 		printf("\n");
> 
> 		/* length byte + type + symbol name */
> 		off += table[i]->len + 2;
> 	}
> 	printf("\n");
> [...]
> }
> 
> The result would be the same as with your patch. But the code would be
> even cleaner than before.

Yes, thank you for your valuable comments.

> 
> Best Regards,
> Petr
> .
>

Patch

diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 3319d9f38d7a5f2..1ae9ce773d2a31d 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -61,6 +61,15 @@  static int all_symbols;
 static int absolute_percpu;
 static int base_relative;
 
+/*
+ * Each entry in the symbol table consists of the symbol type and the symbol
+ * itself. To optimize the performance of finding or traversing symbols in
+ * kernel, do not compress the symbol type. In this way, when looking for a
+ * symbol of unknown type, we can first compress the searched symbol and then
+ * make a quick comparison based on the compressed length and content.
+ */
+static int sym_start_idx = 1;
+
 static int token_profit[0x10000];
 
 /* the table that holds the result of the compression */
@@ -511,7 +520,7 @@  static void learn_symbol(const unsigned char *symbol, int len)
 {
 	int i;
 
-	for (i = 0; i < len - 1; i++)
+	for (i = sym_start_idx; i < len - 1; i++)
 		token_profit[ symbol[i] + (symbol[i + 1] << 8) ]++;
 }
 
@@ -520,7 +529,7 @@  static void forget_symbol(const unsigned char *symbol, int len)
 {
 	int i;
 
-	for (i = 0; i < len - 1; i++)
+	for (i = sym_start_idx; i < len - 1; i++)
 		token_profit[ symbol[i] + (symbol[i + 1] << 8) ]--;
 }
 
@@ -538,7 +547,7 @@  static unsigned char *find_token(unsigned char *str, int len,
 {
 	int i;
 
-	for (i = 0; i < len - 1; i++) {
+	for (i = sym_start_idx; i < len - 1; i++) {
 		if (str[i] == token[0] && str[i+1] == token[1])
 			return &str[i];
 	}