diff mbox series

[01/17] kallsyms: support big kernel symbols (2-byte lengths)

Message ID 20210704202756.29107-2-ojeda@kernel.org (mailing list archive)
State New, archived
Headers show
Series Rust support | expand

Commit Message

Miguel Ojeda July 4, 2021, 8:27 p.m. UTC
From: Miguel Ojeda <ojeda@kernel.org>

Rust symbols can become quite long due to namespacing introduced
by modules, types, traits, generics, etc.

Increasing to 255 is not enough in some cases, and therefore
we need to introduce 2-byte lengths to the symbol table. We call
these "big" symbols.

In order to avoid increasing all lengths to 2 bytes (since most
of them only require 1 byte, including many Rust ones), we use
length zero to mark "big" symbols in the table.

Co-developed-by: Alex Gaynor <alex.gaynor@gmail.com>
Signed-off-by: Alex Gaynor <alex.gaynor@gmail.com>
Co-developed-by: Geoffrey Thomas <geofft@ldpreload.com>
Signed-off-by: Geoffrey Thomas <geofft@ldpreload.com>
Co-developed-by: Finn Behrens <me@kloenk.de>
Signed-off-by: Finn Behrens <me@kloenk.de>
Co-developed-by: Adam Bratschi-Kaye <ark.email@gmail.com>
Signed-off-by: Adam Bratschi-Kaye <ark.email@gmail.com>
Co-developed-by: Wedson Almeida Filho <wedsonaf@google.com>
Signed-off-by: Wedson Almeida Filho <wedsonaf@google.com>
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
---
 kernel/kallsyms.c  |  7 +++++++
 scripts/kallsyms.c | 31 ++++++++++++++++++++++++++++---
 2 files changed, 35 insertions(+), 3 deletions(-)

Comments

Linus Torvalds July 4, 2021, 8:52 p.m. UTC | #1
On Sun, Jul 4, 2021 at 1:28 PM <ojeda@kernel.org> wrote:
>
>
> +       /* If zero, it is a "big" symbol, so a two byte length follows. */
> +       if (len == 0) {
> +               len = (data[0] << 8) | data[1];
> +               data += 2;
> +               off += len + 2;
> +       }

Side note: it might be a good idea at this point to verify that "len" is >255.

Also, why is this in big-endian order?

Let's just try to kill big-endian data, it's disgusting and should
just die already.

BE is practically dead anyway, we shouldn't add new cases. Networking
has legacy reasons from the bad old days when byte order wars were
still a thing, but those days are gone.

           Linus
Matthew Wilcox July 4, 2021, 9:04 p.m. UTC | #2
On Sun, Jul 04, 2021 at 10:27:40PM +0200, ojeda@kernel.org wrote:
> From: Miguel Ojeda <ojeda@kernel.org>
> 
> Rust symbols can become quite long due to namespacing introduced
> by modules, types, traits, generics, etc.
> 
> Increasing to 255 is not enough in some cases, and therefore
> we need to introduce 2-byte lengths to the symbol table. We call
> these "big" symbols.
> 
> In order to avoid increasing all lengths to 2 bytes (since most
> of them only require 1 byte, including many Rust ones), we use
> length zero to mark "big" symbols in the table.

What happened to my suggestion from last time of encoding symbols < 128
as 0-127 and symbols larger than that as (data[0] - 128) * 256 +
data[1]) ?
Miguel Ojeda July 4, 2021, 9:15 p.m. UTC | #3
On Sun, Jul 4, 2021 at 10:52 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Also, why is this in big-endian order?

No particular reason. It makes sense to use LE -- I will change it.

Cheers,
Miguel
Miguel Ojeda July 4, 2021, 9:17 p.m. UTC | #4
On Sun, Jul 4, 2021 at 11:05 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> What happened to my suggestion from last time of encoding symbols < 128
> as 0-127 and symbols larger than that as (data[0] - 128) * 256 +
> data[1]) ?

Nothing, sorry, we focused on other parts (e.g. the allocation panics)
during this iteration. I can take a look for v2.

Cheers,
Miguel
Gary Guo July 4, 2021, 9:20 p.m. UTC | #5
On Sun, 4 Jul 2021 22:04:49 +0100
Matthew Wilcox <willy@infradead.org> wrote:

> On Sun, Jul 04, 2021 at 10:27:40PM +0200, ojeda@kernel.org wrote:
> > From: Miguel Ojeda <ojeda@kernel.org>
> > 
> > Rust symbols can become quite long due to namespacing introduced
> > by modules, types, traits, generics, etc.
> > 
> > Increasing to 255 is not enough in some cases, and therefore
> > we need to introduce 2-byte lengths to the symbol table. We call
> > these "big" symbols.
> > 
> > In order to avoid increasing all lengths to 2 bytes (since most
> > of them only require 1 byte, including many Rust ones), we use
> > length zero to mark "big" symbols in the table.  
> 
> What happened to my suggestion from last time of encoding symbols <
> 128 as 0-127 and symbols larger than that as (data[0] - 128) * 256 +
> data[1]) ?

Yeah, I agree ULEB128 or similar encoding scheme would be better than
using 0 as an escape byte. If ULEB128 is used and we restrict number of
bytes to 2, it will encode numbers up to 2**14 instead of 2**16 like the
current scheme, but that should be sufficient anyway.

- Gary
Linus Torvalds July 4, 2021, 9:28 p.m. UTC | #6
On Sun, Jul 4, 2021 at 2:15 PM Miguel Ojeda
<miguel.ojeda.sandonis@gmail.com> wrote:
>
> No particular reason. It makes sense to use LE -- I will change it.

Matthew's suggestion is denser, which is nice.

At that point, it would be neither BE nor LE. But the "LE-like" version would be

   len = data[0];
   if (len & 128)
        len += data[1] << 7;

which ends up having a tiny bit more range (it goes to 11^H32895).

Of course, if the range is expected to be just 0-300, I guess that
matters not one whit.

                Linus
Matthew Wilcox July 4, 2021, 9:33 p.m. UTC | #7
On Sun, Jul 04, 2021 at 11:17:50PM +0200, Miguel Ojeda wrote:
> On Sun, Jul 4, 2021 at 11:05 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > What happened to my suggestion from last time of encoding symbols < 128
> > as 0-127 and symbols larger than that as (data[0] - 128) * 256 +
> > data[1]) ?
> 
> Nothing, sorry, we focused on other parts (e.g. the allocation panics)
> during this iteration. I can take a look for v2.

Here's what I have.  Build testing now.


diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index c851ca0ed357..0d45a6e5fdc3 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -67,6 +67,14 @@ static unsigned int kallsyms_expand_symbol(unsigned int off,
 	len = *data;
 	data++;
 
+	/* lengths larger than 128 are encoded as two bytes */
+	if (len >= 128) {
+		len -= 128;
+		len *= 256;
+		len += *data;
+		data++;
+	}
+
 	/*
 	 * Update the offset to return the offset for the next symbol on
 	 * the compressed stream.
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 54ad86d13784..701566e01a1d 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -467,10 +467,16 @@ static void write_src(void)
 	output_label("kallsyms_names");
 	off = 0;
 	for (i = 0; i < table_cnt; i++) {
+		int len = table[i]->len;
 		if ((i & 0xFF) == 0)
 			markers[i >> 8] = off;
 
-		printf("\t.byte 0x%02x", table[i]->len);
+		if (len >= 128) {
+			printf("\t.byte 0x%02x\n", len / 256 + 128);
+			len %= 256;
+			off++;
+		}
+		printf("\t.byte 0x%02x", len);
 		for (k = 0; k < table[i]->len; k++)
 			printf(", 0x%02x", table[i]->sym[k]);
 		printf("\n");
Matthew Wilcox July 4, 2021, 9:49 p.m. UTC | #8
On Sun, Jul 04, 2021 at 10:33:39PM +0100, Matthew Wilcox wrote:
> On Sun, Jul 04, 2021 at 11:17:50PM +0200, Miguel Ojeda wrote:
> > On Sun, Jul 4, 2021 at 11:05 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > What happened to my suggestion from last time of encoding symbols < 128
> > > as 0-127 and symbols larger than that as (data[0] - 128) * 256 +
> > > data[1]) ?
> > 
> > Nothing, sorry, we focused on other parts (e.g. the allocation panics)
> > during this iteration. I can take a look for v2.
> 
> Here's what I have.  Build testing now.

It seems to work.  Sample output from kallsyms with the test forced to
'true':

        .byte 0x80
        .byte 0x0a, 0x41, 0xff, 0x70, 0xf3, 0xd0, 0xb0, 0xf2, 0xfc, 0x72, 0x74

under normal circumstances, it omits that first line.

I'm unfortunately in the middle of ripping apart and fixing up my test
infrastructure, so I can't claim to have really tested it, and I don't
have a module with a really large symbol anyway.  But I'll submit this
to you as a proper patch with changelog if that's enough testing for you.

(now that i look at kallsyms, why on earth is it ripping the string
apart and turning it into .byte instead of just using .ascii?)

> 
> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
> index c851ca0ed357..0d45a6e5fdc3 100644
> --- a/kernel/kallsyms.c
> +++ b/kernel/kallsyms.c
> @@ -67,6 +67,14 @@ static unsigned int kallsyms_expand_symbol(unsigned int off,
>  	len = *data;
>  	data++;
>  
> +	/* lengths larger than 128 are encoded as two bytes */
> +	if (len >= 128) {
> +		len -= 128;
> +		len *= 256;
> +		len += *data;
> +		data++;
> +	}
> +
>  	/*
>  	 * Update the offset to return the offset for the next symbol on
>  	 * the compressed stream.
> diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
> index 54ad86d13784..701566e01a1d 100644
> --- a/scripts/kallsyms.c
> +++ b/scripts/kallsyms.c
> @@ -467,10 +467,16 @@ static void write_src(void)
>  	output_label("kallsyms_names");
>  	off = 0;
>  	for (i = 0; i < table_cnt; i++) {
> +		int len = table[i]->len;
>  		if ((i & 0xFF) == 0)
>  			markers[i >> 8] = off;
>  
> -		printf("\t.byte 0x%02x", table[i]->len);
> +		if (len >= 128) {
> +			printf("\t.byte 0x%02x\n", len / 256 + 128);
> +			len %= 256;
> +			off++;
> +		}
> +		printf("\t.byte 0x%02x", len);
>  		for (k = 0; k < table[i]->len; k++)
>  			printf(", 0x%02x", table[i]->sym[k]);
>  		printf("\n");
Miguel Ojeda July 4, 2021, 10:07 p.m. UTC | #9
On Sun, Jul 4, 2021 at 11:49 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> I'm unfortunately in the middle of ripping apart and fixing up my test
> infrastructure, so I can't claim to have really tested it, and I don't
> have a module with a really large symbol anyway.  But I'll submit this
> to you as a proper patch with changelog if that's enough testing for you.

Sure, and thanks! I can give it a spin with our tests.

Cheers,
Miguel
Gary Guo July 4, 2021, 10:20 p.m. UTC | #10
On Sun, 4 Jul 2021 22:33:39 +0100
Matthew Wilcox <willy@infradead.org> wrote:

> On Sun, Jul 04, 2021 at 11:17:50PM +0200, Miguel Ojeda wrote:
> > On Sun, Jul 4, 2021 at 11:05 PM Matthew Wilcox
> > <willy@infradead.org> wrote:  
> > >
> > > What happened to my suggestion from last time of encoding symbols
> > > < 128 as 0-127 and symbols larger than that as (data[0] - 128) *
> > > 256 + data[1]) ?  
> > 
> > Nothing, sorry, we focused on other parts (e.g. the allocation
> > panics) during this iteration. I can take a look for v2.  
> 
> Here's what I have.  Build testing now.
> 
> 
> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
> index c851ca0ed357..0d45a6e5fdc3 100644
> --- a/kernel/kallsyms.c
> +++ b/kernel/kallsyms.c
> @@ -67,6 +67,14 @@ static unsigned int
> kallsyms_expand_symbol(unsigned int off, len = *data;
>  	data++;
>  
> +	/* lengths larger than 128 are encoded as two bytes */
> +	if (len >= 128) {
> +		len -= 128;
> +		len *= 256;
> +		len += *data;
> +		data++;
> +	}
> +
>  	/*
>  	 * Update the offset to return the offset for the next
> symbol on
>  	 * the compressed stream.
> diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
> index 54ad86d13784..701566e01a1d 100644
> --- a/scripts/kallsyms.c
> +++ b/scripts/kallsyms.c
> @@ -467,10 +467,16 @@ static void write_src(void)
>  	output_label("kallsyms_names");
>  	off = 0;
>  	for (i = 0; i < table_cnt; i++) {
> +		int len = table[i]->len;
>  		if ((i & 0xFF) == 0)
>  			markers[i >> 8] = off;
>  
> -		printf("\t.byte 0x%02x", table[i]->len);
> +		if (len >= 128) {
> +			printf("\t.byte 0x%02x\n", len / 256 + 128);
> +			len %= 256;
> +			off++;
> +		}
> +		printf("\t.byte 0x%02x", len);
>  		for (k = 0; k < table[i]->len; k++)
>  			printf(", 0x%02x", table[i]->sym[k]);
>  		printf("\n");

This is big endian.
Matthew Wilcox July 4, 2021, 10:42 p.m. UTC | #11
On Sun, Jul 04, 2021 at 11:20:07PM +0100, Gary Guo wrote:
> This is big endian.

Fundamentally, it doesn't matter whether it's encoded as top-7 +
bottom-8 or bottom-7 + top-8.  It could just as well be:

        if (len >= 128) {
                len -= 128;
                len += *data * 256;
                data++;
        }

It doesn't matter whether it's compatible with some other encoding.
This encoding has one producer and one consumer.  As long as they agree,
it's fine.  If you want to make an argument about extensibiity, then
I'm going to suggest that wanting a symbol name more than 32kB in size
is a sign you've done something else very, very wrong.

At that point, you should probably switch to comparing hashes of the
symbol instead of the symbol.  Indeed, I think we're already there at
300 byte symbols; we should probably SipHash the full, unmangled symbol
[1].  At 33k symbols in the current kernel, the risk of a collision of
a 64-bit value is negligible, and almost every kernel symbol is longer
than 7 bytes (thankfully).

[1] ie SipHash("void unlock_page(struct page *)")
Gary Guo July 4, 2021, 11:14 p.m. UTC | #12
On Sun, 4 Jul 2021 23:42:03 +0100
Matthew Wilcox <willy@infradead.org> wrote:

> On Sun, Jul 04, 2021 at 11:20:07PM +0100, Gary Guo wrote:
> > This is big endian.  
> 
> Fundamentally, it doesn't matter whether it's encoded as top-7 +
> bottom-8 or bottom-7 + top-8.  It could just as well be:
> 
>         if (len >= 128) {
>                 len -= 128;
>                 len += *data * 256;

Do you mean `*data * 128`?

>                 data++;
>         }
> 
> It doesn't matter whether it's compatible with some other encoding.
> This encoding has one producer and one consumer.  As long as they
> agree, it's fine.

I am aware that this is only for internal tooling so it doesn't
really matter. I mentioned that it's big endian to do top-7 +
bottom-8 because Linus suggests that big-endian shouldn't be
used.
Willy Tarreau July 5, 2021, 4:35 a.m. UTC | #13
On Sun, Jul 04, 2021 at 10:20:43PM +0100, Gary Guo wrote:
> On Sun, 4 Jul 2021 22:04:49 +0100
> Matthew Wilcox <willy@infradead.org> wrote:
> 
> > On Sun, Jul 04, 2021 at 10:27:40PM +0200, ojeda@kernel.org wrote:
> > > From: Miguel Ojeda <ojeda@kernel.org>
> > > 
> > > Rust symbols can become quite long due to namespacing introduced
> > > by modules, types, traits, generics, etc.
> > > 
> > > Increasing to 255 is not enough in some cases, and therefore
> > > we need to introduce 2-byte lengths to the symbol table. We call
> > > these "big" symbols.
> > > 
> > > In order to avoid increasing all lengths to 2 bytes (since most
> > > of them only require 1 byte, including many Rust ones), we use
> > > length zero to mark "big" symbols in the table.  
> > 
> > What happened to my suggestion from last time of encoding symbols <
> > 128 as 0-127 and symbols larger than that as (data[0] - 128) * 256 +
> > data[1]) ?
> 
> Yeah, I agree ULEB128 or similar encoding scheme would be better than
> using 0 as an escape byte. If ULEB128 is used and we restrict number of
> bytes to 2, it will encode numbers up to 2**14 instead of 2**16 like the
> current scheme, but that should be sufficient anyway.

Actually plenty of variants of such encodings exist. You can split the
first byte on 192 to keep 6 upper bits, 224 for 5, 240 for 4, etc. It
all depends how long the maximum string is expected to be and how often
we expect to see large strings. For example when splitting around 240,
all sizes from 0 to 239 take one byte, and sizes from 240 to 4335 take
two bytes.

But if strings >128 are already extremely rare we don't really care
about the extra byte needed to encode them.

Willy
Kent Overstreet July 13, 2021, 6:02 p.m. UTC | #14
On Sun, Jul 04, 2021 at 11:42:03PM +0100, Matthew Wilcox wrote:
> On Sun, Jul 04, 2021 at 11:20:07PM +0100, Gary Guo wrote:
> > This is big endian.
> 
> Fundamentally, it doesn't matter whether it's encoded as top-7 +
> bottom-8 or bottom-7 + top-8.  It could just as well be:
> 
>         if (len >= 128) {
>                 len -= 128;
>                 len += *data * 256;
>                 data++;
>         }
> 
> It doesn't matter whether it's compatible with some other encoding.
> This encoding has one producer and one consumer.  As long as they agree,
> it's fine.  If you want to make an argument about extensibiity, then
> I'm going to suggest that wanting a symbol name more than 32kB in size
> is a sign you've done something else very, very wrong.
> 
> At that point, you should probably switch to comparing hashes of the
> symbol instead of the symbol.  Indeed, I think we're already there at
> 300 byte symbols; we should probably SipHash the full, unmangled symbol
> [1].  At 33k symbols in the current kernel, the risk of a collision of
> a 64-bit value is negligible, and almost every kernel symbol is longer
> than 7 bytes (thankfully).

We really should have a better standard varint encoding - open coding varint
encodings in 2021 is offensive, and LEB128 is retarded due to using the high bit
of _every_ byte. Here's the encoding I did for bcachefs, which I nominate for a
standard varint encoding, unless someone knows of a way to do better:

https://evilpiepirate.org/git/bcachefs.git/tree/fs/bcachefs/varint.c
diff mbox series

Patch

diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index c851ca0ed35..9d0c23e1993 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -73,6 +73,13 @@  static unsigned int kallsyms_expand_symbol(unsigned int off,
 	 */
 	off += len + 1;
 
+	/* If zero, it is a "big" symbol, so a two byte length follows. */
+	if (len == 0) {
+		len = (data[0] << 8) | data[1];
+		data += 2;
+		off += len + 2;
+	}
+
 	/*
 	 * For every byte on the compressed symbol data, copy the table
 	 * entry for that byte.
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 54ad86d1378..bcdabee13aa 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -470,12 +470,37 @@  static void write_src(void)
 		if ((i & 0xFF) == 0)
 			markers[i >> 8] = off;
 
-		printf("\t.byte 0x%02x", table[i]->len);
+		/*
+		 * There cannot be any symbol of length zero -- we use that
+		 * to mark a "big" symbol (and it doesn't make sense anyway).
+		 */
+		if (table[i]->len == 0) {
+			fprintf(stderr, "kallsyms failure: "
+				"unexpected zero symbol length\n");
+			exit(EXIT_FAILURE);
+		}
+
+		/* Only lengths that fit in up to two bytes are supported. */
+		if (table[i]->len > 0xFFFF) {
+			fprintf(stderr, "kallsyms failure: "
+				"unexpected huge symbol length\n");
+			exit(EXIT_FAILURE);
+		}
+
+		if (table[i]->len <= 0xFF) {
+			/* Most symbols use a single byte for the length. */
+			printf("\t.byte 0x%02x", table[i]->len);
+			off += table[i]->len + 1;
+		} else {
+			/* "Big" symbols use a zero and then two bytes. */
+			printf("\t.byte 0x00, 0x%02x, 0x%02x",
+				(table[i]->len >> 8) & 0xFF,
+				table[i]->len & 0xFF);
+			off += table[i]->len + 3;
+		}
 		for (k = 0; k < table[i]->len; k++)
 			printf(", 0x%02x", table[i]->sym[k]);
 		printf("\n");
-
-		off += table[i]->len + 1;
 	}
 	printf("\n");