diff mbox series

[01/13] kallsyms: Support "big" kernel symbols (2-byte lengths)

Message ID 20210414184604.23473-2-ojeda@kernel.org (mailing list archive)
State New, archived
Headers show
Series Rust support | expand

Commit Message

Miguel Ojeda April 14, 2021, 6:45 p.m. UTC
From: Miguel Ojeda <ojeda@kernel.org>

Rust symbols can become quite long due to namespacing introduced
by modules, types, traits, generics, etc.

Increasing to 255 is not enough in some cases, and therefore
we need to introduce 2-byte lengths to the symbol table. We call
these "big" symbols.

In order to avoid increasing all lengths to 2 bytes (since most
of them only require 1 byte, including many Rust ones), we use
length zero to mark "big" symbols in the table.

Co-developed-by: Alex Gaynor <alex.gaynor@gmail.com>
Signed-off-by: Alex Gaynor <alex.gaynor@gmail.com>
Co-developed-by: Geoffrey Thomas <geofft@ldpreload.com>
Signed-off-by: Geoffrey Thomas <geofft@ldpreload.com>
Co-developed-by: Finn Behrens <me@kloenk.de>
Signed-off-by: Finn Behrens <me@kloenk.de>
Co-developed-by: Adam Bratschi-Kaye <ark.email@gmail.com>
Signed-off-by: Adam Bratschi-Kaye <ark.email@gmail.com>
Co-developed-by: Wedson Almeida Filho <wedsonaf@google.com>
Signed-off-by: Wedson Almeida Filho <wedsonaf@google.com>
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
---
 kernel/kallsyms.c  |  7 +++++++
 scripts/kallsyms.c | 31 ++++++++++++++++++++++++++++---
 2 files changed, 35 insertions(+), 3 deletions(-)

Comments

Matthew Wilcox April 14, 2021, 7:44 p.m. UTC | #1
On Wed, Apr 14, 2021 at 08:45:52PM +0200, ojeda@kernel.org wrote:
> Increasing to 255 is not enough in some cases, and therefore
> we need to introduce 2-byte lengths to the symbol table. We call
> these "big" symbols.
> 
> In order to avoid increasing all lengths to 2 bytes (since most
> of them only require 1 byte, including many Rust ones), we use
> length zero to mark "big" symbols in the table.

How about doing something a bit more utf-8-like?

	len = data[0];
	if (len == 0)
		error
	else if (len < 128)
		return len;
	else if (len < 192)
		return 128 + (len - 128) * 256 + data[1];
... that takes you all the way out to 16511 bytes.  You probably don't
even need the third byte option.  But if you do ...
	else if (len < 223)
		return 16512 + (len - 192) * 256 * 256 +
			data[1] * 256 + data[2];
which takes you all the way out to 2,113,663 bytes and leaves 224-255 unused.

Alternatively, if the symbols are really this long, perhaps we should not
do string matches.  A sha-1 (... or whatever ...) hash of the function
name is 160 bits.  Expressed as hex digits, that's 40 characters.
Expressed in base-64, it's 27 characters.  We'd also want a "pretty"
name to go along with the hash, but that seems preferable to printing
out a mangled-with-types-and-who-knows-what name.

> Co-developed-by: Alex Gaynor <alex.gaynor@gmail.com>
> Signed-off-by: Alex Gaynor <alex.gaynor@gmail.com>

If you have C-d-b, you don't also need S-o-b.
Miguel Ojeda April 14, 2021, 7:59 p.m. UTC | #2
On Wed, Apr 14, 2021 at 9:45 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> How about doing something a bit more utf-8-like?
>
>         len = data[0];
>         if (len == 0)
>                 error
>         else if (len < 128)
>                 return len;
>         else if (len < 192)
>                 return 128 + (len - 128) * 256 + data[1];
> ... that takes you all the way out to 16511 bytes.  You probably don't

That would save some space and allow us to keep the 0 as an error, yeah.

> Alternatively, if the symbols are really this long, perhaps we should not
> do string matches.  A sha-1 (... or whatever ...) hash of the function
> name is 160 bits.  Expressed as hex digits, that's 40 characters.
> Expressed in base-64, it's 27 characters.  We'd also want a "pretty"
> name to go along with the hash, but that seems preferable to printing
> out a mangled-with-types-and-who-knows-what name.

I have seen symbols up to ~300, but I don't think we will ever go up
to more than, say, 1024, unless we start to go crazy with generics,
namespaces and what not.

Hashing could be a nice solution if they really grow, yeah.

> If you have C-d-b, you don't also need S-o-b.

Hmm... `submitting-patches.rst` keeps the S-o-b in the example they
give, is it outdated?

Cheers,
Miguel
diff mbox series

Patch

diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 8043a90aa50e..faba546e9a58 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -73,6 +73,13 @@  static unsigned int kallsyms_expand_symbol(unsigned int off,
 	 */
 	off += len + 1;
 
+	/* If zero, it is a "big" symbol, so a two byte length follows. */
+	if (len == 0) {
+		len = (data[0] << 8) | data[1];
+		data += 2;
+		off += len + 2;
+	}
+
 	/*
 	 * For every byte on the compressed symbol data, copy the table
 	 * entry for that byte.
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 54ad86d13784..bcdabee13aab 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -470,12 +470,37 @@  static void write_src(void)
 		if ((i & 0xFF) == 0)
 			markers[i >> 8] = off;
 
-		printf("\t.byte 0x%02x", table[i]->len);
+		/*
+		 * There cannot be any symbol of length zero -- we use that
+		 * to mark a "big" symbol (and it doesn't make sense anyway).
+		 */
+		if (table[i]->len == 0) {
+			fprintf(stderr, "kallsyms failure: "
+				"unexpected zero symbol length\n");
+			exit(EXIT_FAILURE);
+		}
+
+		/* Only lengths that fit in up to two bytes are supported. */
+		if (table[i]->len > 0xFFFF) {
+			fprintf(stderr, "kallsyms failure: "
+				"unexpected huge symbol length\n");
+			exit(EXIT_FAILURE);
+		}
+
+		if (table[i]->len <= 0xFF) {
+			/* Most symbols use a single byte for the length. */
+			printf("\t.byte 0x%02x", table[i]->len);
+			off += table[i]->len + 1;
+		} else {
+			/* "Big" symbols use a zero and then two bytes. */
+			printf("\t.byte 0x00, 0x%02x, 0x%02x",
+				(table[i]->len >> 8) & 0xFF,
+				table[i]->len & 0xFF);
+			off += table[i]->len + 3;
+		}
 		for (k = 0; k < table[i]->len; k++)
 			printf(", 0x%02x", table[i]->sym[k]);
 		printf("\n");
-
-		off += table[i]->len + 1;
 	}
 	printf("\n");