Message ID | 20210414184604.23473-2-ojeda@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Series | Rust support |
On Wed, Apr 14, 2021 at 08:45:52PM +0200, ojeda@kernel.org wrote:
> Increasing to 255 is not enough in some cases, and therefore
> we need to introduce 2-byte lengths to the symbol table. We call
> these "big" symbols.
>
> In order to avoid increasing all lengths to 2 bytes (since most
> of them only require 1 byte, including many Rust ones), we use
> length zero to mark "big" symbols in the table.

How about doing something a bit more utf-8-like?

	len = data[0];
	if (len == 0)
		error
	else if (len < 128)
		return len;
	else if (len < 192)
		return 128 + (len - 128) * 256 + data[1];

... that takes you all the way out to 16511 bytes.  You probably don't
even need the third byte option.  But if you do ...

	else if (len < 223)
		return 16512 + (len - 192) * 256 * 256 +
			data[1] * 256 + data[2];

which takes you all the way out to 2,113,663 bytes and leaves 224-255 unused.

Alternatively, if the symbols are really this long, perhaps we should not
do string matches.  A sha-1 (... or whatever ...) hash of the function
name is 160 bits.  Expressed as hex digits, that's 40 characters.
Expressed in base-64, it's 27 characters.  We'd also want a "pretty"
name to go along with the hash, but that seems preferable to printing
out a mangled-with-types-and-who-knows-what name.

> Co-developed-by: Alex Gaynor <alex.gaynor@gmail.com>
> Signed-off-by: Alex Gaynor <alex.gaynor@gmail.com>

If you have C-d-b, you don't also need S-o-b.
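
The UTF-8-like length prefix suggested in the reply above can be sketched in C as a decoder and its matching encoder. This is only an illustration under stated assumptions, not code from the patch: the names `decode_len` and `encode_len` are hypothetical, and the three-byte branch uses `lead < 224` as its bound, since that is the value consistent with the stated maximum of 2,113,663 and with "leaves 224-255 unused" (the pseudocode in the mail writes `len < 223`).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: decode a UTF-8-like length prefix.
 * Stores the number of prefix bytes in *used and returns the length,
 * or returns -1 for the reserved lead bytes (0 and 224..255).
 */
long decode_len(const unsigned char *data, size_t *used)
{
	unsigned int lead = data[0];

	if (lead == 0 || lead >= 224)
		return -1;
	if (lead < 128) {		/* one byte: 1..127 */
		*used = 1;
		return (long)lead;
	}
	if (lead < 192) {		/* two bytes: 128..16511 */
		*used = 2;
		return 128 + (long)(lead - 128) * 256 + data[1];
	}
	*used = 3;			/* three bytes: 16512..2113663 */
	return 16512 + (long)(lead - 192) * 256 * 256 +
		(long)data[1] * 256 + data[2];
}

/*
 * Hypothetical inverse: write the prefix for len into out (up to 3 bytes).
 * Returns the number of bytes written, or 0 if len cannot be encoded.
 */
size_t encode_len(unsigned long len, unsigned char *out)
{
	unsigned long v;

	if (len == 0 || len > 2113663UL)
		return 0;
	if (len < 128) {
		out[0] = (unsigned char)len;
		return 1;
	}
	if (len < 16512) {
		v = len - 128;
		out[0] = (unsigned char)(128 + v / 256);
		out[1] = (unsigned char)(v % 256);
		return 2;
	}
	v = len - 16512;
	assert(v / 65536 < 32);		/* lead byte stays within 192..223 */
	out[0] = (unsigned char)(192 + v / 65536);
	out[1] = (unsigned char)((v / 256) % 256);
	out[2] = (unsigned char)(v % 256);
	return 3;
}
```

With this layout every length from 1 to 127 still costs a single byte, and a lead byte of 0 remains available as an error marker.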
On Wed, Apr 14, 2021 at 9:45 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> How about doing something a bit more utf-8-like?
>
>	len = data[0];
>	if (len == 0)
>		error
>	else if (len < 128)
>		return len;
>	else if (len < 192)
>		return 128 + (len - 128) * 256 + data[1];
>
> ... that takes you all the way out to 16511 bytes.  You probably don't

That would save some space and allow us to keep the 0 as an error, yeah.

> Alternatively, if the symbols are really this long, perhaps we should not
> do string matches.  A sha-1 (... or whatever ...) hash of the function
> name is 160 bits.  Expressed as hex digits, that's 40 characters.
> Expressed in base-64, it's 27 characters.  We'd also want a "pretty"
> name to go along with the hash, but that seems preferable to printing
> out a mangled-with-types-and-who-knows-what name.

I have seen symbols up to ~300, but I don't think we will ever go up
to more than, say, 1024, unless we start to go crazy with generics,
namespaces and what not. Hashing could be a nice solution if they
really grow, yeah.

> If you have C-d-b, you don't also need S-o-b.

Hmm... `submitting-patches.rst` keeps the S-o-b in the example they
give, is it outdated?

Cheers,
Miguel
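
The character counts quoted for a 160-bit hash follow from the bits carried per printed character: 4 for hexadecimal, 6 for base-64. A minimal sketch of that arithmetic, with hypothetical helper names and assuming unpadded base-64 output:

```c
#include <stddef.h>

/* Characters needed to print `bits` bits in hexadecimal (4 bits each). */
size_t hex_chars(size_t bits)
{
	return (bits + 3) / 4;
}

/* Characters needed for unpadded base-64 (6 bits each), rounded up. */
size_t base64_chars(size_t bits)
{
	return (bits + 5) / 6;
}
```

For 160 bits these give 40 and 27 characters respectively, both well below the ~300-character symbols mentioned in the thread.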
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 8043a90aa50e..faba546e9a58 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -73,6 +73,13 @@ static unsigned int kallsyms_expand_symbol(unsigned int off,
 	 */
 	off += len + 1;
 
+	/* If zero, it is a "big" symbol, so a two byte length follows. */
+	if (len == 0) {
+		len = (data[0] << 8) | data[1];
+		data += 2;
+		off += len + 2;
+	}
+
 	/*
 	 * For every byte on the compressed symbol data, copy the table
 	 * entry for that byte.
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 54ad86d13784..bcdabee13aab 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -470,12 +470,37 @@ static void write_src(void)
 		if ((i & 0xFF) == 0)
 			markers[i >> 8] = off;
 
-		printf("\t.byte 0x%02x", table[i]->len);
+		/*
+		 * There cannot be any symbol of length zero -- we use that
+		 * to mark a "big" symbol (and it doesn't make sense anyway).
+		 */
+		if (table[i]->len == 0) {
+			fprintf(stderr, "kallsyms failure: "
+				"unexpected zero symbol length\n");
+			exit(EXIT_FAILURE);
+		}
+
+		/* Only lengths that fit in up to two bytes are supported. */
+		if (table[i]->len > 0xFFFF) {
+			fprintf(stderr, "kallsyms failure: "
+				"unexpected huge symbol length\n");
+			exit(EXIT_FAILURE);
+		}
+
+		if (table[i]->len <= 0xFF) {
+			/* Most symbols use a single byte for the length. */
+			printf("\t.byte 0x%02x", table[i]->len);
+			off += table[i]->len + 1;
+		} else {
+			/* "Big" symbols use a zero and then two bytes. */
+			printf("\t.byte 0x00, 0x%02x, 0x%02x",
+				(table[i]->len >> 8) & 0xFF,
+				table[i]->len & 0xFF);
+			off += table[i]->len + 3;
+		}
 		for (k = 0; k < table[i]->len; k++)
 			printf(", 0x%02x", table[i]->sym[k]);
 		printf("\n");
-
-		off += table[i]->len + 1;
 	}
 	printf("\n");
 
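
To see how the two halves of the patch fit together, the length encoding can be isolated into a writer and a reader. This is a standalone sketch only: `put_len` and `get_len` are hypothetical names that do not appear in the kernel sources, and the real code emits assembler `.byte` directives and tracks `off` rather than filling a buffer.

```c
#include <stddef.h>

/*
 * Writer side, following the logic added to scripts/kallsyms.c:
 * one byte for lengths 1..255, otherwise a zero marker followed by
 * the length as two big-endian bytes. Returns the number of bytes
 * written, or 0 for lengths the format does not support.
 */
size_t put_len(unsigned int len, unsigned char *out)
{
	if (len == 0 || len > 0xFFFF)
		return 0;
	if (len <= 0xFF) {
		out[0] = (unsigned char)len;
		return 1;
	}
	out[0] = 0x00;				/* "big" symbol marker */
	out[1] = (unsigned char)((len >> 8) & 0xFF);
	out[2] = (unsigned char)(len & 0xFF);
	return 3;
}

/*
 * Reader side, following the logic added to kernel/kallsyms.c:
 * read one byte, and if it is zero read the two-byte length that
 * follows. Advances *data past the length prefix.
 */
unsigned int get_len(const unsigned char **data)
{
	unsigned int len = *(*data)++;

	if (len == 0) {
		len = ((unsigned int)(*data)[0] << 8) | (*data)[1];
		*data += 2;
	}
	return len;
}
```

A 300-byte symbol, for example, is written as the prefix `0x00, 0x01, 0x2c`, so it costs three length bytes where a symbol of up to 255 bytes costs one.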