Message ID | 20241226164957.5cab9f2d@gandalf.local.home (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [POC,RFC] build: Make weak functions visible in kallsyms | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
On Thu, 26 Dec 2024 at 13:49, Steven Rostedt <rostedt@goodmis.org> wrote: > > But then, when the linker removes these functions because they were > overridden, the code does not disappear, leaving the pointers in the > __mcount_loc locations. This seems entirely unrelated to weak functions, and will be true for any other "linker removed it" (which can happen for other reasons too). So your "fix" seems to be hacking around a symptom. And honestly, the kallsyms argument seems bogus too. The problem with kallsyms is that it looks up the size the wrong way. Making up new function names doesn't fix the problem, it - once again - just hacks around the symptom of doing something wrong. Christ, kallsyms looking at nm output and going by "next symbol" was always bogus, but I think that's how the old a.out format worked originally. But "nm" literally takes a "-S" argument. We just don't use it. So I think the fix is literally to just make kallsysms have the size data. Of course, very annoyingly out /proc/kallsyms file format also tracks the (legacy) nm output that doesn't have size information. But I do think that if you hit real problems, you need to fix the *source* of the issue, not add another ugly hack around things. Linus
On Thu, 26 Dec 2024 15:01:07 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Thu, 26 Dec 2024 at 13:49, Steven Rostedt <rostedt@goodmis.org> wrote: > > > > But then, when the linker removes these functions because they were > > overridden, the code does not disappear, leaving the pointers in the > > __mcount_loc locations. > > This seems entirely unrelated to weak functions, and will be true for > any other "linker removed it" (which can happen for other reasons > too). > > So your "fix" seems to be hacking around a symptom. Yeah, that's why this was just a POC. > > And honestly, the kallsyms argument seems bogus too. The problem with > kallsyms is that it looks up the size the wrong way. Making up new > function names doesn't fix the problem, it - once again - just hacks > around the symptom of doing something wrong. > > Christ, kallsyms looking at nm output and going by "next symbol" was > always bogus, but I think that's how the old a.out format worked > originally. > > But "nm" literally takes a "-S" argument. We just don't use it. > > So I think the fix is literally to just make kallsysms have the size > data. Of course, very annoyingly out /proc/kallsyms file format also > tracks the (legacy) nm output that doesn't have size information. > > But I do think that if you hit real problems, you need to fix the > *source* of the issue, not add another ugly hack around things. So basically the real solution is to fix kallsyms to know about the end of functions. Peter Zijlstra mentioned that before, but it would take a bit more work and understanding of kallsyms to fix it properly. I'm fine not doing the hack and hopefully one day someone will have the time to fix kallsyms. This was just something I could do quickly, knowing it is mostly keeping with the status quo and not actually fixing the root of the issue. I also needed to refresh my ELF parsing abilities ;-) I may take a look at kallsyms internals. I have some spare time before the new year to try and work on things I don't normally get time to work on. -- Steve
On Fri, Dec 27, 2024 at 11:19 AM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Thu, 26 Dec 2024 15:01:07 -0800 > Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Thu, 26 Dec 2024 at 13:49, Steven Rostedt <rostedt@goodmis.org> wrote: > > > > > > But then, when the linker removes these functions because they were > > > overridden, the code does not disappear, leaving the pointers in the > > > __mcount_loc locations. > > > > This seems entirely unrelated to weak functions, and will be true for > > any other "linker removed it" (which can happen for other reasons > > too). > > > > So your "fix" seems to be hacking around a symptom. > > Yeah, that's why this was just a POC. > > > > > And honestly, the kallsyms argument seems bogus too. The problem with > > kallsyms is that it looks up the size the wrong way. Making up new > > function names doesn't fix the problem, it - once again - just hacks > > around the symptom of doing something wrong. > > > > Christ, kallsyms looking at nm output and going by "next symbol" was > > always bogus, but I think that's how the old a.out format worked > > originally. > > > > But "nm" literally takes a "-S" argument. We just don't use it. > > > > So I think the fix is literally to just make kallsysms have the size > > data. Of course, very annoyingly out /proc/kallsyms file format also > > tracks the (legacy) nm output that doesn't have size information. > > > > But I do think that if you hit real problems, you need to fix the > > *source* of the issue, not add another ugly hack around things. > > So basically the real solution is to fix kallsyms to know about the end > of functions. Peter Zijlstra mentioned that before, but it would take a > bit more work and understanding of kallsyms to fix it properly. > > I'm fine not doing the hack and hopefully one day someone will have the > time to fix kallsyms. This was just something I could do quickly, > knowing it is mostly keeping with the status quo and not actually > fixing the root of the issue. I also needed to refresh my ELF parsing > abilities ;-) > > I may take a look at kallsyms internals. I have some spare time before > the new year to try and work on things I don't normally get time to > work on. > > -- Steve So, does the kallsyms patch set from Zheng Yejian fix the issues? It was a long time ago that I reviewed his v1. I need to take a look if we decide to go with that approach.
On Thu, 26 Dec 2024 at 18:19, Steven Rostedt <rostedt@goodmis.org> wrote: > > So basically the real solution is to fix kallsyms to know about the end > of functions. Peter Zijlstra mentioned that before, but it would take a > bit more work and understanding of kallsyms to fix it properly. Yeah. The kallsyms code *used* to be pretty simple - really just a list of symbols and addresses. But it took up a lot of memory, so back in the day (two decades ago by now) it started growing some name compression code, and then some serious speedups for lookup. See https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=e10392112d315c45f054c22c862e3a7ae27d17d4 for when it went from basically a very simple array of names to be a lot less obvious with that table lookup name compression (but it had _some_ name compression even before that). That said, I think it's really mainly just the name compression that is a bit obscure and looks odd, and it's only used for the builtin kernel symbols (module symbols are still just a per-module array of "const Elf_Sym *"). And you can actually largely ignore the odd name compression - because the *rest* is fairly straightforward. For example, the actual offset of the symbol is simply a plain array still: kallsyms_offsets[]. It's slightly oddly encoded (see kallsyms_sym_address() if you care), but that's because it's an array of 32-bit values used to encode kernel symbol offsets that can obviously be 64-bit. Encoding the size of the symbols should be trivial: just add another array: "kallsyms_sizes[]", and it wouldn't even need that odd encoding. So I actually think it *should* be fairly straightforward to do for anybnody who knows the kallsyms code at all. The main pain-point would be *if* we want to actually expose the sizes in /proc/kallsyms. That would be a file format change. Which we can't do, so we'd have to do it as some kind of strange ioctl setting (or just add a new file under a new name). But maybe we don't even need that. If all the uses are in-kernel, just adding the kallsyms_sizes[] (and accessor helper functions) should be fairly straightforward. Of course, I say that without having done it. I might be overlooking something. Linus
On Thu, 26 Dec 2024 at 18:52, Masahiro Yamada <masahiroy@kernel.org> wrote: > > So, does the kallsyms patch set from Zheng Yejian fix the issues? [ Me goes searching ] Oh. Instead of adding size information, it seems to add fake "hole" entries to the kallsyms. And I guess that works too. It does feel a bit hacky, but at least it's integrated with the kallsyms logic. Linus
On Thu, 26 Dec 2024 at 13:49, Steven Rostedt <rostedt@goodmis.org> wrote: > > But then, when the linker removes these functions because they were > overridden, the code does not disappear, leaving the pointers in the > __mcount_loc locations. Btw, does this actually happen when the compiler does the mcount thing for us? It *feels* like this might be a bug in the FTRACE_MCOUNT_USE_OBJTOOL logic interacting with --gc-setions. Is the problem that --gc-sections removed the function section that contained the weak function that was then never used, but the objtool thing with create_mcount_loc_sections() would generate that mcount_loc_list and nothing realized that it's no longer there? Or does it happen even with the compiler-generated case (ie with the -mrecord-mcount and FTRACE_MCOUNT_USE_CC)? We can disable LD_DEAD_CODE_DATA_ELIMINATION, if that's what triggers it. It's marked as experimental, and it does smell like either --gc-sections is buggy, or we're doing something wrong to trigger it (and I could easily see objtool rewriting object files being that trigger...) Linus
On Thu, 26 Dec 2024 19:35:18 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Thu, 26 Dec 2024 at 13:49, Steven Rostedt <rostedt@goodmis.org> wrote: > > > > But then, when the linker removes these functions because they were > > overridden, the code does not disappear, leaving the pointers in the > > __mcount_loc locations. > > Btw, does this actually happen when the compiler does the mcount thing for us? Yes. I believe the issue is that the mcount_loc is created during the compile phase, and it just points to the call to fentry/mcount. The linker phase doesn't remove the code, just the symbols that are overridden. That means the pointer to the fentry/mcount calls still point to the same locations, as the code is still there. I even sent an email about this to the gcc folks, and Peter responded basically explaining the above. https://lore.kernel.org/all/20241014210841.5a4764de@gandalf.local.home/ -- Steve
diff --git a/scripts/Makefile b/scripts/Makefile index 6bcda4b9d054..c1336c0baf6e 100644 --- a/scripts/Makefile +++ b/scripts/Makefile @@ -5,6 +5,7 @@ hostprogs-always-$(CONFIG_KALLSYMS) += kallsyms hostprogs-always-$(BUILD_C_RECORDMCOUNT) += recordmcount +hostprogs-always-y += weakfuncs hostprogs-always-$(CONFIG_BUILDTIME_TABLE_SORT) += sorttable hostprogs-always-$(CONFIG_ASN1) += asn1_compiler hostprogs-always-$(CONFIG_MODULE_SIG_FORMAT) += sign-file diff --git a/scripts/Makefile.build b/scripts/Makefile.build index c16e4cf54d77..0b6c3b9de28b 100644 --- a/scripts/Makefile.build +++ b/scripts/Makefile.build @@ -148,6 +148,19 @@ cmd_gen_symversions_c = $(call gen_symversions,c) endif +# Due to recursion, we must skip empty.o. +# The empty.o file is created in the make process in order to determine +# the target endianness and word size. It is made before all other C +# files, including recordmcount. +cmd_weakfuncs = \ + if [ $(@) != "scripts/mod/empty.o" ]; then \ + $(objtree)/scripts/weakfuncs "$(@)"; \ + fi; +weakfuncs_source := $(srctree)/scripts/weakfuncs.c +$(obj)/%.o: $(obj)/%.c $(weakfuncs_source) FORCE + $(call if_changed_rule,cc_o_c) + $(call cmd,force_checksrc) + ifdef CONFIG_FTRACE_MCOUNT_USE_RECORDMCOUNT # compiler will not generate __mcount_loc use recordmcount or recordmcount.pl ifdef BUILD_C_RECORDMCOUNT diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib index 7395200538da..c20659251914 100644 --- a/scripts/Makefile.lib +++ b/scripts/Makefile.lib @@ -321,6 +321,7 @@ define rule_cc_o_c $(call cmd,gen_objtooldep) $(call cmd,gen_symversions_c) $(call cmd,record_mcount) + $(call cmd,weakfuncs) $(call cmd,warn_shared_object) endef diff --git a/scripts/weakfuncs.c b/scripts/weakfuncs.c new file mode 100644 index 000000000000..4051720dde49 --- /dev/null +++ b/scripts/weakfuncs.c @@ -0,0 +1,489 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include <stdbool.h> +#include <string.h> +#include <stdlib.h> +#include <stdio.h> +#include <unistd.h> +#include <fcntl.h> +#include <elf.h> + +#include <sys/types.h> +#include <sys/mman.h> +#include <sys/stat.h> + +#define APPEND_STR "__weak__" + +/* Will append APPEND_STR to the start and end */ +#define APPEND_STR_SIZE 16 + +#define ALIGN(x, y) (((x) + (y) - 1) / (y) * (y)) + +typedef union { + Elf32_Ehdr e32; + Elf64_Ehdr e64; +} Elf_Ehdr; + +typedef union { + Elf32_Shdr s32; + Elf64_Shdr s64; +} Elf_Shdr; + +typedef union { + Elf32_Sym s32; + Elf64_Sym s64; +} Elf_Sym; + +struct elf { + void *map; + Elf_Ehdr *ehdr; + Elf_Shdr *shsym; + Elf_Shdr *shstr; + const char *shstrings; + const char *strings; + char *newstrings; + off_t file_size; + off_t newsym; + off_t symend; + off_t newstrstart; + size_t origstrsize; + size_t newstrend; + size_t newstrsize; + int fd; + bool is_32; +}; + +static inline int e_shnum(struct elf *elf) +{ + if (elf->is_32) + return elf->ehdr->e32.e_shnum; + else + return elf->ehdr->e64.e_shnum; +} + +static inline int e_shstrndx(struct elf *elf) +{ + if (elf->is_32) + return elf->ehdr->e32.e_shstrndx; + else + return elf->ehdr->e64.e_shstrndx; +} + +static inline int sh_type(struct elf *elf, Elf_Shdr *shdr) +{ + if (elf->is_32) + return shdr->s32.sh_type; + else + return shdr->s64.sh_type; +} + +static inline size_t sh_offset(struct elf *elf, Elf_Shdr *shdr) +{ + if (elf->is_32) + return shdr->s32.sh_offset; + else + return shdr->s64.sh_offset; +} + +static inline size_t sh_size(struct elf *elf, Elf_Shdr *shdr) +{ + if (elf->is_32) + return shdr->s32.sh_size; + else + return shdr->s64.sh_size; +} + +static inline size_t sh_entsize(struct elf *elf, Elf_Shdr *shdr) +{ + if (elf->is_32) + return shdr->s32.sh_entsize; + else + return shdr->s64.sh_entsize; +} + +static inline const char *sh_name(struct elf *elf, Elf_Shdr *shdr) +{ + if (elf->is_32) + return elf->shstrings + shdr->s32.sh_name; + else + return elf->shstrings + shdr->s64.sh_name; +} + +static inline const char *sym_name(struct elf *elf, Elf_Sym *sym) +{ + if (elf->is_32) + return elf->strings + sym->s32.st_name; + else + return elf->strings + sym->s64.st_name; +} + +static inline int sym_bind(struct elf *elf, Elf_Sym *sym) +{ + if (elf->is_32) + return ELF32_ST_BIND(sym->s32.st_info); + else + return ELF64_ST_BIND(sym->s64.st_info); +} + +static inline size_t sym_value(struct elf *elf, Elf_Sym *sym) +{ + if (elf->is_32) + return sym->s32.st_value; + else + return sym->s64.st_value; +} + +static inline int sym_size(struct elf *elf, Elf_Sym *sym) +{ + if (elf->is_32) + return sym->s32.st_size; + else + return sym->s64.st_size; +} + +static inline int sym_shndx(struct elf *elf, Elf_Sym *sym) +{ + if (elf->is_32) + return sym->s32.st_shndx; + else + return sym->s64.st_shndx; +} + +static inline Elf_Shdr *get_shdr(struct elf *elf, int index) +{ + if (elf->is_32) { + return elf->map + elf->ehdr->e32.e_shoff + + (index * elf->ehdr->e32.e_shentsize); + } else { + return elf->map + elf->ehdr->e64.e_shoff + + (index * elf->ehdr->e64.e_shentsize); + } +} + +static void update_sym_bind(struct elf *elf, Elf_Sym *sym) +{ + int t; + + if (elf->is_32) { + t = ELF32_ST_TYPE(sym->s32.st_info); + sym->s32.st_info = ELF32_ST_INFO(STB_GLOBAL, t); + } else { + t = ELF64_ST_TYPE(sym->s64.st_info); + sym->s64.st_info = ELF64_ST_INFO(STB_GLOBAL, t); + } +} + +static void update_sym_name(struct elf *elf, Elf_Sym *sym, size_t idx) +{ + if (elf->is_32) + sym->s32.st_name = idx; + else + sym->s64.st_name = idx; +} + +static Elf_Shdr *find_sym_table(struct elf *elf) +{ + Elf_Shdr *shdr; + + for (int i = 0; i < e_shnum(elf); i++) { + shdr = get_shdr(elf, i); + if (sh_type(elf, shdr) == SHT_SYMTAB) + return shdr; + } + return NULL; +} + +static const char *find_strings(struct elf *elf) +{ + Elf_Shdr *shdr; + + shdr = get_shdr(elf, e_shstrndx(elf)); + elf->shstrings = elf->map + sh_offset(elf, shdr); + + for (int i = 0; i < e_shnum(elf); i++) { + shdr = get_shdr(elf, i); + if (sh_type(elf, shdr) != SHT_STRTAB) + continue; + + if (!strcmp(sh_name(elf, shdr), ".strtab")) { + elf->shstr = shdr; + elf->origstrsize = sh_size(elf, shdr); + return elf->map + sh_offset(elf, shdr); + } + } + return NULL; +} + +static int add_string(struct elf *elf, const char *str) +{ + size_t size; + int len = strlen(str); + char *newstr; + + size = elf->newstrsize + len + APPEND_STR_SIZE + 1; + newstr = realloc(elf->newstrings, size); + if (!newstr) + return -1; + + elf->newstrings = newstr; + + newstr += elf->newstrsize; + + memcpy(newstr, APPEND_STR, APPEND_STR_SIZE / 2); + newstr += APPEND_STR_SIZE / 2; + + memcpy(newstr, str, len); + newstr += len; + + memcpy(newstr, APPEND_STR, APPEND_STR_SIZE / 2 + 1); + + elf->newstrsize += len + APPEND_STR_SIZE + 1; + + return 0; +} + +static int process_weak_func(struct elf *elf, Elf_Sym *sym) +{ + static off_t symsize; + static void *buf; + const char *str; + Elf_Shdr *shdr; + off_t end; + ssize_t r; + + shdr = get_shdr(elf, sym_shndx(elf, sym)); + /* Make sure the section is executable */ + if (sh_type(elf, shdr) != SHT_PROGBITS) + return 0; + + if (!symsize) { + symsize = sh_entsize(elf, elf->shsym); + buf = calloc(1, symsize); + if (!buf) { + perror("Allocated temp buffer"); + return -1; + } + } + + if (!elf->newsym) { + void *symtab; + size_t size; + + /* copy the symbol table */ + end = elf->file_size; + /* Make aligned by symbol entry size */ + end = ALIGN(end, symsize); + + if (lseek(elf->fd, end, SEEK_SET) != end) { + perror("lseek"); + return -1; + } + + elf->newsym = end; + symtab = elf->map + sh_offset(elf, elf->shsym); + size = sh_size(elf, elf->shsym); + r = write(elf->fd, symtab, size); + if (r != size) { + perror("Copying symtable"); + return -1; + } + + /* Make sure it's still aligned */ + end = lseek(elf->fd, 0, SEEK_END); + if (end < 0) { + perror("Getting new end of file"); + return -1; + } + + elf->symend = end; + end = ALIGN(end, symsize); + if (end != elf->symend) { + /* Add padding */ + r = write(elf->fd, buf, end - elf->symend); + free(buf); + if (r < 0) { + perror("Adding padding"); + return -1; + } + elf->symend = end; + } + elf->newstrend = elf->origstrsize; + } + + memcpy(buf, sym, symsize); + sym = buf; + update_sym_bind(elf, sym); + str = sym_name(elf, sym); + update_sym_name(elf, sym, elf->newstrend); + elf->newstrend += strlen(str) + APPEND_STR_SIZE + 1; + + if (add_string(elf, str) < 0) { + perror("Failed adding new string"); + return -1; + } + + r = write(elf->fd, sym, symsize); + if (r < 0) { + perror("Creating new symbol"); + return -1; + } + elf->symend += symsize; + + return 0; +} + +static int add_new_strings(struct elf *elf) +{ + off_t end; + size_t r; + + end = lseek(elf->fd, 0, SEEK_END); + if (end < 0) { + perror("lseek"); + return -1; + } + elf->newstrstart = end; + + r = write(elf->fd, elf->strings, elf->origstrsize); + if (r != elf->origstrsize) { + perror("Appending copy of string section"); + return -1; + } + + r = write(elf->fd, elf->newstrings, elf->newstrsize); + if (r != elf->newstrsize) { + perror("Appending new strings to section"); + return -1; + } + + end = lseek(elf->fd, 0, SEEK_END); + if (end < 0) { + perror("lseek"); + return -1; + } + elf->newstrend = end; + + return 0; +} + +static void update_shdr(struct elf *elf, Elf_Shdr *shdr, + size_t offset, size_t size) +{ + if (elf->is_32) { + shdr->s32.sh_offset = offset; + shdr->s32.sh_size = size; + } else { + shdr->s64.sh_offset = offset; + shdr->s64.sh_size = size; + } +} + +static void update_elf(struct elf *elf) +{ + update_shdr(elf, elf->shsym, elf->newsym, elf->symend - elf->newsym); + update_shdr(elf, elf->shstr, elf->newstrstart, + elf->newstrend - elf->newstrstart); +} + +static int add_weak_funcs(struct elf *elf) +{ + Elf_Shdr *sh_sym = find_sym_table(elf); + Elf_Sym *sym; + void *end_sym; + + if (!sh_sym) { + fprintf(stderr, "Could not find symbol table\n"); + return -1; + } + + elf->shsym = sh_sym; + elf->strings = find_strings(elf); + if (!elf->strings) { + fprintf(stderr, "Could not find string table\n"); + return -1; + } + + sym = elf->map + sh_offset(elf, sh_sym); + end_sym = elf->map + sh_offset(elf, sh_sym) + sh_size(elf, sh_sym); + + while ((void *)sym < end_sym) { + if (sym_bind(elf, sym) == STB_WEAK) { + if (process_weak_func(elf, sym) < 0) + return -1; + } + sym = (void *)sym + sh_entsize(elf, sh_sym); + } + + if (elf->newstrings) { + if (add_new_strings(elf) < 0) + return -1; + update_elf(elf); + } + + return 0; +} + +static int handle_weak_funcs(const char *file) +{ + struct elf elf = {}; + struct stat st; + void *map; + int ret = -1; + int fd; + + if (stat(file, &st) < 0) { + perror(file); + return -1; + } + + fd = open(file, O_RDWR); + if (fd < 0) { + perror(file); + return -1; + } + + map = mmap(NULL, st.st_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); + if (map == MAP_FAILED) { + perror("mmap"); + return -1; + } + + elf.map = map; + elf.ehdr = map; + elf.fd = fd; + elf.file_size = st.st_size; + + switch (elf.ehdr->e32.e_machine) { + case EM_386: + elf.is_32 = true; + break; + case EM_X86_64: + elf.is_32 = false; + break; + default: + fprintf(stderr, "ELF type %d is unsupported\n", elf.ehdr->e32.e_machine); + goto out; + } + + ret = add_weak_funcs(&elf); + ret = 0; + out: + munmap(map, st.st_size); + close(fd); + return ret; +} + +int main(int argc, char **argv) +{ + if (argc < 2) { + fprintf(stderr, "usage: weakfunc file.o ...\n"); + exit(-1); + } + + for (int i = 1; i < argc; i++) { + int ret = handle_weak_funcs(argv[i]); + + if (ret < 0) + exit(ret); + } +}