Message ID | 20241226164957.5cab9f2d@gandalf.local.home (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [POC,RFC] build: Make weak functions visible in kallsyms | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
On Thu, 26 Dec 2024 at 13:49, Steven Rostedt <rostedt@goodmis.org> wrote: > > But then, when the linker removes these functions because they were > overridden, the code does not disappear, leaving the pointers in the > __mcount_loc locations. This seems entirely unrelated to weak functions, and will be true for any other "linker removed it" (which can happen for other reasons too). So your "fix" seems to be hacking around a symptom. And honestly, the kallsyms argument seems bogus too. The problem with kallsyms is that it looks up the size the wrong way. Making up new function names doesn't fix the problem, it - once again - just hacks around the symptom of doing something wrong. Christ, kallsyms looking at nm output and going by "next symbol" was always bogus, but I think that's how the old a.out format worked originally. But "nm" literally takes a "-S" argument. We just don't use it. So I think the fix is literally to just make kallsysms have the size data. Of course, very annoyingly out /proc/kallsyms file format also tracks the (legacy) nm output that doesn't have size information. But I do think that if you hit real problems, you need to fix the *source* of the issue, not add another ugly hack around things. Linus
On Thu, 26 Dec 2024 15:01:07 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Thu, 26 Dec 2024 at 13:49, Steven Rostedt <rostedt@goodmis.org> wrote: > > > > But then, when the linker removes these functions because they were > > overridden, the code does not disappear, leaving the pointers in the > > __mcount_loc locations. > > This seems entirely unrelated to weak functions, and will be true for > any other "linker removed it" (which can happen for other reasons > too). > > So your "fix" seems to be hacking around a symptom. Yeah, that's why this was just a POC. > > And honestly, the kallsyms argument seems bogus too. The problem with > kallsyms is that it looks up the size the wrong way. Making up new > function names doesn't fix the problem, it - once again - just hacks > around the symptom of doing something wrong. > > Christ, kallsyms looking at nm output and going by "next symbol" was > always bogus, but I think that's how the old a.out format worked > originally. > > But "nm" literally takes a "-S" argument. We just don't use it. > > So I think the fix is literally to just make kallsysms have the size > data. Of course, very annoyingly out /proc/kallsyms file format also > tracks the (legacy) nm output that doesn't have size information. > > But I do think that if you hit real problems, you need to fix the > *source* of the issue, not add another ugly hack around things. So basically the real solution is to fix kallsyms to know about the end of functions. Peter Zijlstra mentioned that before, but it would take a bit more work and understanding of kallsyms to fix it properly. I'm fine not doing the hack and hopefully one day someone will have the time to fix kallsyms. This was just something I could do quickly, knowing it is mostly keeping with the status quo and not actually fixing the root of the issue. I also needed to refresh my ELF parsing abilities ;-) I may take a look at kallsyms internals. I have some spare time before the new year to try and work on things I don't normally get time to work on. -- Steve
On Fri, Dec 27, 2024 at 11:19 AM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Thu, 26 Dec 2024 15:01:07 -0800 > Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Thu, 26 Dec 2024 at 13:49, Steven Rostedt <rostedt@goodmis.org> wrote: > > > > > > But then, when the linker removes these functions because they were > > > overridden, the code does not disappear, leaving the pointers in the > > > __mcount_loc locations. > > > > This seems entirely unrelated to weak functions, and will be true for > > any other "linker removed it" (which can happen for other reasons > > too). > > > > So your "fix" seems to be hacking around a symptom. > > Yeah, that's why this was just a POC. > > > > > And honestly, the kallsyms argument seems bogus too. The problem with > > kallsyms is that it looks up the size the wrong way. Making up new > > function names doesn't fix the problem, it - once again - just hacks > > around the symptom of doing something wrong. > > > > Christ, kallsyms looking at nm output and going by "next symbol" was > > always bogus, but I think that's how the old a.out format worked > > originally. > > > > But "nm" literally takes a "-S" argument. We just don't use it. > > > > So I think the fix is literally to just make kallsysms have the size > > data. Of course, very annoyingly out /proc/kallsyms file format also > > tracks the (legacy) nm output that doesn't have size information. > > > > But I do think that if you hit real problems, you need to fix the > > *source* of the issue, not add another ugly hack around things. > > So basically the real solution is to fix kallsyms to know about the end > of functions. Peter Zijlstra mentioned that before, but it would take a > bit more work and understanding of kallsyms to fix it properly. > > I'm fine not doing the hack and hopefully one day someone will have the > time to fix kallsyms. This was just something I could do quickly, > knowing it is mostly keeping with the status quo and not actually > fixing the root of the issue. I also needed to refresh my ELF parsing > abilities ;-) > > I may take a look at kallsyms internals. I have some spare time before > the new year to try and work on things I don't normally get time to > work on. > > -- Steve So, does the kallsyms patch set from Zheng Yejian fix the issues? It was a long time ago that I reviewed his v1. I need to take a look if we decide to go with that approach.
On Thu, 26 Dec 2024 at 18:19, Steven Rostedt <rostedt@goodmis.org> wrote: > > So basically the real solution is to fix kallsyms to know about the end > of functions. Peter Zijlstra mentioned that before, but it would take a > bit more work and understanding of kallsyms to fix it properly. Yeah. The kallsyms code *used* to be pretty simple - really just a list of symbols and addresses. But it took up a lot of memory, so back in the day (two decades ago by now) it started growing some name compression code, and then some serious speedups for lookup. See https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=e10392112d315c45f054c22c862e3a7ae27d17d4 for when it went from basically a very simple array of names to be a lot less obvious with that table lookup name compression (but it had _some_ name compression even before that). That said, I think it's really mainly just the name compression that is a bit obscure and looks odd, and it's only used for the builtin kernel symbols (module symbols are still just a per-module array of "const Elf_Sym *"). And you can actually largely ignore the odd name compression - because the *rest* is fairly straightforward. For example, the actual offset of the symbol is simply a plain array still: kallsyms_offsets[]. It's slightly oddly encoded (see kallsyms_sym_address() if you care), but that's because it's an array of 32-bit values used to encode kernel symbol offsets that can obviously be 64-bit. Encoding the size of the symbols should be trivial: just add another array: "kallsyms_sizes[]", and it wouldn't even need that odd encoding. So I actually think it *should* be fairly straightforward to do for anybnody who knows the kallsyms code at all. The main pain-point would be *if* we want to actually expose the sizes in /proc/kallsyms. That would be a file format change. Which we can't do, so we'd have to do it as some kind of strange ioctl setting (or just add a new file under a new name). But maybe we don't even need that. If all the uses are in-kernel, just adding the kallsyms_sizes[] (and accessor helper functions) should be fairly straightforward. Of course, I say that without having done it. I might be overlooking something. Linus
On Thu, 26 Dec 2024 at 18:52, Masahiro Yamada <masahiroy@kernel.org> wrote: > > So, does the kallsyms patch set from Zheng Yejian fix the issues? [ Me goes searching ] Oh. Instead of adding size information, it seems to add fake "hole" entries to the kallsyms. And I guess that works too. It does feel a bit hacky, but at least it's integrated with the kallsyms logic. Linus
On Thu, 26 Dec 2024 at 13:49, Steven Rostedt <rostedt@goodmis.org> wrote: > > But then, when the linker removes these functions because they were > overridden, the code does not disappear, leaving the pointers in the > __mcount_loc locations. Btw, does this actually happen when the compiler does the mcount thing for us? It *feels* like this might be a bug in the FTRACE_MCOUNT_USE_OBJTOOL logic interacting with --gc-setions. Is the problem that --gc-sections removed the function section that contained the weak function that was then never used, but the objtool thing with create_mcount_loc_sections() would generate that mcount_loc_list and nothing realized that it's no longer there? Or does it happen even with the compiler-generated case (ie with the -mrecord-mcount and FTRACE_MCOUNT_USE_CC)? We can disable LD_DEAD_CODE_DATA_ELIMINATION, if that's what triggers it. It's marked as experimental, and it does smell like either --gc-sections is buggy, or we're doing something wrong to trigger it (and I could easily see objtool rewriting object files being that trigger...) Linus
On Thu, 26 Dec 2024 19:35:18 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Thu, 26 Dec 2024 at 13:49, Steven Rostedt <rostedt@goodmis.org> wrote: > > > > But then, when the linker removes these functions because they were > > overridden, the code does not disappear, leaving the pointers in the > > __mcount_loc locations. > > Btw, does this actually happen when the compiler does the mcount thing for us? Yes. I believe the issue is that the mcount_loc is created during the compile phase, and it just points to the call to fentry/mcount. The linker phase doesn't remove the code, just the symbols that are overridden. That means the pointer to the fentry/mcount calls still point to the same locations, as the code is still there. I even sent an email about this to the gcc folks, and Peter responded basically explaining the above. https://lore.kernel.org/all/20241014210841.5a4764de@gandalf.local.home/ -- Steve
On Thu, 26 Dec 2024 at 19:45, Steven Rostedt <rostedt@goodmis.org> wrote: > > > > > Btw, does this actually happen when the compiler does the mcount thing for us? > > Yes. Ok, that's actually good. I'm not really worried about the "unused symbols aren't in kallsyms" issue, even if it confuses the mcount logic. THAT confusion is easy to deal with by either adding symbol size information (which I think would be a good thing in general, although perhaps not worth it). Even without the symbol size, the mcount issue can be dealt with by just knowing that the mcount location has to be at the very beginning of the function and just taking the offset - that we already do have - into account. I was more worried that there might also be some much deeper confusion with the linker actually garbage collecting the unused weak function code away, and now an unused symbol that kallsyms doesn't know about wouldn't just have an unexpected mcount pointer to it, but the mcount pointer would actually be stale and point to some unrelated code. So as long as *that* isn't what is happening, this all seems fairly benign. Linus
On Fri, 27 Dec 2024 10:09:40 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Thu, 26 Dec 2024 at 19:45, Steven Rostedt <rostedt@goodmis.org> wrote: > > > > > > > > Btw, does this actually happen when the compiler does the mcount thing for us? > > > > Yes. > > Ok, that's actually good. > > I'm not really worried about the "unused symbols aren't in kallsyms" > issue, even if it confuses the mcount logic. THAT confusion is easy to > deal with by either adding symbol size information (which I think > would be a good thing in general, although perhaps not worth it). > > Even without the symbol size, the mcount issue can be dealt with by > just knowing that the mcount location has to be at the very beginning > of the function and just taking the offset - that we already do have - > into account. This is actually what we do today. A check is made against the location of the fentry/mcount, and if it's too far away from the function entry, it is considered a weak function. The distance is architecture dependent, as well as options (adding ENDBR and such needs to be accounted for). Now we still need to list them in the available_filter_functions, as the order of what's in mcount_loc is used by the set_filter_function, and if we "hide" any, it can screw up the accounting, because you can enable functions via the index into that file. Using an index is an O(1) operation, where as by name it needs to search all addresses and call kallsyms for a name look up and then compare to what was passed in. Tooling uses the index, as it's much faster to enable several functions (which by name turns into an O(n^2) operation). Enabling a thousand functions by name can take over a minute to complete, whereas by index takes less than a second. For now, we have in available_filter_functions: trace_initcall_finish_cb initcall_blacklisted do_one_initcall __ftrace_invalid_address___708 rootfs_init_fs_context wait_for_initramfs __ftrace_invalid_address___68 calibration_delay_done calibrate_delay Where those "__ftrace_invalid_address_*" is detected as a weak function. This is ugly and a hack, but we still need a place holder for them :-( I would love to find a better way to handle this. Now, the .mcount_loc is sorted at build time. If there's a way to find which addresses are no longer visible, then that code could possibly remove them from the mcount_loc. Perhaps, it could do a 'nm -S' and check to make sure that all addresses are within a listed size, and if not, remove it. That way, we could get rid of those place holders. > > I was more worried that there might also be some much deeper confusion > with the linker actually garbage collecting the unused weak function > code away, and now an unused symbol that kallsyms doesn't know about > wouldn't just have an unexpected mcount pointer to it, but the mcount > pointer would actually be stale and point to some unrelated code. > > So as long as *that* isn't what is happening, this all seems fairly benign. For now, the mcount_loc is just an annoyance as we have those ugly place holders. The real bug is with live kernel patching, where there's a mismatch in the function size. IIUC, the building of the module for the live patching finds the size of the function it's patching with 'nm'. And then before patching, asks kallsyms for the size of the function. Because kallsyms uses the next function to determine the size, it returns the size of the function plus the size of the weak function. There's a mismatch and live kernel patching refuses to patch the function. By adding the actual size of the function to kallsyms, that would fix the problem. The size doesn't need to be exported to /proc/kallsyms, as you mentioned before, all users happen to be in-kernel. The one downsize of storing the numbers, is that we will need to store a size for every function, and that can add up. $ wc -l /proc/kallsyms 208784 /proc/kallsyms That's 208 thousand numbers to store! Perhaps this is the one advantage of having a weak placeholder in kallsyms as well. It may save memory. -- Steve
diff --git a/scripts/Makefile b/scripts/Makefile index 6bcda4b9d054..c1336c0baf6e 100644 --- a/scripts/Makefile +++ b/scripts/Makefile @@ -5,6 +5,7 @@ hostprogs-always-$(CONFIG_KALLSYMS) += kallsyms hostprogs-always-$(BUILD_C_RECORDMCOUNT) += recordmcount +hostprogs-always-y += weakfuncs hostprogs-always-$(CONFIG_BUILDTIME_TABLE_SORT) += sorttable hostprogs-always-$(CONFIG_ASN1) += asn1_compiler hostprogs-always-$(CONFIG_MODULE_SIG_FORMAT) += sign-file diff --git a/scripts/Makefile.build b/scripts/Makefile.build index c16e4cf54d77..0b6c3b9de28b 100644 --- a/scripts/Makefile.build +++ b/scripts/Makefile.build @@ -148,6 +148,19 @@ cmd_gen_symversions_c = $(call gen_symversions,c) endif +# Due to recursion, we must skip empty.o. +# The empty.o file is created in the make process in order to determine +# the target endianness and word size. It is made before all other C +# files, including recordmcount. +cmd_weakfuncs = \ + if [ $(@) != "scripts/mod/empty.o" ]; then \ + $(objtree)/scripts/weakfuncs "$(@)"; \ + fi; +weakfuncs_source := $(srctree)/scripts/weakfuncs.c +$(obj)/%.o: $(obj)/%.c $(weakfuncs_source) FORCE + $(call if_changed_rule,cc_o_c) + $(call cmd,force_checksrc) + ifdef CONFIG_FTRACE_MCOUNT_USE_RECORDMCOUNT # compiler will not generate __mcount_loc use recordmcount or recordmcount.pl ifdef BUILD_C_RECORDMCOUNT diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib index 7395200538da..c20659251914 100644 --- a/scripts/Makefile.lib +++ b/scripts/Makefile.lib @@ -321,6 +321,7 @@ define rule_cc_o_c $(call cmd,gen_objtooldep) $(call cmd,gen_symversions_c) $(call cmd,record_mcount) + $(call cmd,weakfuncs) $(call cmd,warn_shared_object) endef diff --git a/scripts/weakfuncs.c b/scripts/weakfuncs.c new file mode 100644 index 000000000000..4051720dde49 --- /dev/null +++ b/scripts/weakfuncs.c @@ -0,0 +1,489 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include <stdbool.h> +#include <string.h> +#include <stdlib.h> +#include <stdio.h> +#include <unistd.h> +#include <fcntl.h> +#include <elf.h> + +#include <sys/types.h> +#include <sys/mman.h> +#include <sys/stat.h> + +#define APPEND_STR "__weak__" + +/* Will append APPEND_STR to the start and end */ +#define APPEND_STR_SIZE 16 + +#define ALIGN(x, y) (((x) + (y) - 1) / (y) * (y)) + +typedef union { + Elf32_Ehdr e32; + Elf64_Ehdr e64; +} Elf_Ehdr; + +typedef union { + Elf32_Shdr s32; + Elf64_Shdr s64; +} Elf_Shdr; + +typedef union { + Elf32_Sym s32; + Elf64_Sym s64; +} Elf_Sym; + +struct elf { + void *map; + Elf_Ehdr *ehdr; + Elf_Shdr *shsym; + Elf_Shdr *shstr; + const char *shstrings; + const char *strings; + char *newstrings; + off_t file_size; + off_t newsym; + off_t symend; + off_t newstrstart; + size_t origstrsize; + size_t newstrend; + size_t newstrsize; + int fd; + bool is_32; +}; + +static inline int e_shnum(struct elf *elf) +{ + if (elf->is_32) + return elf->ehdr->e32.e_shnum; + else + return elf->ehdr->e64.e_shnum; +} + +static inline int e_shstrndx(struct elf *elf) +{ + if (elf->is_32) + return elf->ehdr->e32.e_shstrndx; + else + return elf->ehdr->e64.e_shstrndx; +} + +static inline int sh_type(struct elf *elf, Elf_Shdr *shdr) +{ + if (elf->is_32) + return shdr->s32.sh_type; + else + return shdr->s64.sh_type; +} + +static inline size_t sh_offset(struct elf *elf, Elf_Shdr *shdr) +{ + if (elf->is_32) + return shdr->s32.sh_offset; + else + return shdr->s64.sh_offset; +} + +static inline size_t sh_size(struct elf *elf, Elf_Shdr *shdr) +{ + if (elf->is_32) + return shdr->s32.sh_size; + else + return shdr->s64.sh_size; +} + +static inline size_t sh_entsize(struct elf *elf, Elf_Shdr *shdr) +{ + if (elf->is_32) + return shdr->s32.sh_entsize; + else + return shdr->s64.sh_entsize; +} + +static inline const char *sh_name(struct elf *elf, Elf_Shdr *shdr) +{ + if (elf->is_32) + return elf->shstrings + shdr->s32.sh_name; + else + return elf->shstrings + shdr->s64.sh_name; +} + +static inline const char *sym_name(struct elf *elf, Elf_Sym *sym) +{ + if (elf->is_32) + return elf->strings + sym->s32.st_name; + else + return elf->strings + sym->s64.st_name; +} + +static inline int sym_bind(struct elf *elf, Elf_Sym *sym) +{ + if (elf->is_32) + return ELF32_ST_BIND(sym->s32.st_info); + else + return ELF64_ST_BIND(sym->s64.st_info); +} + +static inline size_t sym_value(struct elf *elf, Elf_Sym *sym) +{ + if (elf->is_32) + return sym->s32.st_value; + else + return sym->s64.st_value; +} + +static inline int sym_size(struct elf *elf, Elf_Sym *sym) +{ + if (elf->is_32) + return sym->s32.st_size; + else + return sym->s64.st_size; +} + +static inline int sym_shndx(struct elf *elf, Elf_Sym *sym) +{ + if (elf->is_32) + return sym->s32.st_shndx; + else + return sym->s64.st_shndx; +} + +static inline Elf_Shdr *get_shdr(struct elf *elf, int index) +{ + if (elf->is_32) { + return elf->map + elf->ehdr->e32.e_shoff + + (index * elf->ehdr->e32.e_shentsize); + } else { + return elf->map + elf->ehdr->e64.e_shoff + + (index * elf->ehdr->e64.e_shentsize); + } +} + +static void update_sym_bind(struct elf *elf, Elf_Sym *sym) +{ + int t; + + if (elf->is_32) { + t = ELF32_ST_TYPE(sym->s32.st_info); + sym->s32.st_info = ELF32_ST_INFO(STB_GLOBAL, t); + } else { + t = ELF64_ST_TYPE(sym->s64.st_info); + sym->s64.st_info = ELF64_ST_INFO(STB_GLOBAL, t); + } +} + +static void update_sym_name(struct elf *elf, Elf_Sym *sym, size_t idx) +{ + if (elf->is_32) + sym->s32.st_name = idx; + else + sym->s64.st_name = idx; +} + +static Elf_Shdr *find_sym_table(struct elf *elf) +{ + Elf_Shdr *shdr; + + for (int i = 0; i < e_shnum(elf); i++) { + shdr = get_shdr(elf, i); + if (sh_type(elf, shdr) == SHT_SYMTAB) + return shdr; + } + return NULL; +} + +static const char *find_strings(struct elf *elf) +{ + Elf_Shdr *shdr; + + shdr = get_shdr(elf, e_shstrndx(elf)); + elf->shstrings = elf->map + sh_offset(elf, shdr); + + for (int i = 0; i < e_shnum(elf); i++) { + shdr = get_shdr(elf, i); + if (sh_type(elf, shdr) != SHT_STRTAB) + continue; + + if (!strcmp(sh_name(elf, shdr), ".strtab")) { + elf->shstr = shdr; + elf->origstrsize = sh_size(elf, shdr); + return elf->map + sh_offset(elf, shdr); + } + } + return NULL; +} + +static int add_string(struct elf *elf, const char *str) +{ + size_t size; + int len = strlen(str); + char *newstr; + + size = elf->newstrsize + len + APPEND_STR_SIZE + 1; + newstr = realloc(elf->newstrings, size); + if (!newstr) + return -1; + + elf->newstrings = newstr; + + newstr += elf->newstrsize; + + memcpy(newstr, APPEND_STR, APPEND_STR_SIZE / 2); + newstr += APPEND_STR_SIZE / 2; + + memcpy(newstr, str, len); + newstr += len; + + memcpy(newstr, APPEND_STR, APPEND_STR_SIZE / 2 + 1); + + elf->newstrsize += len + APPEND_STR_SIZE + 1; + + return 0; +} + +static int process_weak_func(struct elf *elf, Elf_Sym *sym) +{ + static off_t symsize; + static void *buf; + const char *str; + Elf_Shdr *shdr; + off_t end; + ssize_t r; + + shdr = get_shdr(elf, sym_shndx(elf, sym)); + /* Make sure the section is executable */ + if (sh_type(elf, shdr) != SHT_PROGBITS) + return 0; + + if (!symsize) { + symsize = sh_entsize(elf, elf->shsym); + buf = calloc(1, symsize); + if (!buf) { + perror("Allocated temp buffer"); + return -1; + } + } + + if (!elf->newsym) { + void *symtab; + size_t size; + + /* copy the symbol table */ + end = elf->file_size; + /* Make aligned by symbol entry size */ + end = ALIGN(end, symsize); + + if (lseek(elf->fd, end, SEEK_SET) != end) { + perror("lseek"); + return -1; + } + + elf->newsym = end; + symtab = elf->map + sh_offset(elf, elf->shsym); + size = sh_size(elf, elf->shsym); + r = write(elf->fd, symtab, size); + if (r != size) { + perror("Copying symtable"); + return -1; + } + + /* Make sure it's still aligned */ + end = lseek(elf->fd, 0, SEEK_END); + if (end < 0) { + perror("Getting new end of file"); + return -1; + } + + elf->symend = end; + end = ALIGN(end, symsize); + if (end != elf->symend) { + /* Add padding */ + r = write(elf->fd, buf, end - elf->symend); + free(buf); + if (r < 0) { + perror("Adding padding"); + return -1; + } + elf->symend = end; + } + elf->newstrend = elf->origstrsize; + } + + memcpy(buf, sym, symsize); + sym = buf; + update_sym_bind(elf, sym); + str = sym_name(elf, sym); + update_sym_name(elf, sym, elf->newstrend); + elf->newstrend += strlen(str) + APPEND_STR_SIZE + 1; + + if (add_string(elf, str) < 0) { + perror("Failed adding new string"); + return -1; + } + + r = write(elf->fd, sym, symsize); + if (r < 0) { + perror("Creating new symbol"); + return -1; + } + elf->symend += symsize; + + return 0; +} + +static int add_new_strings(struct elf *elf) +{ + off_t end; + size_t r; + + end = lseek(elf->fd, 0, SEEK_END); + if (end < 0) { + perror("lseek"); + return -1; + } + elf->newstrstart = end; + + r = write(elf->fd, elf->strings, elf->origstrsize); + if (r != elf->origstrsize) { + perror("Appending copy of string section"); + return -1; + } + + r = write(elf->fd, elf->newstrings, elf->newstrsize); + if (r != elf->newstrsize) { + perror("Appending new strings to section"); + return -1; + } + + end = lseek(elf->fd, 0, SEEK_END); + if (end < 0) { + perror("lseek"); + return -1; + } + elf->newstrend = end; + + return 0; +} + +static void update_shdr(struct elf *elf, Elf_Shdr *shdr, + size_t offset, size_t size) +{ + if (elf->is_32) { + shdr->s32.sh_offset = offset; + shdr->s32.sh_size = size; + } else { + shdr->s64.sh_offset = offset; + shdr->s64.sh_size = size; + } +} + +static void update_elf(struct elf *elf) +{ + update_shdr(elf, elf->shsym, elf->newsym, elf->symend - elf->newsym); + update_shdr(elf, elf->shstr, elf->newstrstart, + elf->newstrend - elf->newstrstart); +} + +static int add_weak_funcs(struct elf *elf) +{ + Elf_Shdr *sh_sym = find_sym_table(elf); + Elf_Sym *sym; + void *end_sym; + + if (!sh_sym) { + fprintf(stderr, "Could not find symbol table\n"); + return -1; + } + + elf->shsym = sh_sym; + elf->strings = find_strings(elf); + if (!elf->strings) { + fprintf(stderr, "Could not find string table\n"); + return -1; + } + + sym = elf->map + sh_offset(elf, sh_sym); + end_sym = elf->map + sh_offset(elf, sh_sym) + sh_size(elf, sh_sym); + + while ((void *)sym < end_sym) { + if (sym_bind(elf, sym) == STB_WEAK) { + if (process_weak_func(elf, sym) < 0) + return -1; + } + sym = (void *)sym + sh_entsize(elf, sh_sym); + } + + if (elf->newstrings) { + if (add_new_strings(elf) < 0) + return -1; + update_elf(elf); + } + + return 0; +} + +static int handle_weak_funcs(const char *file) +{ + struct elf elf = {}; + struct stat st; + void *map; + int ret = -1; + int fd; + + if (stat(file, &st) < 0) { + perror(file); + return -1; + } + + fd = open(file, O_RDWR); + if (fd < 0) { + perror(file); + return -1; + } + + map = mmap(NULL, st.st_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); + if (map == MAP_FAILED) { + perror("mmap"); + return -1; + } + + elf.map = map; + elf.ehdr = map; + elf.fd = fd; + elf.file_size = st.st_size; + + switch (elf.ehdr->e32.e_machine) { + case EM_386: + elf.is_32 = true; + break; + case EM_X86_64: + elf.is_32 = false; + break; + default: + fprintf(stderr, "ELF type %d is unsupported\n", elf.ehdr->e32.e_machine); + goto out; + } + + ret = add_weak_funcs(&elf); + ret = 0; + out: + munmap(map, st.st_size); + close(fd); + return ret; +} + +int main(int argc, char **argv) +{ + if (argc < 2) { + fprintf(stderr, "usage: weakfunc file.o ...\n"); + exit(-1); + } + + for (int i = 1; i < argc; i++) { + int ret = handle_weak_funcs(argv[i]); + + if (ret < 0) + exit(ret); + } +}