mbox series

[00/14] scripts/sorttable: ftrace: Remove place holders for weak functions in available_filter_functions

Message ID 20250102185845.928488650@goodmis.org (mailing list archive)
Headers show
Series scripts/sorttable: ftrace: Remove place holders for weak functions in available_filter_functions | expand

Message

Steven Rostedt Jan. 2, 2025, 6:58 p.m. UTC
This series removes the place holder __ftrace_invalid_address___ from
the available_filter_functions file.

The first 13 patches clean up the scripts/sorttable.c code. It was
a copy from recordmcount.c which is very hard to maintain. That's
because it uses macro helpers and places the code in a header file
sorttable.h to handle both the 64 bit and 32 bit version of  the Elf
structures. It also has _r()/r()/r2() wrappers around accessing the
data which will read the 64 bit or 32 bit version of the data as well
as handle endianess. If the wrong wrapper is used, an invalid value
will result, and this has been a cause for bugs in the past. In fact
the new ORC code doesn't even use it. That's fine because ORC is only
for 64 bit x86 which is the default parsing.

Instead of having a bunch of macros defined and then include the code
twice from a header, the Elf structures are each wrapped in a union.
The union holds the 64 bit and 32 bit version of the needed structure.
To access the values, helper function pointers are used instead of
defining a function. For example, instead of having:

In sorttable.h:

 #undef Elf_Ehdr
 #undef Elf_Shdr

 #ifdef SORTTABLE_64
 # define Elf_Ehdr		Elf64_Ehdr
 # define Elf_Shdr		Elf64_Shdr
 [..]
 # define _r			r8
 #else
 # define Elf_Ehdr		Elf32_Ehdr
 # define Elf_Shdr		Elf32_Shdr
 [..]
 # define _r			r
 #endif

 [..]
 Elf_Shdr *s, *shdr = (Elf_Shdr *)((char *)ehdr + _r(&ehdr->e_shoff));

In sorttable.c:

 #include "sorttable.h"
 #define SORTTABLE_64
 #include "sorttable.h"

Using the Unions we have:

 typedef union {
	Elf32_Ehdr	e32;
	Elf64_Ehdr	e64;
 } Elf_Ehdr;

 typedef union {
	Elf32_Shdr	e32;
	Elf64_Shdr	e64;
 } Elf_Shdr;

 [..]

 static uint64_t ehdr64_shoff(Elf_Ehdr *ehdr)
 {
	return r8(&ehdr->e64.e_shoff);
 }

 static uint64_t ehdr32_shoff(Elf_Ehdr *ehdr)
 {
	return r(&ehdr->e32.e_shoff);
 }

 [..]
 static uint64_t (*ehdr_shoff)(Elf_Ehdr *ehdr);
 [..]

	switch (ehdr->e32.e_ident[EI_CLASS]) {
	case ELFCLASS32:
		[..]
		ehdr_shoff	= ehdr32_shoff;
		[..]
	case ELFCLASS65:
		[..]
		ehdr_shoff	= ehdr64_shoff;
		[..]

 shdr_start = (Elf_Shdr *)((char *)ehdr + ehdr_shoff(ehdr));

The code may be a little more verbose, but the meat of the code is easier to
read, and the conversion functions live in the helper functions to make
it easier to have the fields read the proper way.

This makes the code easier to maintain, and for this purpose easier
to extend. Which is the last patch of the series.

The last patch adds the option "-s <file>" to sorttable.c. Now this code
is called by:

  ${NM} -S vmlinux > .tmp_vmlinux.nm-sort
  ${objtree}/scripts/sorttable -s .tmp_vmlinux.nm-sort ${1}

Where the file created by "nm -S" is read, recording the address
and the associated sizes of each function. It then is sorted, and
before sorting the mcount_loc table, it is scanned to make sure
all symbols in the mcounc_loc are within the boundaries of the functions
defined by nm. If they are not, they are zeroed out, as they are most
likely weak functions (I don't know what else they would be).

Then on boot up, when creating the ftrace tables from the mcount_loc
table, it will ignore any function that matches the kaslr_offset()
value. As KASLR will still shift the values even if they are zero.
But by skipping over entries in mcount_loc that match kaslr_offset()
all weak functions are removed from the ftrace table as well as the
available_filter_functions file that is derived from it.

Before:
    
 ~# grep __ftrace_invalid_address___ /sys/kernel/tracing/available_filter_functions | wc -l
 556

After:

 ~# grep __ftrace_invalid_address___ /sys/kernel/tracing/available_filter_functions | wc -l
 0

Steven Rostedt (14):
      scripts/sorttable: Remove unused macro defines
      scripts/sorttable: Remove unused write functions
      scripts/sorttable: Remove unneeded Elf_Rel
      scripts/sorttable: Have the ORC code use the _r() functions to read
      scripts/sorttable: Make compare_extable() into two functions
      scripts/sorttable: Convert Elf_Ehdr to union
      scripts/sorttable: Replace Elf_Shdr Macro with a union
      scripts/sorttable: Convert Elf_Sym MACRO over to a union
      scripts/sorttable: Add helper functions for Elf_Ehdr
      scripts/sorttable: Add helper functions for Elf_Shdr
      scripts/sorttable: Add helper functions for Elf_Sym
      scripts/sorttable: Use uint64_t for mcount sorting
      scripts/sorttable: Move code from sorttable.h into sorttable.c
      scripts/sorttable: ftrace: Do not add weak functions to available_filter_functions

----
 kernel/trace/ftrace.c   |  14 +
 scripts/link-vmlinux.sh |   4 +-
 scripts/sorttable.c     | 810 ++++++++++++++++++++++++++++++++++++++++++++----
 scripts/sorttable.h     | 497 -----------------------------
 4 files changed, 771 insertions(+), 554 deletions(-)
 delete mode 100644 scripts/sorttable.h

Comments

Steven Rostedt Jan. 2, 2025, 7:24 p.m. UTC | #1
On Thu, 02 Jan 2025 13:58:45 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> The last patch adds the option "-s <file>" to sorttable.c. Now this code
> is called by:
> 
>   ${NM} -S vmlinux > .tmp_vmlinux.nm-sort
>   ${objtree}/scripts/sorttable -s .tmp_vmlinux.nm-sort ${1}
> 
> Where the file created by "nm -S" is read, recording the address
> and the associated sizes of each function. It then is sorted, and
> before sorting the mcount_loc table, it is scanned to make sure
> all symbols in the mcounc_loc are within the boundaries of the functions
> defined by nm. If they are not, they are zeroed out, as they are most
> likely weak functions (I don't know what else they would be).
> 
> Then on boot up, when creating the ftrace tables from the mcount_loc
> table, it will ignore any function that matches the kaslr_offset()
> value. As KASLR will still shift the values even if they are zero.
> But by skipping over entries in mcount_loc that match kaslr_offset()
> all weak functions are removed from the ftrace table as well as the
> available_filter_functions file that is derived from it.
> 

I mentioned this in the last patch, but forgot to mention it here.

Even if kallsyms is "fixed" where it doesn't return the name of a function
if the address is outside its size, that doesn't fix the place holder issue
with available_filter_functions. That would just make it easier to know a
function doesn't have a name, but a place holder is still required.

This patch set removes those place holders regardless of kallsyms being
fixed or not.

-- Steve
Linus Torvalds Jan. 2, 2025, 7:30 p.m. UTC | #2
On Thu, 2 Jan 2025 at 10:59, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Where the file created by "nm -S" is read, recording the address
> and the associated sizes of each function. It then is sorted, and
> before sorting the mcount_loc table, it is scanned to make sure
> all symbols in the mcounc_loc are within the boundaries of the functions
> defined by nm. If they are not, they are zeroed out, as they are most
> likely weak functions (I don't know what else they would be).

Please just do this by sorting non-existent functions at the end,
instead of just zeroing them out.

That makes the mcount_loc table dense in valid entries. We could then
just rewrite the size of the table (or just add a variable containing
the size, if you don't want to change ELF metadata - but you're
already sorting the table, so why not?)

Because:

> Then on boot up, when creating the ftrace tables from the mcount_loc
> table, it will ignore any function that matches the kaslr_offset()
> value.

Why even do that? Why not just make the mcount_loc table be proper in
the first place.

             Linus
Steven Rostedt Jan. 2, 2025, 7:45 p.m. UTC | #3
On Thu, 2 Jan 2025 11:30:12 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> Please just do this by sorting non-existent functions at the end,
> instead of just zeroing them out.
> 
> That makes the mcount_loc table dense in valid entries. We could then
> just rewrite the size of the table (or just add a variable containing
> the size, if you don't want to change ELF metadata - but you're
> already sorting the table, so why not?)
> 
> Because:
> 
> > Then on boot up, when creating the ftrace tables from the mcount_loc
> > table, it will ignore any function that matches the kaslr_offset()
> > value.  
> 
> Why even do that? Why not just make the mcount_loc table be proper in
> the first place.

I was a bit nervous about changing the stop_mcount_loc value. I thought of
doing that first, but then I noticed that the value is found by looking at
the System.map file and not from the object itself. Changing it in the
object will require some more elf parsing. Just zeroing out didn't require
that.

I'm fine with adding that, but it will take some more elf foo magic, and my
time to work on this is coming near its end.

To do this, I believe the symbol table will need to be searched for the
__stop_mcount_loc. This could be a clean up as well, as I don't really like
that the code does a search of System.map, and reading it from the object
file may be more robust. Then when we have the values from the object file,
we should also be able to modify it.

-- Steve
Steven Rostedt Jan. 2, 2025, 7:47 p.m. UTC | #4
On Thu, 2 Jan 2025 14:45:53 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> > Why even do that? Why not just make the mcount_loc table be proper in
> > the first place.  
> 
> I was a bit nervous about changing the stop_mcount_loc value. I thought of
> doing that first, but then I noticed that the value is found by looking at
> the System.map file and not from the object itself. Changing it in the
> object will require some more elf parsing. Just zeroing out didn't require
> that.

I could still zero it out, but then change the start_mcount_loc to the
first valid function after the sort. As all the zeros will be at the start.

-- Steve
Steven Rostedt Jan. 2, 2025, 9:44 p.m. UTC | #5
On Thu, 2 Jan 2025 11:30:12 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> Please just do this by sorting non-existent functions at the end,
> instead of just zeroing them out.
> 
> That makes the mcount_loc table dense in valid entries. We could then
> just rewrite the size of the table (or just add a variable containing
> the size, if you don't want to change ELF metadata - but you're
> already sorting the table, so why not?)

Well, I tried to move the __start_mcount_loc, but it appears that changing
the symbol value *after* the linking phase does nothing :-p  The references
to it have already been resolved. The Elf_Rel* will do the updates from
then on, and to read those, becomes architecture dependent.

I guess the next thing I could do is to create a "skip" variable that can
be modified, and we can skip X entries in the start_mcount_loc. As the
start_mcount_loc and stop_mcount_loc (which determines the size of the
table) cannot be modified in an architecture independent way.

-- Steve
Steven Rostedt Jan. 2, 2025, 10:14 p.m. UTC | #6
On Thu, 2 Jan 2025 16:44:54 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> I guess the next thing I could do is to create a "skip" variable that can
> be modified, and we can skip X entries in the start_mcount_loc. As the
> start_mcount_loc and stop_mcount_loc (which determines the size of the
> table) cannot be modified in an architecture independent way.

This is on top of the last patch, but I'll likely restructure it and remove
the kaslr_offset() part totally. I'd still break it up into two patches.
One to remove the reading of the System.map, and then the other to add the
skip variable.

This version, I add a "ftrace_mcount_skip" that is default to zero, but the
sorttable.c code will update it to skip over the functions that it zeroed
out.

-- Steve

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 5963ae76b31a..3193148102e0 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -7077,20 +7077,6 @@ static int ftrace_process_locs(struct module *mod,
 			continue;
 		}
 
-		/*
-		 * At build time, a check is made against: nm -S vmlinux
-		 * to make sure all functions are found within the
-		 * size range of symbols listed by nm. If not, it's likely
-		 * a weak function that was overridden. We do not want those.
-		 * The script will zero them out, but kaslr will still
-		 * update them. If the address is the same as the kaslr_offset()
-		 * then skip the record.
-		 */
-		if (addr == kaslr_offset()) {
-			skipped++;
-			continue;
-		}
-
 		end_offset = (pg->index+1) * sizeof(pg->records[0]);
 		if (end_offset > PAGE_SIZE << pg->order) {
 			/* We should have allocated enough */
@@ -7761,10 +7747,13 @@ int __init __weak ftrace_dyn_arch_init(void)
 	return 0;
 }
 
+int ftrace_mcount_skip __initdata;
+
 void __init ftrace_init(void)
 {
 	extern unsigned long __start_mcount_loc[];
 	extern unsigned long __stop_mcount_loc[];
+	unsigned long *start_addr = __start_mcount_loc;
 	unsigned long count, flags;
 	int ret;
 
@@ -7775,6 +7764,14 @@ void __init ftrace_init(void)
 		goto failed;
 
 	count = __stop_mcount_loc - __start_mcount_loc;
+
+	/*
+	 * scripts/sorttable.c will set ftrace_mcount_skip to state
+	 * how many functions to skip in the __mcount_loc section.
+	 * These would have been weak functions.
+	 */
+	count -= ftrace_mcount_skip;
+	start_addr += ftrace_mcount_skip;
 	if (!count) {
 		pr_info("ftrace: No functions to be traced?\n");
 		goto failed;
@@ -7783,9 +7780,7 @@ void __init ftrace_init(void)
 	pr_info("ftrace: allocating %ld entries in %ld pages\n",
 		count, DIV_ROUND_UP(count, ENTRIES_PER_PAGE));
 
-	ret = ftrace_process_locs(NULL,
-				  __start_mcount_loc,
-				  __stop_mcount_loc);
+	ret = ftrace_process_locs(NULL, start_addr, __stop_mcount_loc);
 	if (ret) {
 		pr_warn("ftrace: failed to allocate entries for functions\n");
 		goto failed;
diff --git a/scripts/sorttable.c b/scripts/sorttable.c
index d1d52bd12adb..506172898fd8 100644
--- a/scripts/sorttable.c
+++ b/scripts/sorttable.c
@@ -543,6 +543,7 @@ static pthread_t mcount_sort_thread;
 struct elf_mcount_loc {
 	Elf_Ehdr *ehdr;
 	Elf_Shdr *init_data_sec;
+	Elf_Sym *ftrace_skip_sym;
 	uint64_t start_mcount_loc;
 	uint64_t stop_mcount_loc;
 };
@@ -554,12 +555,19 @@ static void *sort_mcount_loc(void *arg)
 	uint64_t offset = emloc->start_mcount_loc - shdr_addr(emloc->init_data_sec)
 					+ shdr_offset(emloc->init_data_sec);
 	uint64_t count = emloc->stop_mcount_loc - emloc->start_mcount_loc;
+	uint32_t *skip_addr = NULL;
 	unsigned char *start_loc = (void *)emloc->ehdr + offset;
+	void *end_loc = start_loc + count;
+
+	if (emloc->ftrace_skip_sym) {
+		offset = sym_value(emloc->ftrace_skip_sym) -
+			shdr_addr(emloc->init_data_sec) +
+			shdr_offset(emloc->init_data_sec);
+		skip_addr = (void *)emloc->ehdr + offset;
+	}
 
 	/* zero out any locations not found by function list */
 	if (function_list_size) {
-		void *end_loc = start_loc + count;
-
 		for (void *ptr = start_loc; ptr < end_loc; ptr += long_size) {
 			uint64_t key;
 
@@ -574,44 +582,65 @@ static void *sort_mcount_loc(void *arg)
 	}
 
 	qsort(start_loc, count/long_size, long_size, compare_extable);
+
+	/* Now set how many functions need to be skipped */
+	for (void *ptr = start_loc; skip_addr && ptr < end_loc; ptr += long_size) {
+		uint64_t val;
+
+		if (long_size == 4)
+			val = *(uint32_t *)ptr;
+		else
+			val = *(uint64_t *)ptr;
+		if (val) {
+			uint32_t skip;
+
+			/* Update the ftrace_mcount_skip to skip these functions */
+			val = ptr - (void *)start_loc;
+			skip = val / long_size;
+			w(r(&skip), skip_addr);
+			break;
+		}
+	}
 	return NULL;
 }
 
 /* Get the address of __start_mcount_loc and __stop_mcount_loc in System.map */
-static void get_mcount_loc(uint64_t *_start, uint64_t *_stop)
+static void get_mcount_loc(struct elf_mcount_loc *emloc, Elf_Shdr *symtab_sec,
+			   const char *strtab)
 {
-	FILE *file_start, *file_stop;
-	char start_buff[20];
-	char stop_buff[20];
-	int len = 0;
+	Elf_Sym *sym, *end_sym;
+	int symentsize = shdr_entsize(symtab_sec);
+	int found = 0;
+
+	sym = (void *)emloc->ehdr + shdr_offset(symtab_sec);
+	end_sym = (void *)sym + shdr_size(symtab_sec);
+
+	while (sym < end_sym) {
+		if (!strcmp(strtab + sym_name(sym), "__start_mcount_loc")) {
+			emloc->start_mcount_loc = sym_value(sym);
+			if (++found == 3)
+				break;
+		} else if (!strcmp(strtab + sym_name(sym), "__stop_mcount_loc")) {
+			emloc->stop_mcount_loc = sym_value(sym);
+			if (++found == 3)
+				break;
+		} else if (!strcmp(strtab + sym_name(sym), "ftrace_mcount_skip")) {
+			emloc->ftrace_skip_sym = sym;
+			if (++found == 3)
+				break;
+		}
+		sym = (void *)sym + symentsize;
+	}
 
-	file_start = popen(" grep start_mcount System.map | awk '{print $1}' ", "r");
-	if (!file_start) {
+	if (!emloc->start_mcount_loc) {
 		fprintf(stderr, "get start_mcount_loc error!");
 		return;
 	}
 
-	file_stop = popen(" grep stop_mcount System.map | awk '{print $1}' ", "r");
-	if (!file_stop) {
+	if (!emloc->stop_mcount_loc) {
 		fprintf(stderr, "get stop_mcount_loc error!");
-		pclose(file_start);
 		return;
 	}
-
-	while (fgets(start_buff, sizeof(start_buff), file_start) != NULL) {
-		len = strlen(start_buff);
-		start_buff[len - 1] = '\0';
-	}
-	*_start = strtoul(start_buff, NULL, 16);
-
-	while (fgets(stop_buff, sizeof(stop_buff), file_stop) != NULL) {
-		len = strlen(stop_buff);
-		stop_buff[len - 1] = '\0';
-	}
-	*_stop = strtoul(stop_buff, NULL, 16);
-
-	pclose(file_start);
-	pclose(file_stop);
 }
 #else /* MCOUNT_SORT_ENABLED */
 static inline int parse_symbols(const char *fname) { return 0; }
@@ -647,8 +676,6 @@ static int do_sort(Elf_Ehdr *ehdr,
 	unsigned int shstrndx;
 #ifdef MCOUNT_SORT_ENABLED
 	struct elf_mcount_loc mstruct = {0};
-	uint64_t _start_mcount_loc = 0;
-	uint64_t _stop_mcount_loc = 0;
 #endif
 #if defined(SORTTABLE_64) && defined(UNWINDER_ORC_ENABLED)
 	unsigned int orc_ip_size = 0;
@@ -686,15 +713,9 @@ static int do_sort(Elf_Ehdr *ehdr,
 
 #ifdef MCOUNT_SORT_ENABLED
 		/* locate the .init.data section in vmlinux */
-		if (!strcmp(secstrings + idx, ".init.data")) {
-			get_mcount_loc(&_start_mcount_loc, &_stop_mcount_loc);
-			mstruct.ehdr = ehdr;
+		if (!strcmp(secstrings + idx, ".init.data"))
 			mstruct.init_data_sec = shdr;
-			mstruct.start_mcount_loc = _start_mcount_loc;
-			mstruct.stop_mcount_loc = _stop_mcount_loc;
-		}
 #endif
-
 #if defined(SORTTABLE_64) && defined(UNWINDER_ORC_ENABLED)
 		/* locate the ORC unwind tables */
 		if (!strcmp(secstrings + idx, ".orc_unwind_ip")) {
@@ -736,23 +757,6 @@ static int do_sort(Elf_Ehdr *ehdr,
 		goto out;
 	}
 #endif
-
-#ifdef MCOUNT_SORT_ENABLED
-	if (!mstruct.init_data_sec || !_start_mcount_loc || !_stop_mcount_loc) {
-		fprintf(stderr,
-			"incomplete mcount's sort in file: %s\n",
-			fname);
-		goto out;
-	}
-
-	/* create thread to sort mcount_loc concurrently */
-	if (pthread_create(&mcount_sort_thread, NULL, &sort_mcount_loc, &mstruct)) {
-		fprintf(stderr,
-			"pthread_create mcount_sort_thread failed '%s': %s\n",
-			strerror(errno), fname);
-		goto out;
-	}
-#endif
 	if (!extab_sec) {
 		fprintf(stderr,	"no __ex_table in file: %s\n", fname);
 		goto out;
@@ -772,6 +776,26 @@ static int do_sort(Elf_Ehdr *ehdr,
 	strtab = (const char *)ehdr + shdr_offset(strtab_sec);
 	symtab = (const Elf_Sym *)((const char *)ehdr + shdr_offset(symtab_sec));
 
+#ifdef MCOUNT_SORT_ENABLED
+	mstruct.ehdr = ehdr;
+	get_mcount_loc(&mstruct, symtab_sec, strtab);
+
+	if (!mstruct.init_data_sec || !mstruct.start_mcount_loc || !mstruct.stop_mcount_loc) {
+		fprintf(stderr,
+			"incomplete mcount's sort in file: %s\n",
+			fname);
+		goto out;
+	}
+
+	/* create thread to sort mcount_loc concurrently */
+	if (pthread_create(&mcount_sort_thread, NULL, &sort_mcount_loc, &mstruct)) {
+		fprintf(stderr,
+			"pthread_create mcount_sort_thread failed '%s': %s\n",
+			strerror(errno), fname);
+		goto out;
+	}
+#endif
+
 	if (custom_sort) {
 		custom_sort(extab_image, shdr_size(extab_sec));
 	} else {