diff mbox series

[RFC,2/3] powerpc/ftrace: Override ftrace_location_lookup() for MPROFILE_KERNEL

Message ID fadc5f2a295d6cb9f590bbbdd71fc2f78bf3a085.1644216043.git.naveen.n.rao@linux.vnet.ibm.com (mailing list archive)
State RFC
Headers show
Series powerpc64/bpf: Add support for BPF Trampolines | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch, async

Commit Message

Naveen N. Rao Feb. 7, 2022, 7:07 a.m. UTC
With CONFIG_MPROFILE_KERNEL, ftrace location is within the first 5
instructions of a function. Override ftrace_location_lookup() to search
within this range for the ftrace location.

Also convert kprobe_lookup_name() to utilize this function.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/kprobes.c      |  8 +-------
 arch/powerpc/kernel/trace/ftrace.c | 11 +++++++++++
 2 files changed, 12 insertions(+), 7 deletions(-)

Comments

Steven Rostedt Feb. 7, 2022, 3:24 p.m. UTC | #1
On Mon,  7 Feb 2022 12:37:21 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:

> --- a/arch/powerpc/kernel/trace/ftrace.c
> +++ b/arch/powerpc/kernel/trace/ftrace.c
> @@ -1137,3 +1137,14 @@ char *arch_ftrace_match_adjust(char *str, const char *search)
>  		return str;
>  }
>  #endif /* PPC64_ELF_ABI_v1 */
> +
> +#ifdef CONFIG_MPROFILE_KERNEL
> +unsigned long ftrace_location_lookup(unsigned long ip)
> +{
> +	/*
> +	 * Per livepatch.h, ftrace location is always within the first
> +	 * 16 bytes of a function on powerpc with -mprofile-kernel.
> +	 */
> +	return ftrace_location_range(ip, ip + 16);

I think this is the wrong approach for the implementation and error prone.

> +}
> +#endif
> -- 

What I believe is a safer approach is to use the record address and add the
range to it.

That is, you know that the ftrace modification site is a range (multiple
instructions), where in the ftrace infrastructure, only one ip represents
that range. What you want is if you pass in an ip, and that ip is within
that range, you return the ip that represents that range to ftrace.

It looks like we need to change the compare function in the bsearch.

Perhaps add a new macro to define the size of the range to be searched,
instead of just using MCOUNT_INSN_SIZE? Then we may not even need this new
lookup function?

static int ftrace_cmp_recs(const void *a, const void *b)
{
	const struct dyn_ftrace *key = a;
	const struct dyn_ftrace *rec = b;

	if (key->flags < rec->ip)
		return -1;
	if (key->ip >= rec->ip + ARCH_IP_SIZE)
		return 1;
	return 0;
}

Where ARCH_IP_SIZE is defined to MCOUNT_INSN_SIZE by default, but an arch
could define it to something else, like 16.

Would that work for you, or am I missing something?

-- Steve
Naveen N. Rao Feb. 9, 2022, 5:50 p.m. UTC | #2
Steven Rostedt wrote:
> On Mon,  7 Feb 2022 12:37:21 +0530
> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
> 
>> --- a/arch/powerpc/kernel/trace/ftrace.c
>> +++ b/arch/powerpc/kernel/trace/ftrace.c
>> @@ -1137,3 +1137,14 @@ char *arch_ftrace_match_adjust(char *str, const char *search)
>>  		return str;
>>  }
>>  #endif /* PPC64_ELF_ABI_v1 */
>> +
>> +#ifdef CONFIG_MPROFILE_KERNEL
>> +unsigned long ftrace_location_lookup(unsigned long ip)
>> +{
>> +	/*
>> +	 * Per livepatch.h, ftrace location is always within the first
>> +	 * 16 bytes of a function on powerpc with -mprofile-kernel.
>> +	 */
>> +	return ftrace_location_range(ip, ip + 16);
> 
> I think this is the wrong approach for the implementation and error prone.
> 
>> +}
>> +#endif
>> -- 
> 
> What I believe is a safer approach is to use the record address and add the
> range to it.
> 
> That is, you know that the ftrace modification site is a range (multiple
> instructions), where in the ftrace infrastructure, only one ip represents
> that range. What you want is if you pass in an ip, and that ip is within
> that range, you return the ip that represents that range to ftrace.
> 
> It looks like we need to change the compare function in the bsearch.
> 
> Perhaps add a new macro to define the size of the range to be searched,
> instead of just using MCOUNT_INSN_SIZE? Then we may not even need this new
> lookup function?
> 
> static int ftrace_cmp_recs(const void *a, const void *b)
> {
> 	const struct dyn_ftrace *key = a;
> 	const struct dyn_ftrace *rec = b;
> 
> 	if (key->flags < rec->ip)
> 		return -1;
> 	if (key->ip >= rec->ip + ARCH_IP_SIZE)
> 		return 1;
> 	return 0;
> }
> 
> Where ARCH_IP_SIZE is defined to MCOUNT_INSN_SIZE by default, but an arch
> could define it to something else, like 16.
> 
> Would that work for you, or am I missing something?

Yes, I hadn't realized that [un]register_ftrace_direct() and 
modify_ftrace_direct() internally lookup the correct ftrace location, 
and act on that. So, changing ftrace_cmp_recs() does look like it will 
work well for powerpc. Thanks for this suggestion.

However, I think we will not be able to use a fixed range.  I would like 
to reserve instructions from function entry till the branch to 
_mcount(), and it can be two or four instructions depending on whether a 
function has a global entry point. For this, I am considering adding a 
field in 'struct dyn_arch_ftrace', and a hook in ftrace_process_locs() 
to initialize the same. I may need to override ftrace_cmp_recs().


Thanks,
- Naveen
Steven Rostedt Feb. 9, 2022, 9:10 p.m. UTC | #3
On Wed, 09 Feb 2022 17:50:09 +0000
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:

> However, I think we will not be able to use a fixed range.  I would like 
> to reserve instructions from function entry till the branch to 
> _mcount(), and it can be two or four instructions depending on whether a 
> function has a global entry point. For this, I am considering adding a 
> field in 'struct dyn_arch_ftrace', and a hook in ftrace_process_locs() 
> to initialize the same. I may need to override ftrace_cmp_recs().

Be careful about adding anything to dyn_arch_ftrace. powerpc already adds
the pointer to the module. Anything you add to that gets multiplied by
thousands of times (which takes up memory).

At boot up you may see something like:

  ftrace: allocating 45363 entries in 178 pages

That's 45,363 dyn_arch_ftrace structures. And each module loads their own
as well. To see how many total you have after boot up:


  # cat /sys/kernel/tracing/dyn_ftrace_total_info 
55974 pages:295 groups: 89

That's from the same kernel. Another 10,000 entries were created by modules.
(This was for x86_64)

What you may be able to do, is to add a way to look at the already saved
kallsyms, which keeps track of the function entry and exit to know how to
map an address back to the function.

   kallsyms_lookup(addr, NULL, &offset, NULL, NULL);

Should give you the offset of addr from the start of the function.

-- Steve
Naveen N. Rao Feb. 10, 2022, 1:58 p.m. UTC | #4
Steven Rostedt wrote:
> On Wed, 09 Feb 2022 17:50:09 +0000
> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
> 
>> However, I think we will not be able to use a fixed range.  I would like 
>> to reserve instructions from function entry till the branch to 
>> _mcount(), and it can be two or four instructions depending on whether a 
>> function has a global entry point. For this, I am considering adding a 
>> field in 'struct dyn_arch_ftrace', and a hook in ftrace_process_locs() 
>> to initialize the same. I may need to override ftrace_cmp_recs().
> 
> Be careful about adding anything to dyn_arch_ftrace. powerpc already adds
> the pointer to the module. Anything you add to that gets multiplied by
> thousands of times (which takes up memory).
> 
> At boot up you may see something like:
> 
>   ftrace: allocating 45363 entries in 178 pages
> 
> That's 45,363 dyn_arch_ftrace structures. And each module loads their own
> as well. To see how many total you have after boot up:
> 
> 
>   # cat /sys/kernel/tracing/dyn_ftrace_total_info 
> 55974 pages:295 groups: 89
> 
> That's from the same kernel. Another 10,000 entries were created by modules.
> (This was for x86_64)
> 
> What you may be able to do, is to add a way to look at the already saved
> kallsyms, which keeps track of the function entry and exit to know how to
> map an address back to the function.
> 
>    kallsyms_lookup(addr, NULL, &offset, NULL, NULL);
> 
> Should give you the offset of addr from the start of the function.

Good point. I should be able to overload the existing field for this 
purpose. Is something like the below ok?

---
 arch/powerpc/include/asm/ftrace.h  | 13 ++++++
 arch/powerpc/kernel/trace/ftrace.c | 73 ++++++++++++++++++++++++++----
 kernel/trace/ftrace.c              |  2 +
 3 files changed, 78 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index debe8c4f706260..96d6e26cee86af 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -59,6 +59,19 @@ static inline unsigned long ftrace_call_adjust(unsigned long addr)
 struct dyn_arch_ftrace {
 	struct module *mod;
 };
+
+struct dyn_ftrace;
+struct module *ftrace_mod_addr_get(struct dyn_ftrace *rec);
+void ftrace_mod_addr_set(struct dyn_ftrace *rec, struct module *mod);
+
+#ifdef CONFIG_MPROFILE_KERNEL
+int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec);
+#define ftrace_init_nop ftrace_init_nop
+
+int ftrace_cmp_recs(const void *a, const void *b);
+#define ftrace_cmp_recs ftrace_cmp_recs
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 80b6285769f27c..d9b6faa4c98a8c 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -428,21 +428,21 @@ int ftrace_make_nop(struct module *mod,
 	 * We should either already have a pointer to the module
 	 * or it has been passed in.
 	 */
-	if (!rec->arch.mod) {
+	if (!ftrace_mod_addr_get(rec)) {
 		if (!mod) {
 			pr_err("No module loaded addr=%lx\n", addr);
 			return -EFAULT;
 		}
-		rec->arch.mod = mod;
+		ftrace_mod_addr_set(rec, mod);
 	} else if (mod) {
-		if (mod != rec->arch.mod) {
+		if (mod != ftrace_mod_addr_get(rec)) {
 			pr_err("Record mod %p not equal to passed in mod %p\n",
-			       rec->arch.mod, mod);
+			       ftrace_mod_addr_get(rec), mod);
 			return -EINVAL;
 		}
 		/* nothing to do if mod == rec->arch.mod */
 	} else
-		mod = rec->arch.mod;
+		mod = ftrace_mod_addr_get(rec);
 
 	return __ftrace_make_nop(mod, rec, addr);
 #else
@@ -451,6 +451,59 @@ int ftrace_make_nop(struct module *mod,
 #endif /* CONFIG_MODULES */
 }
 
+#ifdef CONFIG_MPROFILE_KERNEL
+struct module *ftrace_mod_addr_get(struct dyn_ftrace *rec)
+{
+	return (struct module *)((unsigned long)rec->arch.mod & ~0x1);
+}
+
+void ftrace_mod_addr_set(struct dyn_ftrace *rec, struct module *mod)
+{
+	rec->arch.mod = (struct module *)(((unsigned long)rec->arch.mod & 0x1) | (unsigned long)mod);
+}
+
+bool ftrace_location_has_gep(const struct dyn_ftrace *rec)
+{
+	return !!((unsigned long)rec->arch.mod & 0x1);
+}
+
+int ftrace_cmp_recs(const void *a, const void *b)
+{
+	const struct dyn_ftrace *key = a;
+	const struct dyn_ftrace *rec = b;
+	int offset = ftrace_location_has_gep(rec) ? 12 : 4;
+
+	if (key->flags < rec->ip - offset)
+		return -1;
+	if (key->ip >= rec->ip + MCOUNT_INSN_SIZE)
+		return 1;
+	return 0;
+}
+
+int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
+{
+	unsigned long offset;
+
+	if (!kallsyms_lookup_size_offset(rec->ip, NULL, &offset) || (offset != 12 && offset != 4)) {
+		/* TODO: implement logic to deduce lep/gep from code */
+	} else if (offset == 12) {
+		ftrace_mod_addr_set(rec, (struct module *)1);
+	}
+
+	return ftrace_make_nop(mod, rec, MCOUNT_ADDR);
+}
+#else
+struct module *ftrace_mod_addr_get(struct dyn_ftrace *rec)
+{
+	return rec->arch.mod;
+}
+
+void ftrace_mod_addr_set(struct dyn_ftrace *rec, struct module * mod)
+{
+	rec->arch.mod = mod;
+}
+#endif /* CONFIG_MPROFILE_KERNEL */
+
 #ifdef CONFIG_MODULES
 #ifdef CONFIG_PPC64
 /*
@@ -494,7 +547,7 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 	ppc_inst_t instr;
 	void *ip = (void *)rec->ip;
 	unsigned long entry, ptr, tramp;
-	struct module *mod = rec->arch.mod;
+	struct module *mod = ftrace_mod_addr_get(rec);
 
 	/* read where this goes */
 	if (copy_inst_from_kernel_nofault(op, ip))
@@ -561,7 +614,7 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 	int err;
 	ppc_inst_t op;
 	u32 *ip = (u32 *)rec->ip;
-	struct module *mod = rec->arch.mod;
+	struct module *mod = ftrace_mod_addr_get(rec);
 	unsigned long tramp;
 
 	/* read where this goes */
@@ -678,7 +731,7 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 	 * Being that we are converting from nop, it had better
 	 * already have a module defined.
 	 */
-	if (!rec->arch.mod) {
+	if (!ftrace_mod_addr_get(rec)) {
 		pr_err("No module loaded\n");
 		return -EINVAL;
 	}
@@ -699,7 +752,7 @@ __ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
 	ppc_inst_t op;
 	unsigned long ip = rec->ip;
 	unsigned long entry, ptr, tramp;
-	struct module *mod = rec->arch.mod;
+	struct module *mod = ftrace_mod_addr_get(rec);
 
 	/* If we never set up ftrace trampolines, then bail */
 	if (!mod->arch.tramp || !mod->arch.tramp_regs) {
@@ -814,7 +867,7 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
 	/*
 	 * Out of range jumps are called from modules.
 	 */
-	if (!rec->arch.mod) {
+	if (!ftrace_mod_addr_get(rec)) {
 		pr_err("No module loaded\n");
 		return -EINVAL;
 	}
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index f9feb197b2daaf..68f20cf34b0c47 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1510,6 +1510,7 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip, void *regs)
 	}
 
 
+#ifndef ftrace_cmp_recs
 static int ftrace_cmp_recs(const void *a, const void *b)
 {
 	const struct dyn_ftrace *key = a;
@@ -1521,6 +1522,7 @@ static int ftrace_cmp_recs(const void *a, const void *b)
 		return 1;
 	return 0;
 }
+#endif
 
 static struct dyn_ftrace *lookup_rec(unsigned long start, unsigned long end)
 {



Thanks,
Naveen
Steven Rostedt Feb. 10, 2022, 2:59 p.m. UTC | #5
On Thu, 10 Feb 2022 13:58:29 +0000
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:

> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index f9feb197b2daaf..68f20cf34b0c47 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -1510,6 +1510,7 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip, void *regs)
>  	}
>  
>  
> +#ifndef ftrace_cmp_recs
>  static int ftrace_cmp_recs(const void *a, const void *b)
>  {
>  	const struct dyn_ftrace *key = a;
> @@ -1521,6 +1522,7 @@ static int ftrace_cmp_recs(const void *a, const void *b)
>  		return 1;
>  	return 0;
>  }
> +#endif
>  

I don't really care for this part, as it seems somewhat ugly. But this
patch does appear to solve your issue, and I can't think of a prettier way
to do this.

So, I will reluctantly ack it.

If anything, please add a comment above saying that architectures may need
to override this function, and when doing so, they will define their own
ftrace_cmp_recs.

-- Steve
Naveen N. Rao Feb. 10, 2022, 4:40 p.m. UTC | #6
Steven Rostedt wrote:
> On Thu, 10 Feb 2022 13:58:29 +0000
> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
> 
>> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
>> index f9feb197b2daaf..68f20cf34b0c47 100644
>> --- a/kernel/trace/ftrace.c
>> +++ b/kernel/trace/ftrace.c
>> @@ -1510,6 +1510,7 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip, void *regs)
>>  	}
>>  
>>  
>> +#ifndef ftrace_cmp_recs
>>  static int ftrace_cmp_recs(const void *a, const void *b)
>>  {
>>  	const struct dyn_ftrace *key = a;
>> @@ -1521,6 +1522,7 @@ static int ftrace_cmp_recs(const void *a, const void *b)
>>  		return 1;
>>  	return 0;
>>  }
>> +#endif
>>  
> 
> I don't really care for this part, as it seems somewhat ugly. But this
> patch does appear to solve your issue, and I can't think of a prettier way
> to do this.

Yes, this approach looks like it will solve a few different issues for 
us. We would also like to nop-out the instruction before the branch to 
_mcount(), so this helps ensure that kprobes won't also overwrite that 
instruction.

> 
> So, I will reluctantly ack it.

Since we don't want to change struct dyn_ftrace, we will need to contain 
changes within the architecture code, so I don't see a way to do this in 
a generic manner.

The other option is to mark ftrace_cmp_recs() as a __weak function, but 
I have a vague recollection of you suggesting #ifdef rather than a 
__weak function in the past. I might be mis-remembering, so if you think 
making this a __weak function is better, I can do that.

> 
> If anything, please add a comment above saying that architectures may need
> to override this function, and when doing so, they will define their own
> ftrace_cmp_recs.

Sure.


- Naveen
Steven Rostedt Feb. 10, 2022, 5:01 p.m. UTC | #7
On Thu, 10 Feb 2022 16:40:28 +0000
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:

> The other option is to mark ftrace_cmp_recs() as a __weak function, but 
> I have a vague recollection of you suggesting #ifdef rather than a 
> __weak function in the past. I might be mis-remembering, so if you think 
> making this a __weak function is better, I can do that.

No. If I wanted that I would have suggested it. I think this is the
prettiest of the ugly solutions out there ;-)

As I said, I can't think of a better solution, and we can go with this
until something else comes along.

-- Steve
Naveen N. Rao Feb. 11, 2022, 11:36 a.m. UTC | #8
Steven Rostedt wrote:
> On Thu, 10 Feb 2022 16:40:28 +0000
> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
> 
>> The other option is to mark ftrace_cmp_recs() as a __weak function, but 
>> I have a vague recollection of you suggesting #ifdef rather than a 
>> __weak function in the past. I might be mis-remembering, so if you think 
>> making this a __weak function is better, I can do that.
> 
> No. If I wanted that I would have suggested it. I think this is the
> prettiest of the ugly solutions out there ;-)

Understood :)

> 
> As I said, I can't think of a better solution, and we can go with this
> until something else comes along.

Thanks,
- Naveen
diff mbox series

Patch

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 9a492fdec1dfbe..03cb50e4c8c3e8 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -50,13 +50,7 @@  kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
 	addr = (kprobe_opcode_t *)kallsyms_lookup_name(name);
 	if (addr && !offset) {
 #ifdef CONFIG_KPROBES_ON_FTRACE
-		unsigned long faddr;
-		/*
-		 * Per livepatch.h, ftrace location is always within the first
-		 * 16 bytes of a function on powerpc with -mprofile-kernel.
-		 */
-		faddr = ftrace_location_range((unsigned long)addr,
-					      (unsigned long)addr + 16);
+		unsigned long faddr = ftrace_location_lookup((unsigned long)addr);
 		if (faddr)
 			addr = (kprobe_opcode_t *)faddr;
 		else
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index b65ca87a2eacb1..5127eb65c299af 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -1137,3 +1137,14 @@  char *arch_ftrace_match_adjust(char *str, const char *search)
 		return str;
 }
 #endif /* PPC64_ELF_ABI_v1 */
+
+#ifdef CONFIG_MPROFILE_KERNEL
+unsigned long ftrace_location_lookup(unsigned long ip)
+{
+	/*
+	 * Per livepatch.h, ftrace location is always within the first
+	 * 16 bytes of a function on powerpc with -mprofile-kernel.
+	 */
+	return ftrace_location_range(ip, ip + 16);
+}
+#endif