diff mbox series

riscv/kprobe: Optimize the performance of patching instruction slot

Message ID 20220907023327.85630-1-liaochang1@huawei.com (mailing list archive)
State New, archived
Series riscv/kprobe: Optimize the performance of patching instruction slot

Commit Message

Liao, Chang Sept. 7, 2022, 2:33 a.m. UTC
Since no race condition can occur on an instruction slot, it is
safe to patch the slot without stopping the machine.

Signed-off-by: Liao Chang <liaochang1@huawei.com>
---
 arch/riscv/kernel/probes/kprobes.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Comments

Jisheng Zhang Sept. 7, 2022, 5:21 p.m. UTC | #1
On Wed, Sep 07, 2022 at 10:33:27AM +0800, Liao Chang wrote:
> Since no race condition occurs on each instruction slot, hence it is
> safe to patch instruction slot without stopping machine.

Hmm, IMHO there is a race when arming a kprobe under SMP, so stopping
the machine is necessary here. Maybe I misunderstand something.

> 
> Signed-off-by: Liao Chang <liaochang1@huawei.com>
> ---
>  arch/riscv/kernel/probes/kprobes.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/riscv/kernel/probes/kprobes.c b/arch/riscv/kernel/probes/kprobes.c
> index e6e950b7cf32..eff7d7fab535 100644
> --- a/arch/riscv/kernel/probes/kprobes.c
> +++ b/arch/riscv/kernel/probes/kprobes.c
> @@ -24,12 +24,14 @@ post_kprobe_handler(struct kprobe *, struct kprobe_ctlblk *, struct pt_regs *);
>  static void __kprobes arch_prepare_ss_slot(struct kprobe *p)
>  {
>  	unsigned long offset = GET_INSN_LENGTH(p->opcode);
> +	const kprobe_opcode_t brk_insn = __BUG_INSN_32;
> +	kprobe_opcode_t slot[MAX_INSN_SIZE];
>  
>  	p->ainsn.api.restore = (unsigned long)p->addr + offset;
>  
> -	patch_text(p->ainsn.api.insn, p->opcode);
> -	patch_text((void *)((unsigned long)(p->ainsn.api.insn) + offset),
> -		   __BUG_INSN_32);
> +	memcpy(slot, &p->opcode, offset);
> +	memcpy((void *)((unsigned long)slot + offset), &brk_insn, 4);
> +	patch_text_nosync(p->ainsn.api.insn, slot, offset + 4);
>  }
>  
>  static void __kprobes arch_prepare_simulate(struct kprobe *p)
> -- 
> 2.17.1
> 
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Masami Hiramatsu (Google) Sept. 7, 2022, 10:28 p.m. UTC | #2
On Thu, 8 Sep 2022 01:21:27 +0800
Jisheng Zhang <jszhang@kernel.org> wrote:

> On Wed, Sep 07, 2022 at 10:33:27AM +0800, Liao Chang wrote:
> > Since no race condition occurs on each instruction slot, hence it is
> > safe to patch instruction slot without stopping machine.
> 
> hmm, IMHO there's race when arming kprobe under SMP, so stopping
> machine is necessary here. Maybe I misunderstand something.

Yeah, self-modifying code usually needs to stop the other CPUs at some
known points so that they do not execute the instruction which is being
modified.
Even if one chip ensures that, is it safe for other implementations?
(Does the RISC-V specification guarantee this behavior?)

Thank you,

> 
> > 
> > Signed-off-by: Liao Chang <liaochang1@huawei.com>
> > ---
> >  arch/riscv/kernel/probes/kprobes.c | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/riscv/kernel/probes/kprobes.c b/arch/riscv/kernel/probes/kprobes.c
> > index e6e950b7cf32..eff7d7fab535 100644
> > --- a/arch/riscv/kernel/probes/kprobes.c
> > +++ b/arch/riscv/kernel/probes/kprobes.c
> > @@ -24,12 +24,14 @@ post_kprobe_handler(struct kprobe *, struct kprobe_ctlblk *, struct pt_regs *);
> >  static void __kprobes arch_prepare_ss_slot(struct kprobe *p)
> >  {
> >  	unsigned long offset = GET_INSN_LENGTH(p->opcode);
> > +	const kprobe_opcode_t brk_insn = __BUG_INSN_32;
> > +	kprobe_opcode_t slot[MAX_INSN_SIZE];
> >  
> >  	p->ainsn.api.restore = (unsigned long)p->addr + offset;
> >  
> > -	patch_text(p->ainsn.api.insn, p->opcode);
> > -	patch_text((void *)((unsigned long)(p->ainsn.api.insn) + offset),
> > -		   __BUG_INSN_32);
> > +	memcpy(slot, &p->opcode, offset);
> > +	memcpy((void *)((unsigned long)slot + offset), &brk_insn, 4);
> > +	patch_text_nosync(p->ainsn.api.insn, slot, offset + 4);
> >  }
> >  
> >  static void __kprobes arch_prepare_simulate(struct kprobe *p)
> > -- 
> > 2.17.1
> > 
> > 
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
Liao, Chang Sept. 8, 2022, 1:43 a.m. UTC | #3
Thanks for the comment.

On 2022/9/8 1:21, Jisheng Zhang wrote:
> On Wed, Sep 07, 2022 at 10:33:27AM +0800, Liao Chang wrote:
>> Since no race condition occurs on each instruction slot, hence it is
>> safe to patch instruction slot without stopping machine.
> 
> hmm, IMHO there's race when arming kprobe under SMP, so stopping
> machine is necessary here. Maybe I misunderstand something.
> 

It is indeed necessary to stop the machine when arming a kprobe under SMP,
but I don't think it is needed when preparing the instruction slot,
for two reasons:

1. The instruction slot is dynamically allocated data.
2. The kernel does not execute the instruction slot until the original
   instruction is replaced by a breakpoint.
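
For illustration only (not part of the patch): a minimal userspace sketch of the staging pattern the patch uses, in which the copied instruction and the trailing ebreak are assembled in a local buffer and then published to the slot with one write. The names `stage_ss_slot` and `SLOT_BYTES` are hypothetical, and the final `memcpy()` is only a stand-in for the kernel's `patch_text_nosync()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EBREAK_INSN_32 0x00100073u /* RISC-V ebreak encoding */
#define SLOT_BYTES     8           /* hypothetical: one 4-byte insn plus ebreak */

/*
 * Build "probed instruction + ebreak" in a local buffer, then copy the
 * whole sequence into the slot in one step. Returns the bytes written.
 */
static size_t stage_ss_slot(uint8_t *slot_mem, const void *insn, size_t insn_len)
{
	uint8_t staged[SLOT_BYTES];
	const uint32_t brk = EBREAK_INSN_32;

	/* RISC-V instructions handled here are 2 (compressed) or 4 bytes long. */
	assert(insn_len == 2 || insn_len == 4);

	memcpy(staged, insn, insn_len);
	memcpy(staged + insn_len, &brk, sizeof(brk));

	/* Stand-in for patch_text_nosync(): a single write, no stop_machine(). */
	memcpy(slot_mem, staged, insn_len + sizeof(brk));
	return insn_len + sizeof(brk);
}
```

Because nothing can jump to the slot before the probe is armed, the single write needs no cross-CPU synchronization in this model; arming the probe itself is a separate step and is not shown here.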

>>
>> Signed-off-by: Liao Chang <liaochang1@huawei.com>
>> ---
>>  arch/riscv/kernel/probes/kprobes.c | 8 +++++---
>>  1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/riscv/kernel/probes/kprobes.c b/arch/riscv/kernel/probes/kprobes.c
>> index e6e950b7cf32..eff7d7fab535 100644
>> --- a/arch/riscv/kernel/probes/kprobes.c
>> +++ b/arch/riscv/kernel/probes/kprobes.c
>> @@ -24,12 +24,14 @@ post_kprobe_handler(struct kprobe *, struct kprobe_ctlblk *, struct pt_regs *);
>>  static void __kprobes arch_prepare_ss_slot(struct kprobe *p)
>>  {
>>  	unsigned long offset = GET_INSN_LENGTH(p->opcode);
>> +	const kprobe_opcode_t brk_insn = __BUG_INSN_32;
>> +	kprobe_opcode_t slot[MAX_INSN_SIZE];
>>  
>>  	p->ainsn.api.restore = (unsigned long)p->addr + offset;
>>  
>> -	patch_text(p->ainsn.api.insn, p->opcode);
>> -	patch_text((void *)((unsigned long)(p->ainsn.api.insn) + offset),
>> -		   __BUG_INSN_32);
>> +	memcpy(slot, &p->opcode, offset);
>> +	memcpy((void *)((unsigned long)slot + offset), &brk_insn, 4);
>> +	patch_text_nosync(p->ainsn.api.insn, slot, offset + 4);
>>  }
>>  
>>  static void __kprobes arch_prepare_simulate(struct kprobe *p)
>> -- 
>> 2.17.1
>>
>>
>> _______________________________________________
>> linux-riscv mailing list
>> linux-riscv@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-riscv
> .
Masami Hiramatsu (Google) Sept. 8, 2022, 12:49 p.m. UTC | #4
On Thu, 8 Sep 2022 09:43:45 +0800
"liaochang (A)" <liaochang1@huawei.com> wrote:

> Thanks for comment.
> 
> On 2022/9/8 1:21, Jisheng Zhang wrote:
> > On Wed, Sep 07, 2022 at 10:33:27AM +0800, Liao Chang wrote:
> >> Since no race condition occurs on each instruction slot, hence it is
> >> safe to patch instruction slot without stopping machine.
> > 
> > hmm, IMHO there's race when arming kprobe under SMP, so stopping
> > machine is necessary here. Maybe I misunderstand something.
> > 
> 
> It is indeed necessary to stop machine when arm kprobe under SMP,
> but i don't think it need to stop machine when prepare instruction slot,
> two reasons:
> 
> 1. Instruction slot is dynamically allocated data.
> 2. Kernel would not execute instruction slot until original instruction
>    is replaced by breakpoint.

Ah, this is for the ss (single-step out-of-line) slot. So until the
kprobe is enabled, the slot should not be used by other cores.
OK, then it should be safe.


> >>
> >> Signed-off-by: Liao Chang <liaochang1@huawei.com>
> >> ---
> >>  arch/riscv/kernel/probes/kprobes.c | 8 +++++---
> >>  1 file changed, 5 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/arch/riscv/kernel/probes/kprobes.c b/arch/riscv/kernel/probes/kprobes.c
> >> index e6e950b7cf32..eff7d7fab535 100644
> >> --- a/arch/riscv/kernel/probes/kprobes.c
> >> +++ b/arch/riscv/kernel/probes/kprobes.c
> >> @@ -24,12 +24,14 @@ post_kprobe_handler(struct kprobe *, struct kprobe_ctlblk *, struct pt_regs *);
> >>  static void __kprobes arch_prepare_ss_slot(struct kprobe *p)
> >>  {
> >>  	unsigned long offset = GET_INSN_LENGTH(p->opcode);
> >> +	const kprobe_opcode_t brk_insn = __BUG_INSN_32;
> >> +	kprobe_opcode_t slot[MAX_INSN_SIZE];
> >>  
> >>  	p->ainsn.api.restore = (unsigned long)p->addr + offset;
> >>  
> >> -	patch_text(p->ainsn.api.insn, p->opcode);
> >> -	patch_text((void *)((unsigned long)(p->ainsn.api.insn) + offset),
> >> -		   __BUG_INSN_32);
> >> +	memcpy(slot, &p->opcode, offset);
> >> +	memcpy((void *)((unsigned long)slot + offset), &brk_insn, 4);
> >> +	patch_text_nosync(p->ainsn.api.insn, slot, offset + 4);

BTW, didn't you have a macro for the size of __BUG_INSN_32?

Thank you,


> >>  }
> >>  
> >>  static void __kprobes arch_prepare_simulate(struct kprobe *p)
> >> -- 
> >> 2.17.1
> >>
> >>
> >> _______________________________________________
> >> linux-riscv mailing list
> >> linux-riscv@lists.infradead.org
> >> http://lists.infradead.org/mailman/listinfo/linux-riscv
> > .
> 
> -- 
> BR,
> Liao, Chang
Liao, Chang Sept. 9, 2022, 1:55 a.m. UTC | #5
On 2022/9/8 20:49, Masami Hiramatsu (Google) wrote:
> On Thu, 8 Sep 2022 09:43:45 +0800
> "liaochang (A)" <liaochang1@huawei.com> wrote:
> 
>> Thanks for comment.
>>
>> On 2022/9/8 1:21, Jisheng Zhang wrote:
>>> On Wed, Sep 07, 2022 at 10:33:27AM +0800, Liao Chang wrote:
>>>> Since no race condition occurs on each instruction slot, hence it is
>>>> safe to patch instruction slot without stopping machine.
>>>
>>> hmm, IMHO there's race when arming kprobe under SMP, so stopping
>>> machine is necessary here. Maybe I misunderstand something.
>>>
>>
>> It is indeed necessary to stop machine when arm kprobe under SMP,
>> but i don't think it need to stop machine when prepare instruction slot,
>> two reasons:
>>
>> 1. Instruction slot is dynamically allocated data.
>> 2. Kernel would not execute instruction slot until original instruction
>>    is replaced by breakpoint.
> 
> Ah, this is for ss (single step out of line) slot. So until
> kprobe is enabled, this should not be used from other cores.
> OK, then it should be safe.

Exactly, Masami. I also found that this optimization could be applied to some
other architectures, such as arm64 and csky. Do you think now is a good time to
do them all?

Thanks.

> 
> 
>>>>
>>>> Signed-off-by: Liao Chang <liaochang1@huawei.com>
>>>> ---
>>>>  arch/riscv/kernel/probes/kprobes.c | 8 +++++---
>>>>  1 file changed, 5 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/arch/riscv/kernel/probes/kprobes.c b/arch/riscv/kernel/probes/kprobes.c
>>>> index e6e950b7cf32..eff7d7fab535 100644
>>>> --- a/arch/riscv/kernel/probes/kprobes.c
>>>> +++ b/arch/riscv/kernel/probes/kprobes.c
>>>> @@ -24,12 +24,14 @@ post_kprobe_handler(struct kprobe *, struct kprobe_ctlblk *, struct pt_regs *);
>>>>  static void __kprobes arch_prepare_ss_slot(struct kprobe *p)
>>>>  {
>>>>  	unsigned long offset = GET_INSN_LENGTH(p->opcode);
>>>> +	const kprobe_opcode_t brk_insn = __BUG_INSN_32;
>>>> +	kprobe_opcode_t slot[MAX_INSN_SIZE];
>>>>  
>>>>  	p->ainsn.api.restore = (unsigned long)p->addr + offset;
>>>>  
>>>> -	patch_text(p->ainsn.api.insn, p->opcode);
>>>> -	patch_text((void *)((unsigned long)(p->ainsn.api.insn) + offset),
>>>> -		   __BUG_INSN_32);
>>>> +	memcpy(slot, &p->opcode, offset);
>>>> +	memcpy((void *)((unsigned long)slot + offset), &brk_insn, 4);
>>>> +	patch_text_nosync(p->ainsn.api.insn, slot, offset + 4);
> 
> BTW, didn't you have a macro for the size of __BUG_INSN_32?
> 
> Thank you,

I think you mean GET_INSN_LENGTH. I will use it to calculate
the size of __BUG_INSN_32 in v2, instead of the magic number '4'.
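
For illustration only: a small userspace sketch of deriving the breakpoint size from its encoding instead of hard-coding 4. The helper `insn_length()` is a hypothetical stand-in that mirrors the rule GET_INSN_LENGTH relies on (low two bits equal to 0b11 means a 32-bit instruction, otherwise a 16-bit compressed one); `slot_bytes()` and the 8-byte bound are assumptions made for this example, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EBREAK_32   0x00100073u /* ebreak */
#define C_EBREAK_16 0x9002u     /* c.ebreak */

/* Length in bytes of a RISC-V instruction, decided by its two lowest bits. */
static size_t insn_length(uint32_t insn)
{
	return (insn & 0x3u) == 0x3u ? 4 : 2;
}

/* Total bytes needed in the slot: probed instruction plus breakpoint. */
static size_t slot_bytes(uint32_t probed_insn, uint32_t brk_insn)
{
	size_t total = insn_length(probed_insn) + insn_length(brk_insn);

	/* Assumed upper bound for this sketch: two 4-byte instructions. */
	assert(total <= 8);
	return total;
}
```

With this shape, `slot_bytes(opcode, EBREAK_32)` would replace the `offset + 4` expression, and switching the breakpoint to `C_EBREAK_16` would adjust the size without touching the caller.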

Thanks.

> 
> 
>>>>  }
>>>>  
>>>>  static void __kprobes arch_prepare_simulate(struct kprobe *p)
>>>> -- 
>>>> 2.17.1
>>>>
>>>>
>>>> _______________________________________________
>>>> linux-riscv mailing list
>>>> linux-riscv@lists.infradead.org
>>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>>> .
>>
>> -- 
>> BR,
>> Liao, Chang
> 
>
Masami Hiramatsu (Google) Sept. 10, 2022, 2:24 a.m. UTC | #6
On Fri, 9 Sep 2022 09:55:08 +0800
"liaochang (A)" <liaochang1@huawei.com> wrote:
> 
> 
> On 2022/9/8 20:49, Masami Hiramatsu (Google) wrote:
> > On Thu, 8 Sep 2022 09:43:45 +0800
> > "liaochang (A)" <liaochang1@huawei.com> wrote:
> > 
> >> Thanks for comment.
> >>
> >> On 2022/9/8 1:21, Jisheng Zhang wrote:
> >>> On Wed, Sep 07, 2022 at 10:33:27AM +0800, Liao Chang wrote:
> >>>> Since no race condition occurs on each instruction slot, hence it is
> >>>> safe to patch instruction slot without stopping machine.
> >>>
> >>> hmm, IMHO there's race when arming kprobe under SMP, so stopping
> >>> machine is necessary here. Maybe I misunderstand something.
> >>>
> >>
> >> It is indeed necessary to stop machine when arm kprobe under SMP,
> >> but i don't think it need to stop machine when prepare instruction slot,
> >> two reasons:
> >>
> >> 1. Instruction slot is dynamically allocated data.
> >> 2. Kernel would not execute instruction slot until original instruction
> >>    is replaced by breakpoint.
> > 
> > Ah, this is for ss (single step out of line) slot. So until
> > kprobe is enabled, this should not be used from other cores.
> > OK, then it should be safe.
> 
> Exactly, Masami, and i find out this optimization could be applied to some other
> architectures, such as arm64 and csky, do you think it is good time to do them all.

Yes, we should reduce the stop_machine() usage. Thanks for pointing it out!

> 
> Thanks.
> 
> > 
> > 
> >>>>
> >>>> Signed-off-by: Liao Chang <liaochang1@huawei.com>
> >>>> ---
> >>>>  arch/riscv/kernel/probes/kprobes.c | 8 +++++---
> >>>>  1 file changed, 5 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/arch/riscv/kernel/probes/kprobes.c b/arch/riscv/kernel/probes/kprobes.c
> >>>> index e6e950b7cf32..eff7d7fab535 100644
> >>>> --- a/arch/riscv/kernel/probes/kprobes.c
> >>>> +++ b/arch/riscv/kernel/probes/kprobes.c
> >>>> @@ -24,12 +24,14 @@ post_kprobe_handler(struct kprobe *, struct kprobe_ctlblk *, struct pt_regs *);
> >>>>  static void __kprobes arch_prepare_ss_slot(struct kprobe *p)
> >>>>  {
> >>>>  	unsigned long offset = GET_INSN_LENGTH(p->opcode);
> >>>> +	const kprobe_opcode_t brk_insn = __BUG_INSN_32;
> >>>> +	kprobe_opcode_t slot[MAX_INSN_SIZE];
> >>>>  
> >>>>  	p->ainsn.api.restore = (unsigned long)p->addr + offset;
> >>>>  
> >>>> -	patch_text(p->ainsn.api.insn, p->opcode);
> >>>> -	patch_text((void *)((unsigned long)(p->ainsn.api.insn) + offset),
> >>>> -		   __BUG_INSN_32);
> >>>> +	memcpy(slot, &p->opcode, offset);
> >>>> +	memcpy((void *)((unsigned long)slot + offset), &brk_insn, 4);
> >>>> +	patch_text_nosync(p->ainsn.api.insn, slot, offset + 4);
> > 
> > BTW, didn't you have a macro for the size of __BUG_INSN_32?
> > 
> > Thank you,
> 
> I think you are saying GET_INSN_LENGTH, i will use it to calculate
> the size of __BUG_INSN_32 in v2, instead of magic number '4'.


Yeah, that's better.

Thank you!

> 
> Thanks.
> 
> > 
> > 
> >>>>  }
> >>>>  
> >>>>  static void __kprobes arch_prepare_simulate(struct kprobe *p)
> >>>> -- 
> >>>> 2.17.1
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> linux-riscv mailing list
> >>>> linux-riscv@lists.infradead.org
> >>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
> >>> .
> >>
> >> -- 
> >> BR,
> >> Liao, Chang
> > 
> > 
> 
> -- 
> BR,
> Liao, Chang

Patch

diff --git a/arch/riscv/kernel/probes/kprobes.c b/arch/riscv/kernel/probes/kprobes.c
index e6e950b7cf32..eff7d7fab535 100644
--- a/arch/riscv/kernel/probes/kprobes.c
+++ b/arch/riscv/kernel/probes/kprobes.c
@@ -24,12 +24,14 @@  post_kprobe_handler(struct kprobe *, struct kprobe_ctlblk *, struct pt_regs *);
 static void __kprobes arch_prepare_ss_slot(struct kprobe *p)
 {
 	unsigned long offset = GET_INSN_LENGTH(p->opcode);
+	const kprobe_opcode_t brk_insn = __BUG_INSN_32;
+	kprobe_opcode_t slot[MAX_INSN_SIZE];
 
 	p->ainsn.api.restore = (unsigned long)p->addr + offset;
 
-	patch_text(p->ainsn.api.insn, p->opcode);
-	patch_text((void *)((unsigned long)(p->ainsn.api.insn) + offset),
-		   __BUG_INSN_32);
+	memcpy(slot, &p->opcode, offset);
+	memcpy((void *)((unsigned long)slot + offset), &brk_insn, 4);
+	patch_text_nosync(p->ainsn.api.insn, slot, offset + 4);
 }
 
 static void __kprobes arch_prepare_simulate(struct kprobe *p)