
[3/3] arm64/kprobe: Optimize the performance of patching single-step slot

Message ID 20220923084658.99304-4-liaochang1@huawei.com (mailing list archive)
State New, archived
Series kprobe: Optimize the performance of patching ss

Commit Message

Liao, Chang Sept. 23, 2022, 8:46 a.m. UTC
The single-step slot is not used until the kprobe is enabled, which
means no race condition can occur on it under SMP; hence it is safe to
patch the ss slot without stopping the machine.

Since aarch64_insn_patch_text_nosync() already makes the I and D caches
coherent for the single-step slot, there is no need to do it again via
flush_icache_range().

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Liao Chang <liaochang1@huawei.com>
---
 arch/arm64/kernel/probes/kprobes.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)
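
For reference, aarch64_insn_patch_text() serializes patching across CPUs
via stop_machine(), while the _nosync variant used here writes the
instruction and performs the cache maintenance on its own. A simplified
sketch of the _nosync path, loosely based on arch/arm64/kernel/patching.c
(not the verbatim source; details vary by kernel version):

int aarch64_insn_patch_text_nosync(void *addr, u32 insn)
{
	u32 *tp = addr;
	int ret;

	/* A64 instructions must be word aligned */
	if ((uintptr_t)tp & 0x3)
		return -EINVAL;

	ret = aarch64_insn_write(tp, insn);
	if (ret == 0)
		/*
		 * Clean the D-cache and invalidate the I-cache for the
		 * patched word, making the I and D caches coherent.
		 */
		caches_clean_inval_pou((uintptr_t)tp,
				       (uintptr_t)tp + AARCH64_INSN_SIZE);

	return ret;
}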

Comments

Will Deacon Sept. 23, 2022, 12:35 p.m. UTC | #1
On Fri, Sep 23, 2022 at 04:46:58PM +0800, Liao Chang wrote:
> The single-step slot is not used until the kprobe is enabled, which
> means no race condition can occur on it under SMP; hence it is safe to
> patch the ss slot without stopping the machine.
> 
> Since aarch64_insn_patch_text_nosync() already makes the I and D caches
> coherent for the single-step slot, there is no need to do it again via
> flush_icache_range().
> 
> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> Signed-off-by: Liao Chang <liaochang1@huawei.com>
> ---
>  arch/arm64/kernel/probes/kprobes.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
> index d1d182320245..29b98bc12833 100644
> --- a/arch/arm64/kernel/probes/kprobes.c
> +++ b/arch/arm64/kernel/probes/kprobes.c
> @@ -44,13 +44,10 @@ post_kprobe_handler(struct kprobe *, struct kprobe_ctlblk *, struct pt_regs *);
>  static void __kprobes arch_prepare_ss_slot(struct kprobe *p)
>  {
>  	kprobe_opcode_t *addr = p->ainsn.api.insn;
> -	void *addrs[] = {addr, addr + 1};
> -	u32 insns[] = {p->opcode, BRK64_OPCODE_KPROBES_SS};
>  
>  	/* prepare insn slot */
> -	aarch64_insn_patch_text(addrs, insns, 2);
> -
> -	flush_icache_range((uintptr_t)addr, (uintptr_t)(addr + MAX_INSN_SIZE));
> +	aarch64_insn_patch_text_nosync(addr, p->opcode);
> +	aarch64_insn_patch_text_nosync(addr + 1, BRK64_OPCODE_KPROBES_SS);
>  
>  	/*
>  	 * Needs restoring of return address after stepping xol.
> -- 
> 2.17.1

Acked-by: Will Deacon <will@kernel.org>

Will
Mark Rutland Sept. 23, 2022, 12:39 p.m. UTC | #2
On Fri, Sep 23, 2022 at 04:46:58PM +0800, Liao Chang wrote:
> The single-step slot is not used until the kprobe is enabled, which
> means no race condition can occur on it under SMP; hence it is safe to
> patch the ss slot without stopping the machine.

I think this is correct, but this depends on a couple of subtleties,
importantly:

* That the I-cache maintenance for these instructions is complete *before* the
  kprobe BRK is written (and aarch64_insn_patch_text_nosync() ensures this, but
  just omits causing a Context-Synchronization-Event on all CPUs).

* That the kprobe BRK results in an exception (and consequently a
  Context-Synchronization-Event), which ensures that the CPU will fetch the
  single-step slot instructions *after* this, ensuring that the new
  instructions are used.

It would be good if we could call that out explicitly.

Thanks,
Mark.
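
Spelled out next to the patching calls, those two requirements might read
roughly as follows (an illustrative sketch of such a comment, not the
wording that eventually landed):

	/*
	 * Prepare the insn slot. Note the ordering this relies on:
	 *
	 * - aarch64_insn_patch_text_nosync() completes the I-cache
	 *   maintenance for each instruction before returning, but does
	 *   not cause a Context-Synchronization-Event on all CPUs.
	 *
	 * - The slot is only entered via the kprobe BRK exception, and
	 *   taking that exception is itself a
	 *   Context-Synchronization-Event, so the CPU is guaranteed to
	 *   fetch the newly written slot instructions.
	 */
	aarch64_insn_patch_text_nosync(addr, p->opcode);
	aarch64_insn_patch_text_nosync(addr + 1, BRK64_OPCODE_KPROBES_SS);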

> Since aarch64_insn_patch_text_nosync() already makes the I and D caches
> coherent for the single-step slot, there is no need to do it again via
> flush_icache_range().
> 
> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> Signed-off-by: Liao Chang <liaochang1@huawei.com>
> ---
>  arch/arm64/kernel/probes/kprobes.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
> index d1d182320245..29b98bc12833 100644
> --- a/arch/arm64/kernel/probes/kprobes.c
> +++ b/arch/arm64/kernel/probes/kprobes.c
> @@ -44,13 +44,10 @@ post_kprobe_handler(struct kprobe *, struct kprobe_ctlblk *, struct pt_regs *);
>  static void __kprobes arch_prepare_ss_slot(struct kprobe *p)
>  {
>  	kprobe_opcode_t *addr = p->ainsn.api.insn;
> -	void *addrs[] = {addr, addr + 1};
> -	u32 insns[] = {p->opcode, BRK64_OPCODE_KPROBES_SS};
>  
>  	/* prepare insn slot */
> -	aarch64_insn_patch_text(addrs, insns, 2);
> -
> -	flush_icache_range((uintptr_t)addr, (uintptr_t)(addr + MAX_INSN_SIZE));
> +	aarch64_insn_patch_text_nosync(addr, p->opcode);
> +	aarch64_insn_patch_text_nosync(addr + 1, BRK64_OPCODE_KPROBES_SS);
>  
>  	/*
>  	 * Needs restoring of return address after stepping xol.
> -- 
> 2.17.1
>
Liao, Chang Sept. 24, 2022, 1:52 a.m. UTC | #3
On 2022/9/23 20:39, Mark Rutland wrote:
> On Fri, Sep 23, 2022 at 04:46:58PM +0800, Liao Chang wrote:
>> The single-step slot is not used until the kprobe is enabled, which
>> means no race condition can occur on it under SMP; hence it is safe to
>> patch the ss slot without stopping the machine.
> 
> I think this is correct, but this depends on a couple of subtleties,
> importantly:
> 
> * That the I-cache maintenance for these instructions is complete *before* the
>   kprobe BRK is written (and aarch64_insn_patch_text_nosync() ensures this, but
>   just omits causing a Context-Synchronization-Event on all CPUs).

So in order to guarantee that the I-cache maintenance is observed on all
CPUs, it needs to be followed by an explicit Context-Synchronization-Event;
perhaps it would be better to place an ISB before the kprobe BRK is
written.

> 
> * That the kprobe BRK results in an exception (and consequently a
>   Context-Synchronization-Event), which ensures that the CPU will fetch the
>   single-step slot instructions *after* this, ensuring that the new
>   instructions are used.

Yes, because the single-step slot is entered from the BRK exception
handler, it is not necessary to generate a Context-Synchronization-Event
via the ISB mentioned above...

Thanks.
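
The exception return into the slot is what supplies that event: the BRK
handler redirects the saved PC to the slot, and both exception entry and
the ERET back out are Context-Synchronization-Events. A condensed sketch
of that flow, simplified from arch/arm64/kernel/probes/kprobes.c (not the
verbatim source):

static void __kprobes setup_singlestep(struct kprobe *p,
				       struct pt_regs *regs,
				       struct kprobe_ctlblk *kcb,
				       int reenter)
{
	unsigned long slot = (unsigned long)p->ainsn.api.insn;

	/* ... save/update per-CPU kprobe state ... */

	/*
	 * Point the saved PC at the single-step slot; the ERET out of
	 * the BRK exception is a Context-Synchronization-Event, so the
	 * CPU is guaranteed to fetch the freshly patched instructions.
	 */
	instruction_pointer_set(regs, slot);
}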

> 
> It would be good if we could call that out explicitly.
> 
> Thanks,
> Mark.
> 
>> Since aarch64_insn_patch_text_nosync() already makes the I and D caches
>> coherent for the single-step slot, there is no need to do it again via
>> flush_icache_range().
>>
>> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
>> Signed-off-by: Liao Chang <liaochang1@huawei.com>
>> ---
>>  arch/arm64/kernel/probes/kprobes.c | 7 ++-----
>>  1 file changed, 2 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
>> index d1d182320245..29b98bc12833 100644
>> --- a/arch/arm64/kernel/probes/kprobes.c
>> +++ b/arch/arm64/kernel/probes/kprobes.c
>> @@ -44,13 +44,10 @@ post_kprobe_handler(struct kprobe *, struct kprobe_ctlblk *, struct pt_regs *);
>>  static void __kprobes arch_prepare_ss_slot(struct kprobe *p)
>>  {
>>  	kprobe_opcode_t *addr = p->ainsn.api.insn;
>> -	void *addrs[] = {addr, addr + 1};
>> -	u32 insns[] = {p->opcode, BRK64_OPCODE_KPROBES_SS};
>>  
>>  	/* prepare insn slot */
>> -	aarch64_insn_patch_text(addrs, insns, 2);
>> -
>> -	flush_icache_range((uintptr_t)addr, (uintptr_t)(addr + MAX_INSN_SIZE));
>> +	aarch64_insn_patch_text_nosync(addr, p->opcode);
>> +	aarch64_insn_patch_text_nosync(addr + 1, BRK64_OPCODE_KPROBES_SS);
>>  
>>  	/*
>>  	 * Needs restoring of return address after stepping xol.
>> -- 
>> 2.17.1
>>
> 
> .
Masami Hiramatsu (Google) Sept. 25, 2022, 1:21 a.m. UTC | #4
On Sat, 24 Sep 2022 09:52:28 +0800
"liaochang (A)" <liaochang1@huawei.com> wrote:

> 
> 
> On 2022/9/23 20:39, Mark Rutland wrote:
> > On Fri, Sep 23, 2022 at 04:46:58PM +0800, Liao Chang wrote:
> >> The single-step slot is not used until the kprobe is enabled, which
> >> means no race condition can occur on it under SMP; hence it is safe to
> >> patch the ss slot without stopping the machine.
> > 
> > I think this is correct, but this depends on a couple of subtleties,
> > importantly:
> > 
> > * That the I-cache maintenance for these instructions is complete *before* the
> >   kprobe BRK is written (and aarch64_insn_patch_text_nosync() ensures this, but
> >   just omits causing a Context-Synchronization-Event on all CPUs).
> 
> So in order to guarantee that the I-cache maintenance is observed on all
> CPUs, it needs to be followed by an explicit Context-Synchronization-Event;
> perhaps it would be better to place an ISB before the kprobe BRK is
> written.
> 
> > 
> > * That the kprobe BRK results in an exception (and consequently a
> >   Context-Synchronization-Event), which ensures that the CPU will fetch the
> >   single-step slot instructions *after* this, ensuring that the new
> >   instructions are used.
> 
> Yes, because the single-step slot is entered from the BRK exception
> handler, it is not necessary to generate a Context-Synchronization-Event
> via the ISB mentioned above...

Can you update the patch to include the above as comments in the code?
Maybe you also need to ensure this in the other patches too.

Thank you,

> 
> Thanks.
> 
> > 
> > It would be good if we could call that out explicitly.
> > 
> > Thanks,
> > Mark.
> > 
> >> Since aarch64_insn_patch_text_nosync() already makes the I and D caches
> >> coherent for the single-step slot, there is no need to do it again via
> >> flush_icache_range().
> >>
> >> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> >> Signed-off-by: Liao Chang <liaochang1@huawei.com>
> >> ---
> >>  arch/arm64/kernel/probes/kprobes.c | 7 ++-----
> >>  1 file changed, 2 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
> >> index d1d182320245..29b98bc12833 100644
> >> --- a/arch/arm64/kernel/probes/kprobes.c
> >> +++ b/arch/arm64/kernel/probes/kprobes.c
> >> @@ -44,13 +44,10 @@ post_kprobe_handler(struct kprobe *, struct kprobe_ctlblk *, struct pt_regs *);
> >>  static void __kprobes arch_prepare_ss_slot(struct kprobe *p)
> >>  {
> >>  	kprobe_opcode_t *addr = p->ainsn.api.insn;
> >> -	void *addrs[] = {addr, addr + 1};
> >> -	u32 insns[] = {p->opcode, BRK64_OPCODE_KPROBES_SS};
> >>  
> >>  	/* prepare insn slot */
> >> -	aarch64_insn_patch_text(addrs, insns, 2);
> >> -
> >> -	flush_icache_range((uintptr_t)addr, (uintptr_t)(addr + MAX_INSN_SIZE));
> >> +	aarch64_insn_patch_text_nosync(addr, p->opcode);
> >> +	aarch64_insn_patch_text_nosync(addr + 1, BRK64_OPCODE_KPROBES_SS);
> >>  
> >>  	/*
> >>  	 * Needs restoring of return address after stepping xol.
> >> -- 
> >> 2.17.1
> >>
> > 
> > .
> 
> -- 
> BR,
> Liao, Chang
Liao, Chang Sept. 27, 2022, 1:37 a.m. UTC | #5
On 2022/9/25 9:21, Masami Hiramatsu (Google) wrote:
> On Sat, 24 Sep 2022 09:52:28 +0800
> "liaochang (A)" <liaochang1@huawei.com> wrote:
> 
>>
>>
>> On 2022/9/23 20:39, Mark Rutland wrote:
>>> On Fri, Sep 23, 2022 at 04:46:58PM +0800, Liao Chang wrote:
>>>> The single-step slot is not used until the kprobe is enabled, which
>>>> means no race condition can occur on it under SMP; hence it is safe to
>>>> patch the ss slot without stopping the machine.
>>>
>>> I think this is correct, but this depends on a couple of subtleties,
>>> importantly:
>>>
>>> * That the I-cache maintenance for these instructions is complete *before* the
>>>   kprobe BRK is written (and aarch64_insn_patch_text_nosync() ensures this, but
>>>   just omits causing a Context-Synchronization-Event on all CPUs).
>>
>> So in order to guarantee that the I-cache maintenance is observed on all
>> CPUs, it needs to be followed by an explicit Context-Synchronization-Event;
>> perhaps it would be better to place an ISB before the kprobe BRK is
>> written.
>>
>>>
>>> * That the kprobe BRK results in an exception (and consequently a
>>>   Context-Synchronization-Event), which ensures that the CPU will fetch the
>>>   single-step slot instructions *after* this, ensuring that the new
>>>   instructions are used.
>>
>> Yes, because the single-step slot is entered from the BRK exception
>> handler, it is not necessary to generate a Context-Synchronization-Event
>> via the ISB mentioned above...
> 
> Can you update the patch to include the above as comments in the code?
> Maybe you also need to ensure this in the other patches too.

OK, I will add these comments in the code.

Thanks.

> 
> Thank you,
> 
>>
>> Thanks.
>>
>>>
>>> It would be good if we could call that out explicitly.
>>>
>>> Thanks,
>>> Mark.
>>>
>>>> Since I and D caches are coherent within single-step slot from
>>>> aarch64_insn_patch_text_nosync(), hence no need to do it again via
>>>> flush_icache_range().
>>>>
>>>> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
>>>> Signed-off-by: Liao Chang <liaochang1@huawei.com>
>>>> ---
>>>>  arch/arm64/kernel/probes/kprobes.c | 7 ++-----
>>>>  1 file changed, 2 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
>>>> index d1d182320245..29b98bc12833 100644
>>>> --- a/arch/arm64/kernel/probes/kprobes.c
>>>> +++ b/arch/arm64/kernel/probes/kprobes.c
>>>> @@ -44,13 +44,10 @@ post_kprobe_handler(struct kprobe *, struct kprobe_ctlblk *, struct pt_regs *);
>>>>  static void __kprobes arch_prepare_ss_slot(struct kprobe *p)
>>>>  {
>>>>  	kprobe_opcode_t *addr = p->ainsn.api.insn;
>>>> -	void *addrs[] = {addr, addr + 1};
>>>> -	u32 insns[] = {p->opcode, BRK64_OPCODE_KPROBES_SS};
>>>>  
>>>>  	/* prepare insn slot */
>>>> -	aarch64_insn_patch_text(addrs, insns, 2);
>>>> -
>>>> -	flush_icache_range((uintptr_t)addr, (uintptr_t)(addr + MAX_INSN_SIZE));
>>>> +	aarch64_insn_patch_text_nosync(addr, p->opcode);
>>>> +	aarch64_insn_patch_text_nosync(addr + 1, BRK64_OPCODE_KPROBES_SS);
>>>>  
>>>>  	/*
>>>>  	 * Needs restoring of return address after stepping xol.
>>>> -- 
>>>> 2.17.1
>>>>
>>>
>>> .
>>
>> -- 
>> BR,
>> Liao, Chang
> 
>

Patch

diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
index d1d182320245..29b98bc12833 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -44,13 +44,10 @@  post_kprobe_handler(struct kprobe *, struct kprobe_ctlblk *, struct pt_regs *);
 static void __kprobes arch_prepare_ss_slot(struct kprobe *p)
 {
 	kprobe_opcode_t *addr = p->ainsn.api.insn;
-	void *addrs[] = {addr, addr + 1};
-	u32 insns[] = {p->opcode, BRK64_OPCODE_KPROBES_SS};
 
 	/* prepare insn slot */
-	aarch64_insn_patch_text(addrs, insns, 2);
-
-	flush_icache_range((uintptr_t)addr, (uintptr_t)(addr + MAX_INSN_SIZE));
+	aarch64_insn_patch_text_nosync(addr, p->opcode);
+	aarch64_insn_patch_text_nosync(addr + 1, BRK64_OPCODE_KPROBES_SS);
 
 	/*
 	 * Needs restoring of return address after stepping xol.
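
As context for the "slot is not used until the kprobe is enabled"
argument: the slot only becomes reachable once register_kprobe() arms the
probe by planting a BRK at the probed address, which happens after
arch_prepare_ss_slot() has finished. A minimal, illustrative user (the
probed symbol is just an example):

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>

static struct kprobe kp = {
	.symbol_name = "kernel_clone",	/* example target */
};

static int __init kp_init(void)
{
	/*
	 * The single-step slot is prepared inside register_kprobe();
	 * no CPU can execute it until the probe is armed afterwards,
	 * which is why patching the slot needs no stop_machine().
	 */
	return register_kprobe(&kp);
}

static void __exit kp_exit(void)
{
	unregister_kprobe(&kp);
}

module_init(kp_init);
module_exit(kp_exit);
MODULE_LICENSE("GPL");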