Message ID | 20220923084658.99304-4-liaochang1@huawei.com
---|---
State | New, archived
Series | kprobe: Optimize the performance of patching ss
On Fri, Sep 23, 2022 at 04:46:58PM +0800, Liao Chang wrote:
> Single-step slot would not be used until kprobe is enabled, that means
> no race condition occurs on it under SMP, hence it is safe to patch ss
> slot without stopping machine.
>
> Since I and D caches are coherent within single-step slot from
> aarch64_insn_patch_text_nosync(), hence no need to do it again via
> flush_icache_range().
>
> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> Signed-off-by: Liao Chang <liaochang1@huawei.com>
> ---
> arch/arm64/kernel/probes/kprobes.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
> index d1d182320245..29b98bc12833 100644
> --- a/arch/arm64/kernel/probes/kprobes.c
> +++ b/arch/arm64/kernel/probes/kprobes.c
> @@ -44,13 +44,10 @@ post_kprobe_handler(struct kprobe *, struct kprobe_ctlblk *, struct pt_regs *);
>  static void __kprobes arch_prepare_ss_slot(struct kprobe *p)
>  {
>  	kprobe_opcode_t *addr = p->ainsn.api.insn;
> -	void *addrs[] = {addr, addr + 1};
> -	u32 insns[] = {p->opcode, BRK64_OPCODE_KPROBES_SS};
>
>  	/* prepare insn slot */
> -	aarch64_insn_patch_text(addrs, insns, 2);
> -
> -	flush_icache_range((uintptr_t)addr, (uintptr_t)(addr + MAX_INSN_SIZE));
> +	aarch64_insn_patch_text_nosync(addr, p->opcode);
> +	aarch64_insn_patch_text_nosync(addr + 1, BRK64_OPCODE_KPROBES_SS);
>
>  	/*
>  	 * Needs restoring of return address after stepping xol.
> --
> 2.17.1

Acked-by: Will Deacon <will@kernel.org>

Will
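[For readers following the thread: the flush_icache_range() call can go
because the _nosync variant already performs the cache maintenance
itself. A paraphrased sketch of aarch64_insn_patch_text_nosync() (see
arch/arm64/kernel/patching.c; helper names vary between kernel
versions, e.g. older trees call __flush_icache_range() here):]

int __kprobes aarch64_insn_patch_text_nosync(void *addr, u32 insn)
{
	u32 *tp = addr;
	int ret;

	/* A64 instructions must be word aligned */
	if ((uintptr_t)tp & 0x3)
		return -EINVAL;

	ret = aarch64_insn_write(tp, insn);
	if (ret == 0)
		/* Clean D-cache / invalidate I-cache to the PoU */
		caches_clean_inval_pou((uintptr_t)tp,
				       (uintptr_t)tp + AARCH64_INSN_SIZE);

	return ret;
}

[What it deliberately does not do is force a
Context-Synchronization-Event on other CPUs, which is the subtlety
discussed below.]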
On Fri, Sep 23, 2022 at 04:46:58PM +0800, Liao Chang wrote:
> Single-step slot would not be used until kprobe is enabled, that means
> no race condition occurs on it under SMP, hence it is safe to patch ss
> slot without stopping machine.

I think this is correct, but this depends on a couple of subtleties,
importantly:

* That the I-cache maintenance for these instructions is complete
  *before* the kprobe BRK is written (and
  aarch64_insn_patch_text_nosync() ensures this, but just omits causing
  a Context-Synchronization-Event on all CPUs).

* That the kprobe BRK results in an exception (and consequently a
  Context-Synchronization-Event), which ensures that the CPU will fetch
  the single-step slot instructions *after* this, ensuring that the new
  instructions are used.

It would be good if we could call that out explicitly.

Thanks,
Mark.

> Since I and D caches are coherent within single-step slot from
> aarch64_insn_patch_text_nosync(), hence no need to do it again via
> flush_icache_range().
>
> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> Signed-off-by: Liao Chang <liaochang1@huawei.com>
> ---
> [...]
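[The ordering in Mark's first point holds because the BRK at the probed
address is only written later, when the kprobe is armed, and that write
goes through the fully synchronizing stop_machine()-based path. Roughly,
paraphrased from the same file:]

void __kprobes arch_arm_kprobe(struct kprobe *p)
{
	void *addr = p->addr;
	u32 insn = BRK64_OPCODE_KPROBES;

	/*
	 * Publishing the BRK uses aarch64_insn_patch_text(), which
	 * synchronizes all CPUs; by this time arch_prepare_ss_slot()
	 * has already completed the cache maintenance for the slot.
	 */
	aarch64_insn_patch_text(&addr, &insn, 1);
}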
On 2022/9/23 20:39, Mark Rutland wrote:
> On Fri, Sep 23, 2022 at 04:46:58PM +0800, Liao Chang wrote:
>> Single-step slot would not be used until kprobe is enabled, that means
>> no race condition occurs on it under SMP, hence it is safe to patch ss
>> slot without stopping machine.
>
> I think this is correct, but this depends on a couple of subtleties,
> importantly:
>
> * That the I-cache maintenance for these instructions is complete
>   *before* the kprobe BRK is written (and
>   aarch64_insn_patch_text_nosync() ensures this, but just omits causing
>   a Context-Synchronization-Event on all CPUs).

So in order to guarantee that the I-cache maintenance is observed on
all CPUs, it needs to be followed by an explicit
Context-Synchronization-Event; perhaps it would be better to place an
ISB before the kprobe BRK is written.

> * That the kprobe BRK results in an exception (and consequently a
>   Context-Synchronization-Event), which ensures that the CPU will fetch
>   the single-step slot instructions *after* this, ensuring that the new
>   instructions are used.

Yes, because the single-step slot is installed in the BRK exception
handler, it is not necessary to generate a Context-Synchronization-Event
via the ISB mentioned above...

Thanks.

> It would be good if we could call that out explicitly.
>
> Thanks,
> Mark.
>
> [...]
On Sat, 24 Sep 2022 09:52:28 +0800
"liaochang (A)" <liaochang1@huawei.com> wrote:

> On 2022/9/23 20:39, Mark Rutland wrote:
>> On Fri, Sep 23, 2022 at 04:46:58PM +0800, Liao Chang wrote:
>>> Single-step slot would not be used until kprobe is enabled, that means
>>> no race condition occurs on it under SMP, hence it is safe to patch ss
>>> slot without stopping machine.
>>
>> I think this is correct, but this depends on a couple of subtleties,
>> importantly:
>>
>> * That the I-cache maintenance for these instructions is complete
>>   *before* the kprobe BRK is written (and
>>   aarch64_insn_patch_text_nosync() ensures this, but just omits causing
>>   a Context-Synchronization-Event on all CPUs).
>
> So in order to guarantee that the I-cache maintenance is observed on
> all CPUs, it needs to be followed by an explicit
> Context-Synchronization-Event; perhaps it would be better to place an
> ISB before the kprobe BRK is written.
>
>> * That the kprobe BRK results in an exception (and consequently a
>>   Context-Synchronization-Event), which ensures that the CPU will fetch
>>   the single-step slot instructions *after* this, ensuring that the new
>>   instructions are used.
>
> Yes, because the single-step slot is installed in the BRK exception
> handler, it is not necessary to generate a Context-Synchronization-Event
> via the ISB mentioned above...

Can you update the patch to include the above as comments in the code?
Maybe you also have to ensure it on other patches too.

Thank you,

> Thanks.
>
>> It would be good if we could call that out explicitly.
>>
>> Thanks,
>> Mark.
>
> [...]
>
> --
> BR,
> Liao, Chang
On 2022/9/25 9:21, Masami Hiramatsu (Google) wrote:
> On Sat, 24 Sep 2022 09:52:28 +0800
> "liaochang (A)" <liaochang1@huawei.com> wrote:
>
>> On 2022/9/23 20:39, Mark Rutland wrote:
>>> On Fri, Sep 23, 2022 at 04:46:58PM +0800, Liao Chang wrote:
>>>> Single-step slot would not be used until kprobe is enabled, that means
>>>> no race condition occurs on it under SMP, hence it is safe to patch ss
>>>> slot without stopping machine.
>>>
>>> I think this is correct, but this depends on a couple of subtleties,
>>> importantly:
>>>
>>> * That the I-cache maintenance for these instructions is complete
>>>   *before* the kprobe BRK is written (and
>>>   aarch64_insn_patch_text_nosync() ensures this, but just omits causing
>>>   a Context-Synchronization-Event on all CPUs).
>>
>> So in order to guarantee that the I-cache maintenance is observed on
>> all CPUs, it needs to be followed by an explicit
>> Context-Synchronization-Event; perhaps it would be better to place an
>> ISB before the kprobe BRK is written.
>>
>>> * That the kprobe BRK results in an exception (and consequently a
>>>   Context-Synchronization-Event), which ensures that the CPU will fetch
>>>   the single-step slot instructions *after* this, ensuring that the new
>>>   instructions are used.
>>
>> Yes, because the single-step slot is installed in the BRK exception
>> handler, it is not necessary to generate a Context-Synchronization-Event
>> via the ISB mentioned above...
>
> Can you update the patch to include the above as comments in the code?
> Maybe you also have to ensure it on other patches too.

OK, I will add these comments in the code.

Thanks.

> Thank you,
>
> [...]
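[To make the outcome concrete: the comment Liao is promising might look
something like the sketch below; the wording is illustrative only, not
necessarily what was eventually merged:]

static void __kprobes arch_prepare_ss_slot(struct kprobe *p)
{
	kprobe_opcode_t *addr = p->ainsn.api.insn;

	/*
	 * Prepare insn slot. Two properties make patching it without
	 * stop_machine() safe:
	 *
	 * 1. aarch64_insn_patch_text_nosync() finishes the I-cache
	 *    maintenance for the slot before it returns, i.e. before
	 *    the kprobe BRK is written at the probed address; it only
	 *    omits a Context-Synchronization-Event on all CPUs.
	 *
	 * 2. Hitting the kprobe BRK raises an exception, which is
	 *    itself a Context-Synchronization-Event, so any CPU that
	 *    reaches the slot fetches the new instructions after it.
	 */
	aarch64_insn_patch_text_nosync(addr, p->opcode);
	aarch64_insn_patch_text_nosync(addr + 1, BRK64_OPCODE_KPROBES_SS);

	/*
	 * Needs restoring of return address after stepping xol.
	 */
	p->ainsn.api.restore = (unsigned long)p->addr +
		sizeof(kprobe_opcode_t);
}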
diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
index d1d182320245..29b98bc12833 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -44,13 +44,10 @@ post_kprobe_handler(struct kprobe *, struct kprobe_ctlblk *, struct pt_regs *);
 static void __kprobes arch_prepare_ss_slot(struct kprobe *p)
 {
 	kprobe_opcode_t *addr = p->ainsn.api.insn;
-	void *addrs[] = {addr, addr + 1};
-	u32 insns[] = {p->opcode, BRK64_OPCODE_KPROBES_SS};
 
 	/* prepare insn slot */
-	aarch64_insn_patch_text(addrs, insns, 2);
-
-	flush_icache_range((uintptr_t)addr, (uintptr_t)(addr + MAX_INSN_SIZE));
+	aarch64_insn_patch_text_nosync(addr, p->opcode);
+	aarch64_insn_patch_text_nosync(addr + 1, BRK64_OPCODE_KPROBES_SS);
 
 	/*
 	 * Needs restoring of return address after stepping xol.
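[For context on the series title ("Optimize the performance of patching
ss"): the call being removed, aarch64_insn_patch_text(), funnels through
stop_machine(), which parks every online CPU for each probe prepared. A
paraphrased sketch, see arch/arm64/kernel/patching.c; details vary by
kernel version:]

/*
 * One CPU performs the writes in aarch64_insn_patch_text_cb() while
 * the others spin and then execute an ISB, i.e. every CPU gets a
 * Context-Synchronization-Event -- the part that is unnecessary for
 * a single-step slot that no CPU can reach yet.
 */
int __kprobes aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt)
{
	struct aarch64_insn_patch patch = {
		.text_addrs = addrs,
		.new_insns = insns,
		.insn_cnt = cnt,
	};

	if (cnt <= 0)
		return -EINVAL;

	return stop_machine_cpuslocked(aarch64_insn_patch_text_cb, &patch,
				       cpu_online_mask);
}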