Message ID | 20230928194818.261163-2-hbathini@linux.ibm.com (mailing list archive)
---|---
State | Changes Requested
Delegated to | BPF
Series | powerpc/bpf: use BPF prog pack allocator
Context | Check | Description
---|---|---
bpf/vmtest-bpf-next-PR | success | PR summary
bpf/vmtest-bpf-next-VM_Test-0 | success | Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-1 | success | Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-2 | success | Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-3 | success | Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-4 | success | Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-5 | success | Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-6 | success | Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-7 | success | Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-8 | success | Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-9 | success | Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-10 | success | Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-11 | success | Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-12 | success | Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-13 | success | Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-14 | success | Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-15 | success | Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16 | success | Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 | success | Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-18 | success | Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-19 | success | Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 | success | Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-21 | success | Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-22 | success | Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 | success | Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-24 | success | Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 | success | Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-26 | success | Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-27 | success | Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-28 | success | Logs for veristat
netdev/tree_selection | success | Not a local patch, async
On Thu, Sep 28, 2023 at 12:48 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>
> patch_instruction() entails setting up pte, patching the instruction,
> clearing the pte and flushing the tlb. If multiple instructions need
> to be patched, every instruction would have to go through the above
> drill unnecessarily. Instead, introduce function patch_instructions()
> that sets up the pte, clears the pte and flushes the tlb only once per
> page range of instructions to be patched. This adds a slight overhead
> to patch_instruction() call while improving the patching time for
> scenarios where more than one instruction needs to be patched.
>
> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>

Acked-by: Song Liu <song@kernel.org>

With a nit below.

[...]

> +/*
> + * A page is mapped and instructions that fit the page are patched.
> + * Assumes 'len' to be (PAGE_SIZE - offset_in_page(addr)) or below.
> + */
> +static int __do_patch_instructions_mm(u32 *addr, void *code, size_t len, bool repeat_instr)
>  {
>      int err;
>      u32 *patch_addr;
> @@ -307,11 +336,15 @@ static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
>
>      orig_mm = start_using_temp_mm(patching_mm);
>
> -    err = __patch_instruction(addr, instr, patch_addr);
> +    /* Single instruction case. */
> +    if (len == 0) {
> +        err = __patch_instruction(addr, *(ppc_inst_t *)code, patch_addr);

len == 0 for single instruction is a little weird to me. How about we just use
len == 4 or 8 depending on the instruction to patch?

Thanks,
Song

[...]
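[Editor's note: for illustration, a sketch of the alternative Song suggests, passing the real instruction length instead of a 0 sentinel. This is hypothetical code, not part of the posted patch; it assumes ppc_inst_len(), which yields 8 bytes for a prefixed instruction and 4 otherwise, and it would require the callees to key their fast path off the single-instruction length rather than 0.]

/*
 * Hypothetical variant of patch_instruction() along the lines of the
 * suggestion above: the byte length of the one instruction (4 or 8)
 * is passed down instead of the 0 sentinel, so the callees would test
 * "len == ppc_inst_len(instr)" for the single-instruction fast path.
 */
int patch_instruction(u32 *addr, ppc_inst_t instr)
{
    unsigned long flags;
    int err;

    local_irq_save(flags);
    if (mm_patch_enabled())
        err = __do_patch_instructions_mm(addr, &instr, ppc_inst_len(instr), false);
    else
        err = __do_patch_instructions(addr, &instr, ppc_inst_len(instr), false);
    local_irq_restore(flags);

    return err;
}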
On 28/09/2023 at 21:48, Hari Bathini wrote:
> patch_instruction() entails setting up pte, patching the instruction,
> clearing the pte and flushing the tlb. If multiple instructions need
> to be patched, every instruction would have to go through the above
> drill unnecessarily. Instead, introduce function patch_instructions()
> that sets up the pte, clears the pte and flushes the tlb only once per
> page range of instructions to be patched. This adds a slight overhead
> to patch_instruction() call while improving the patching time for
> scenarios where more than one instruction needs to be patched.

On my powerpc8xx, this patch leads to an increase of about 8% in the
time needed to activate the ftrace function tracer.

The problem is that it complicates patch_instruction().

Before your patch:

00000234 <patch_instruction>:
 234:	48 00 00 6c 	b       2a0 <patch_instruction+0x6c>
 238:	7c e0 00 a6 	mfmsr   r7
 23c:	7c 51 13 a6 	mtspr   81,r2
 240:	3d 40 00 00 	lis     r10,0
			242: R_PPC_ADDR16_HA	.data
 244:	39 4a 00 00 	addi    r10,r10,0
			246: R_PPC_ADDR16_LO	.data
 248:	7c 69 1b 78 	mr      r9,r3
 24c:	3d 29 40 00 	addis   r9,r9,16384
 250:	81 0a 00 08 	lwz     r8,8(r10)
 254:	55 29 00 26 	clrrwi  r9,r9,12
 258:	81 4a 00 04 	lwz     r10,4(r10)
 25c:	61 29 01 25 	ori     r9,r9,293
 260:	91 28 00 00 	stw     r9,0(r8)
 264:	55 49 00 26 	clrrwi  r9,r10,12
 268:	50 6a 05 3e 	rlwimi  r10,r3,0,20,31
 26c:	90 8a 00 00 	stw     r4,0(r10)
 270:	7c 00 50 6c 	dcbst   0,r10
 274:	7c 00 04 ac 	hwsync
 278:	7c 00 1f ac 	icbi    0,r3
 27c:	7c 00 04 ac 	hwsync
 280:	4c 00 01 2c 	isync
 284:	38 60 00 00 	li      r3,0
 288:	39 40 00 00 	li      r10,0
 28c:	91 48 00 00 	stw     r10,0(r8)
 290:	7c 00 4a 64 	tlbie   r9,r0
 294:	7c 00 04 ac 	hwsync
 298:	7c e0 01 24 	mtmsr   r7
 29c:	4e 80 00 20 	blr
 2a0:	90 83 00 00 	stw     r4,0(r3)
 2a4:	7c 00 18 6c 	dcbst   0,r3
 2a8:	7c 00 04 ac 	hwsync
 2ac:	7c 00 1f ac 	icbi    0,r3
 2b0:	7c 00 04 ac 	hwsync
 2b4:	4c 00 01 2c 	isync
 2b8:	38 60 00 00 	li      r3,0
 2bc:	4e 80 00 20 	blr
 2c0:	38 60 ff ff 	li      r3,-1
 2c4:	4b ff ff c4 	b       288 <patch_instruction+0x54>
 2c8:	38 60 ff ff 	li      r3,-1
 2cc:	4e 80 00 20 	blr

After your patch:

0000020c <__do_patch_instructions>:
 20c:	94 21 ff e0 	stwu    r1,-32(r1)
 210:	3d 40 00 00 	lis     r10,0
			212: R_PPC_ADDR16_HA	.data
 214:	93 81 00 10 	stw     r28,16(r1)
 218:	93 c1 00 18 	stw     r30,24(r1)
 21c:	93 a1 00 14 	stw     r29,20(r1)
 220:	93 e1 00 1c 	stw     r31,28(r1)
 224:	39 4a 00 00 	addi    r10,r10,0
			226: R_PPC_ADDR16_LO	.data
 228:	7c 69 1b 78 	mr      r9,r3
 22c:	7c be 2b 79 	mr.     r30,r5
 230:	3d 29 40 00 	addis   r9,r9,16384
 234:	83 ea 00 04 	lwz     r31,4(r10)
 238:	83 aa 00 08 	lwz     r29,8(r10)
 23c:	55 29 00 26 	clrrwi  r9,r9,12
 240:	61 29 01 25 	ori     r9,r9,293
 244:	57 fc 00 26 	clrrwi  r28,r31,12
 248:	91 3d 00 00 	stw     r9,0(r29)
 24c:	50 7f 05 3e 	rlwimi  r31,r3,0,20,31
 250:	40 82 00 4c 	bne     29c <__do_patch_instructions+0x90>
 254:	81 24 00 00 	lwz     r9,0(r4)
 258:	91 3f 00 00 	stw     r9,0(r31)
 25c:	7c 00 f8 6c 	dcbst   0,r31
 260:	7c 00 04 ac 	hwsync
 264:	7c 00 1f ac 	icbi    0,r3
 268:	7c 00 04 ac 	hwsync
 26c:	4c 00 01 2c 	isync
 270:	38 60 00 00 	li      r3,0
 274:	39 20 00 00 	li      r9,0
 278:	91 3d 00 00 	stw     r9,0(r29)
 27c:	7c 00 e2 64 	tlbie   r28,r0
 280:	7c 00 04 ac 	hwsync
 284:	83 81 00 10 	lwz     r28,16(r1)
 288:	83 a1 00 14 	lwz     r29,20(r1)
 28c:	83 c1 00 18 	lwz     r30,24(r1)
 290:	83 e1 00 1c 	lwz     r31,28(r1)
 294:	38 21 00 20 	addi    r1,r1,32
 298:	4e 80 00 20 	blr
 29c:	2c 06 00 00 	cmpwi   r6,0
 2a0:	7c 08 02 a6 	mflr    r0
 2a4:	90 01 00 24 	stw     r0,36(r1)
 2a8:	40 82 00 24 	bne     2cc <__do_patch_instructions+0xc0>
 2ac:	7f e3 fb 78 	mr      r3,r31
 2b0:	48 00 00 01 	bl      2b0 <__do_patch_instructions+0xa4>
			2b0: R_PPC_REL24	memcpy
 2b4:	7c 9f f2 14 	add     r4,r31,r30
 2b8:	7f e3 fb 78 	mr      r3,r31
 2bc:	48 00 00 01 	bl      2bc <__do_patch_instructions+0xb0>
			2bc: R_PPC_REL24	flush_icache_range
 2c0:	80 01 00 24 	lwz     r0,36(r1)
 2c4:	7c 08 03 a6 	mtlr    r0
 2c8:	4b ff ff a8 	b       270 <__do_patch_instructions+0x64>
 2cc:	80 84 00 00 	lwz     r4,0(r4)
 2d0:	57 c5 f0 be 	srwi    r5,r30,2
 2d4:	7f e3 fb 78 	mr      r3,r31
 2d8:	48 00 00 01 	bl      2d8 <__do_patch_instructions+0xcc>
			2d8: R_PPC_REL24	memset32
 2dc:	4b ff ff d8 	b       2b4 <__do_patch_instructions+0xa8>
 2e0:	38 60 ff ff 	li      r3,-1
 2e4:	4b ff ff 90 	b       274 <__do_patch_instructions+0x68>
	...

00000310 <patch_instruction>:
 310:	94 21 ff e0 	stwu    r1,-32(r1)
 314:	90 81 00 08 	stw     r4,8(r1)
 318:	48 00 00 40 	b       358 <patch_instruction+0x48>
 31c:	7c 08 02 a6 	mflr    r0
 320:	90 01 00 24 	stw     r0,36(r1)
 324:	93 e1 00 1c 	stw     r31,28(r1)
 328:	7f e0 00 a6 	mfmsr   r31
 32c:	7c 51 13 a6 	mtspr   81,r2
 330:	38 c0 00 00 	li      r6,0
 334:	38 81 00 08 	addi    r4,r1,8
 338:	38 a0 00 00 	li      r5,0
 33c:	4b ff fe d1 	bl      20c <__do_patch_instructions>
 340:	7f e0 01 24 	mtmsr   r31
 344:	80 01 00 24 	lwz     r0,36(r1)
 348:	83 e1 00 1c 	lwz     r31,28(r1)
 34c:	7c 08 03 a6 	mtlr    r0
 350:	38 21 00 20 	addi    r1,r1,32
 354:	4e 80 00 20 	blr
 358:	81 21 00 08 	lwz     r9,8(r1)
 35c:	91 23 00 00 	stw     r9,0(r3)
 360:	7c 00 18 6c 	dcbst   0,r3
 364:	7c 00 04 ac 	hwsync
 368:	7c 00 1f ac 	icbi    0,r3
 36c:	7c 00 04 ac 	hwsync
 370:	4c 00 01 2c 	isync
 374:	38 60 00 00 	li      r3,0
 378:	4b ff ff d8 	b       350 <patch_instruction+0x40>
 37c:	38 60 ff ff 	li      r3,-1
 380:	4b ff ff d0 	b       350 <patch_instruction+0x40>

Christophe

>
> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
> ---
>  arch/powerpc/include/asm/code-patching.h |  1 +
>  arch/powerpc/lib/code-patching.c         | 93 +++++++++++++++++++++---
>  2 files changed, 85 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
> index 3f881548fb61..43a4aedfa703 100644
> --- a/arch/powerpc/include/asm/code-patching.h
> +++ b/arch/powerpc/include/asm/code-patching.h
> @@ -74,6 +74,7 @@ int create_cond_branch(ppc_inst_t *instr, const u32 *addr,
>  int patch_branch(u32 *addr, unsigned long target, int flags);
>  int patch_instruction(u32 *addr, ppc_inst_t instr);
>  int raw_patch_instruction(u32 *addr, ppc_inst_t instr);
> +int patch_instructions(void *addr, void *code, size_t len, bool repeat_instr);
>
>  static inline unsigned long patch_site_addr(s32 *site)
>  {
> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> index b00112d7ad46..4ff002bc41f6 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -278,7 +278,36 @@ static void unmap_patch_area(unsigned long addr)
>      flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
>  }
>
> -static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
> +static int __patch_instructions(u32 *patch_addr, void *code, size_t len, bool repeat_instr)
> +{
> +    unsigned long start = (unsigned long)patch_addr;
> +
> +    /* Repeat instruction */
> +    if (repeat_instr) {
> +        ppc_inst_t instr = ppc_inst_read(code);
> +
> +        if (ppc_inst_prefixed(instr)) {
> +            u64 val = ppc_inst_as_ulong(instr);
> +
> +            memset64((uint64_t *)patch_addr, val, len / 8);
> +        } else {
> +            u32 val = ppc_inst_val(instr);
> +
> +            memset32(patch_addr, val, len / 4);
> +        }
> +    } else
> +        memcpy(patch_addr, code, len);
> +
> +    smp_wmb();    /* smp write barrier */
> +    flush_icache_range(start, start + len);
> +    return 0;
> +}
> +
> +/*
> + * A page is mapped and instructions that fit the page are patched.
> + * Assumes 'len' to be (PAGE_SIZE - offset_in_page(addr)) or below.
> + */
> +static int __do_patch_instructions_mm(u32 *addr, void *code, size_t len, bool repeat_instr)
>  {
>      int err;
>      u32 *patch_addr;
> @@ -307,11 +336,15 @@ static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
>
>      orig_mm = start_using_temp_mm(patching_mm);
>
> -    err = __patch_instruction(addr, instr, patch_addr);
> +    /* Single instruction case. */
> +    if (len == 0) {
> +        err = __patch_instruction(addr, *(ppc_inst_t *)code, patch_addr);
>
> -    /* hwsync performed by __patch_instruction (sync) if successful */
> -    if (err)
> -        mb(); /* sync */
> +        /* hwsync performed by __patch_instruction (sync) if successful */
> +        if (err)
> +            mb(); /* sync */
> +    } else
> +        err = __patch_instructions(patch_addr, code, len, repeat_instr);
>
>      /* context synchronisation performed by __patch_instruction (isync or exception) */
>      stop_using_temp_mm(patching_mm, orig_mm);
> @@ -328,7 +361,11 @@ static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
>      return err;
>  }
>
> -static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
> +/*
> + * A page is mapped and instructions that fit the page are patched.
> + * Assumes 'len' to be (PAGE_SIZE - offset_in_page(addr)) or below.
> + */
> +static int __do_patch_instructions(u32 *addr, void *code, size_t len, bool repeat_instr)
>  {
>      int err;
>      u32 *patch_addr;
> @@ -345,7 +382,11 @@ static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
>      if (radix_enabled())
>          asm volatile("ptesync": : :"memory");
>
> -    err = __patch_instruction(addr, instr, patch_addr);
> +    /* Single instruction case. */
> +    if (len == 0)
> +        err = __patch_instruction(addr, *(ppc_inst_t *)code, patch_addr);
> +    else
> +        err = __patch_instructions(patch_addr, code, len, repeat_instr);
>
>      pte_clear(&init_mm, text_poke_addr, pte);
>      flush_tlb_kernel_range(text_poke_addr, text_poke_addr + PAGE_SIZE);
> @@ -369,15 +410,49 @@ int patch_instruction(u32 *addr, ppc_inst_t instr)
>
>      local_irq_save(flags);
>      if (mm_patch_enabled())
> -        err = __do_patch_instruction_mm(addr, instr);
> +        err = __do_patch_instructions_mm(addr, &instr, 0, false);
>      else
> -        err = __do_patch_instruction(addr, instr);
> +        err = __do_patch_instructions(addr, &instr, 0, false);
>      local_irq_restore(flags);
>
>      return err;
>  }
>  NOKPROBE_SYMBOL(patch_instruction);
>
> +/*
> + * Patch 'addr' with 'len' bytes of instructions from 'code'.
> + *
> + * If repeat_instr is true, the same instruction is filled for
> + * 'len' bytes.
> + */
> +int patch_instructions(void *addr, void *code, size_t len, bool repeat_instr)
> +{
> +    unsigned long flags;
> +    size_t plen;
> +    int err;
> +
> +    while (len > 0) {
> +        plen = min_t(size_t, PAGE_SIZE - offset_in_page(addr), len);
> +
> +        local_irq_save(flags);
> +        if (mm_patch_enabled())
> +            err = __do_patch_instructions_mm(addr, code, plen, repeat_instr);
> +        else
> +            err = __do_patch_instructions(addr, code, plen, repeat_instr);
> +        local_irq_restore(flags);
> +        if (err)
> +            break;
> +
> +        len -= plen;
> +        addr = addr + plen;
> +        if (!repeat_instr)
> +            code = code + plen;
> +    }
> +
> +    return err;
> +}
> +NOKPROBE_SYMBOL(patch_instructions);
> +
>  int patch_branch(u32 *addr, unsigned long target, int flags)
>  {
>      ppc_inst_t instr;
Hi Christophe,

On 29/09/23 2:09 pm, Christophe Leroy wrote:
>
>
> On 28/09/2023 at 21:48, Hari Bathini wrote:
>> patch_instruction() entails setting up pte, patching the instruction,
>> clearing the pte and flushing the tlb. If multiple instructions need
>> to be patched, every instruction would have to go through the above
>> drill unnecessarily. Instead, introduce function patch_instructions()
>> that sets up the pte, clears the pte and flushes the tlb only once per
>> page range of instructions to be patched. This adds a slight overhead
>> to patch_instruction() call while improving the patching time for
>> scenarios where more than one instruction needs to be patched.
>
> On my powerpc8xx, this patch leads to an increase of about 8% in the
> time needed to activate the ftrace function tracer.

Interesting! My observation on ppc64le was somewhat different.
With a single CPU, the average tick count was almost the same with and
without the patch (~1580). I saw a performance degradation of less than
0.6% with this patch versus without it when activating the function
tracer.

Ticks to activate function tracer in 15 attempts without
this patch (avg: 108734089):
106619626
111712292
111030404
111021344
111313530
106253773
107156175
106887038
107215379
108646636
108040287
108311770
107842343
106894310
112066423

Ticks to activate function tracer in 15 attempts with
this patch (avg: 109328578):
109378357
108794095
108595381
107622142
110689418
107287276
107132093
112540481
111311830
112608265
102883923
112054554
111762570
109874309
107393979

I used the below patch for the experiment:

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index b00112d7ad4..0979d12d00c 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -19,6 +19,10 @@
 #include <asm/page.h>
 #include <asm/code-patching.h>
 #include <asm/inst.h>
+#include <asm/time.h>
+
+unsigned long patching_time;
+unsigned long num_times;
 
 static int __patch_instruction(u32 *exec_addr, ppc_inst_t instr, u32 *patch_addr)
 {
@@ -353,7 +357,7 @@ static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
     return err;
 }
 
-int patch_instruction(u32 *addr, ppc_inst_t instr)
+int ___patch_instruction(u32 *addr, ppc_inst_t instr)
 {
     int err;
     unsigned long flags;
@@ -376,6 +380,19 @@ int patch_instruction(u32 *addr, ppc_inst_t instr)
 
     return err;
 }
+
+int patch_instruction(u32 *addr, ppc_inst_t instr)
+{
+    u64 start;
+    int err;
+
+    start = get_tb();
+    err = ___patch_instruction(addr, instr);
+    patching_time += (get_tb() - start);
+    num_times++;
+
+    return err;
+}
 NOKPROBE_SYMBOL(patch_instruction);
 
 int patch_branch(u32 *addr, unsigned long target, int flags)
diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
index 1d4bc493b2f..f52694cfeab 100644
--- a/kernel/ksysfs.c
+++ b/kernel/ksysfs.c
@@ -35,6 +35,18 @@ static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
 #define KERNEL_ATTR_RW(_name) \
 static struct kobj_attribute _name##_attr = __ATTR_RW(_name)
 
+unsigned long patch_avgtime;
+extern unsigned long patching_time;
+extern unsigned long num_times;
+
+static ssize_t patching_avgtime_show(struct kobject *kobj,
+                                     struct kobj_attribute *attr, char *buf)
+{
+    patch_avgtime = patching_time / num_times;
+    return sysfs_emit(buf, "%lu\n", patch_avgtime);
+}
+KERNEL_ATTR_RO(patching_avgtime);
+
 /* current uevent sequence number */
 static ssize_t uevent_seqnum_show(struct kobject *kobj,
                                   struct kobj_attribute *attr, char *buf)
@@ -250,6 +262,7 @@ struct kobject *kernel_kobj;
 EXPORT_SYMBOL_GPL(kernel_kobj);
 
 static struct attribute * kernel_attrs[] = {
+    &patching_avgtime_attr.attr,
     &fscaps_attr.attr,
     &uevent_seqnum_attr.attr,
     &cpu_byteorder_attr.attr,
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index abaaf516fca..5eb950bcab9 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -50,6 +50,7 @@
 #include <linux/workqueue.h>
 
 #include <asm/setup.h> /* COMMAND_LINE_SIZE */
+#include <asm/time.h>
 
 #include "trace.h"
 #include "trace_output.h"
@@ -6517,6 +6518,7 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf)
     bool had_max_tr;
 #endif
     int ret = 0;
+    u64 start;
 
     mutex_lock(&trace_types_lock);
 
@@ -6536,6 +6538,10 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf)
         ret = -EINVAL;
         goto out;
     }
+
+    pr_warn("Current tracer: %s, Changing to tracer: %s\n",
+            tr->current_trace->name, t->name);
+    start = get_tb();
     if (t == tr->current_trace)
         goto out;
 
@@ -6614,6 +6620,7 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf)
     tr->current_trace->enabled++;
     trace_branch_enable(tr);
 out:
+    pr_warn("Time taken to enable tracer is %llu\n", (get_tb() - start));
     mutex_unlock(&trace_types_lock);
 
     return ret;

Thanks
Hari
Thanks for the review, Song.

On 29/09/23 2:38 am, Song Liu wrote:
> On Thu, Sep 28, 2023 at 12:48 PM Hari Bathini <hbathini@linux.ibm.com> wrote:
>>
>> patch_instruction() entails setting up pte, patching the instruction,
>> clearing the pte and flushing the tlb. If multiple instructions need
>> to be patched, every instruction would have to go through the above
>> drill unnecessarily. Instead, introduce function patch_instructions()
>> that sets up the pte, clears the pte and flushes the tlb only once per
>> page range of instructions to be patched. This adds a slight overhead
>> to patch_instruction() call while improving the patching time for
>> scenarios where more than one instruction needs to be patched.
>>
>> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
>
> Acked-by: Song Liu <song@kernel.org>
>
> With a nit below.
>
> [...]
>> +/*
>> + * A page is mapped and instructions that fit the page are patched.
>> + * Assumes 'len' to be (PAGE_SIZE - offset_in_page(addr)) or below.
>> + */
>> +static int __do_patch_instructions_mm(u32 *addr, void *code, size_t len, bool repeat_instr)
>>  {
>>      int err;
>>      u32 *patch_addr;
>> @@ -307,11 +336,15 @@ static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
>>
>>      orig_mm = start_using_temp_mm(patching_mm);
>>
>> -    err = __patch_instruction(addr, instr, patch_addr);
>> +    /* Single instruction case. */
>> +    if (len == 0) {
>> +        err = __patch_instruction(addr, *(ppc_inst_t *)code, patch_addr);
>
> len == 0 for single instruction is a little weird to me. How about we just use
> len == 4 or 8 depending on the instruction to patch?

Yeah, it looks a bit weird, but it avoids the need to call
ppc_inst_read() & ppc_inst_len(). A comment explaining why the check is
done this way would have been better, though.

Thanks
Hari
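[Editor's note: a sketch of the explanatory comment Hari alludes to, placed at the dispatch site. The wording is illustrative, not from the posted patch.]

    /*
     * len == 0 is a sentinel meaning "patch the single ppc_inst_t that
     * 'code' points to". It lets patch_instruction() pass &instr through
     * unchanged, without computing a byte count on its fast path via
     * ppc_inst_read()/ppc_inst_len().
     */
    if (len == 0)
        err = __patch_instruction(addr, *(ppc_inst_t *)code, patch_addr);
    else
        err = __patch_instructions(patch_addr, code, len, repeat_instr);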
On 06/10/2023 at 18:22, Hari Bathini wrote:
> Hi Christophe,
>
>
> On 29/09/23 2:09 pm, Christophe Leroy wrote:
>>
>>
>> On 28/09/2023 at 21:48, Hari Bathini wrote:
>>> patch_instruction() entails setting up pte, patching the instruction,
>>> clearing the pte and flushing the tlb. If multiple instructions need
>>> to be patched, every instruction would have to go through the above
>>> drill unnecessarily. Instead, introduce function patch_instructions()
>>> that sets up the pte, clears the pte and flushes the tlb only once per
>>> page range of instructions to be patched. This adds a slight overhead
>>> to patch_instruction() call while improving the patching time for
>>> scenarios where more than one instruction needs to be patched.
>>
>> On my powerpc8xx, this patch leads to an increase of about 8% in the
>> time needed to activate the ftrace function tracer.
>
> Interesting! My observation on ppc64le was somewhat different.
> With a single CPU, the average tick count was almost the same with and
> without the patch (~1580). I saw a performance degradation of less than
> 0.6% with this patch versus without it when activating the function
> tracer.
>
> Ticks to activate function tracer in 15 attempts without
> this patch (avg: 108734089):
> 106619626
> 111712292
> 111030404
> 111021344
> 111313530
> 106253773
> 107156175
> 106887038
> 107215379
> 108646636
> 108040287
> 108311770
> 107842343
> 106894310
> 112066423
>
> Ticks to activate function tracer in 15 attempts with
> this patch (avg: 109328578):
> 109378357
> 108794095
> 108595381
> 107622142
> 110689418
> 107287276
> 107132093
> 112540481
> 111311830
> 112608265
> 102883923
> 112054554
> 111762570
> 109874309
> 107393979
>
> I used the below patch for the experiment:

I do the measurement at a higher level:

diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 82010629cf88..3eea5b963bfa 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -167,7 +167,9 @@ void ftrace_replace_code(int enable)
     struct ftrace_rec_iter *iter;
     struct dyn_ftrace *rec;
     int ret = 0, update;
+    long t0;
 
+    t0 = mftb();
     for_ftrace_rec_iter(iter) {
         rec = ftrace_rec_iter_record(iter);
         ip = rec->ip;
@@ -206,6 +208,8 @@ void ftrace_replace_code(int enable)
         if (ret)
             goto out;
     }
+    t0 = mftb() - t0;
+    pr_err("%s: %ld\n", __func__, t0);
 out:
     if (ret)

Without your patch I get:

# echo function > /sys/kernel/debug/tracing/current_tracer
[   59.871176] ftrace-powerpc: ftrace_replace_code: 708099
# echo nop > /sys/kernel/debug/tracing/current_tracer
[   62.645293] ftrace-powerpc: ftrace_replace_code: 606449
[   64.141428] ftrace-powerpc: ftrace_replace_code: 710117
[   65.185562] ftrace-powerpc: ftrace_replace_code: 615069
[   66.311363] ftrace-powerpc: ftrace_replace_code: 706974
[   67.272770] ftrace-powerpc: ftrace_replace_code: 604744
[   68.311403] ftrace-powerpc: ftrace_replace_code: 707498
[   69.245960] ftrace-powerpc: ftrace_replace_code: 607089
[   72.661438] ftrace-powerpc: ftrace_replace_code: 710228
[   74.127413] ftrace-powerpc: ftrace_replace_code: 604846
[   75.301460] ftrace-powerpc: ftrace_replace_code: 707982
[   76.289665] ftrace-powerpc: ftrace_replace_code: 600860
[   77.431054] ftrace-powerpc: ftrace_replace_code: 706672
[   78.418618] ftrace-powerpc: ftrace_replace_code: 600879
[   79.641558] ftrace-powerpc: ftrace_replace_code: 711074
[   80.636932] ftrace-powerpc: ftrace_replace_code: 605791
[   81.751581] ftrace-powerpc: ftrace_replace_code: 709184
[   82.802525] ftrace-powerpc: ftrace_replace_code: 603046
[   84.701412] ftrace-powerpc: ftrace_replace_code: 709887
[   85.792599] ftrace-powerpc: ftrace_replace_code: 604801

With the patch_instructions() patch applied I get:

[  150.677364] ftrace-powerpc: ftrace_replace_code: 753321
[  154.201196] ftrace-powerpc: ftrace_replace_code: 657561
[  157.027344] ftrace-powerpc: ftrace_replace_code: 753435
[  158.692425] ftrace-powerpc: ftrace_replace_code: 652702
[  162.137339] ftrace-powerpc: ftrace_replace_code: 753394
[  163.207269] ftrace-powerpc: ftrace_replace_code: 650320
[  165.387861] ftrace-powerpc: ftrace_replace_code: 756018
[  166.458877] ftrace-powerpc: ftrace_replace_code: 650477
[  167.617375] ftrace-powerpc: ftrace_replace_code: 753326
[  168.596360] ftrace-powerpc: ftrace_replace_code: 647984
[  169.737676] ftrace-powerpc: ftrace_replace_code: 756137
[  170.743584] ftrace-powerpc: ftrace_replace_code: 652650
[  171.907454] ftrace-powerpc: ftrace_replace_code: 754017
[  172.943305] ftrace-powerpc: ftrace_replace_code: 650853
[  174.187347] ftrace-powerpc: ftrace_replace_code: 753476
[  175.811981] ftrace-powerpc: ftrace_replace_code: 650908
[  177.007400] ftrace-powerpc: ftrace_replace_code: 753408
[  177.993642] ftrace-powerpc: ftrace_replace_code: 651607
[  179.157650] ftrace-powerpc: ftrace_replace_code: 755624
[  180.141799] ftrace-powerpc: ftrace_replace_code: 652184

Christophe

>
> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> index b00112d7ad4..0979d12d00c 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -19,6 +19,10 @@
>  #include <asm/page.h>
>  #include <asm/code-patching.h>
>  #include <asm/inst.h>
> +#include <asm/time.h>
> +
> +unsigned long patching_time;
> +unsigned long num_times;
>
>  static int __patch_instruction(u32 *exec_addr, ppc_inst_t instr, u32 *patch_addr)
>  {
> @@ -353,7 +357,7 @@ static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
>      return err;
>  }
>
> -int patch_instruction(u32 *addr, ppc_inst_t instr)
> +int ___patch_instruction(u32 *addr, ppc_inst_t instr)
>  {
>      int err;
>      unsigned long flags;
> @@ -376,6 +380,19 @@ int patch_instruction(u32 *addr, ppc_inst_t instr)
>
>      return err;
>  }
> +
> +int patch_instruction(u32 *addr, ppc_inst_t instr)
> +{
> +    u64 start;
> +    int err;
> +
> +    start = get_tb();
> +    err = ___patch_instruction(addr, instr);
> +    patching_time += (get_tb() - start);
> +    num_times++;
> +
> +    return err;
> +}
>  NOKPROBE_SYMBOL(patch_instruction);
>
>  int patch_branch(u32 *addr, unsigned long target, int flags)
> diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
> index 1d4bc493b2f..f52694cfeab 100644
> --- a/kernel/ksysfs.c
> +++ b/kernel/ksysfs.c
> @@ -35,6 +35,18 @@ static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
>  #define KERNEL_ATTR_RW(_name) \
>  static struct kobj_attribute _name##_attr = __ATTR_RW(_name)
>
> +unsigned long patch_avgtime;
> +extern unsigned long patching_time;
> +extern unsigned long num_times;
> +
> +static ssize_t patching_avgtime_show(struct kobject *kobj,
> +                                     struct kobj_attribute *attr, char *buf)
> +{
> +    patch_avgtime = patching_time / num_times;
> +    return sysfs_emit(buf, "%lu\n", patch_avgtime);
> +}
> +KERNEL_ATTR_RO(patching_avgtime);
> +
>  /* current uevent sequence number */
>  static ssize_t uevent_seqnum_show(struct kobject *kobj,
>                                    struct kobj_attribute *attr, char *buf)
> @@ -250,6 +262,7 @@ struct kobject *kernel_kobj;
>  EXPORT_SYMBOL_GPL(kernel_kobj);
>
>  static struct attribute * kernel_attrs[] = {
> +    &patching_avgtime_attr.attr,
>      &fscaps_attr.attr,
>      &uevent_seqnum_attr.attr,
>      &cpu_byteorder_attr.attr,
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index abaaf516fca..5eb950bcab9 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -50,6 +50,7 @@
>  #include <linux/workqueue.h>
>
>  #include <asm/setup.h> /* COMMAND_LINE_SIZE */
> +#include <asm/time.h>
>
>  #include "trace.h"
>  #include "trace_output.h"
> @@ -6517,6 +6518,7 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf)
>      bool had_max_tr;
>  #endif
>      int ret = 0;
> +    u64 start;
>
>      mutex_lock(&trace_types_lock);
>
> @@ -6536,6 +6538,10 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf)
>          ret = -EINVAL;
>          goto out;
>      }
> +
> +    pr_warn("Current tracer: %s, Changing to tracer: %s\n",
> +            tr->current_trace->name, t->name);
> +    start = get_tb();
>      if (t == tr->current_trace)
>          goto out;
>
> @@ -6614,6 +6620,7 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf)
>      tr->current_trace->enabled++;
>      trace_branch_enable(tr);
>  out:
> +    pr_warn("Time taken to enable tracer is %llu\n", (get_tb() - start));
>      mutex_unlock(&trace_types_lock);
>
>      return ret;
>
> Thanks
> Hari
On 28/09/2023 at 21:48, Hari Bathini wrote:
> patch_instruction() entails setting up pte, patching the instruction,
> clearing the pte and flushing the tlb. If multiple instructions need
> to be patched, every instruction would have to go through the above
> drill unnecessarily. Instead, introduce function patch_instructions()
> that sets up the pte, clears the pte and flushes the tlb only once per
> page range of instructions to be patched. This adds a slight overhead
> to patch_instruction() call while improving the patching time for
> scenarios where more than one instruction needs to be patched.

Not a "slight" but a "significant" overhead on PPC32.

Thinking about it once more, I don't think it is a good idea to try and
merge that into the existing code-patching logic, which is really
oriented towards single-instruction performance.

Anyway, comments below.

>
> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
> ---
>  arch/powerpc/include/asm/code-patching.h |  1 +
>  arch/powerpc/lib/code-patching.c         | 93 +++++++++++++++++++++---
>  2 files changed, 85 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
> index 3f881548fb61..43a4aedfa703 100644
> --- a/arch/powerpc/include/asm/code-patching.h
> +++ b/arch/powerpc/include/asm/code-patching.h
> @@ -74,6 +74,7 @@ int create_cond_branch(ppc_inst_t *instr, const u32 *addr,
>  int patch_branch(u32 *addr, unsigned long target, int flags);
>  int patch_instruction(u32 *addr, ppc_inst_t instr);
>  int raw_patch_instruction(u32 *addr, ppc_inst_t instr);
> +int patch_instructions(void *addr, void *code, size_t len, bool repeat_instr);

I don't like void *, you can do too many nasty things with that.
I think you want u32 *

>
>  static inline unsigned long patch_site_addr(s32 *site)
>  {
> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> index b00112d7ad46..4ff002bc41f6 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -278,7 +278,36 @@ static void unmap_patch_area(unsigned long addr)
>      flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
>  }
>
> -static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
> +static int __patch_instructions(u32 *patch_addr, void *code, size_t len, bool repeat_instr)
> +{
> +    unsigned long start = (unsigned long)patch_addr;
> +
> +    /* Repeat instruction */
> +    if (repeat_instr) {
> +        ppc_inst_t instr = ppc_inst_read(code);
> +
> +        if (ppc_inst_prefixed(instr)) {
> +            u64 val = ppc_inst_as_ulong(instr);
> +
> +            memset64((uint64_t *)patch_addr, val, len / 8);

Use u64 instead of uint64_t.

> +        } else {
> +            u32 val = ppc_inst_val(instr);
> +
> +            memset32(patch_addr, val, len / 4);
> +        }
> +    } else
> +        memcpy(patch_addr, code, len);

Missing braces, see
https://docs.kernel.org/process/coding-style.html#placing-braces-and-spaces

> +
> +    smp_wmb();    /* smp write barrier */
> +    flush_icache_range(start, start + len);
> +    return 0;
> +}
> +
> +/*
> + * A page is mapped and instructions that fit the page are patched.
> + * Assumes 'len' to be (PAGE_SIZE - offset_in_page(addr)) or below.
> + */
> +static int __do_patch_instructions_mm(u32 *addr, void *code, size_t len, bool repeat_instr)
>  {
>      int err;
>      u32 *patch_addr;
> @@ -307,11 +336,15 @@ static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
>
>      orig_mm = start_using_temp_mm(patching_mm);
>
> -    err = __patch_instruction(addr, instr, patch_addr);
> +    /* Single instruction case. */
> +    if (len == 0) {
> +        err = __patch_instruction(addr, *(ppc_inst_t *)code, patch_addr);

Take care, you can't convert u32 * to ppc_inst_t that way, you have to
use ppc_inst_read(), otherwise you'll get odd results with prefixed
instructions depending on endianness.

>
> -    /* hwsync performed by __patch_instruction (sync) if successful */
> -    if (err)
> -        mb(); /* sync */
> +        /* hwsync performed by __patch_instruction (sync) if successful */
> +        if (err)
> +            mb(); /* sync */

Drop this, see my patch at
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/e88b154eaf2efd9ff177d472d3411dcdec8ff4f5.1696675567.git.christophe.leroy@csgroup.eu/

> +    } else
> +        err = __patch_instructions(patch_addr, code, len, repeat_instr);
>
>      /* context synchronisation performed by __patch_instruction (isync or exception) */
>      stop_using_temp_mm(patching_mm, orig_mm);
> @@ -328,7 +361,11 @@ static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
>      return err;
>  }
>
> -static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
> +/*
> + * A page is mapped and instructions that fit the page are patched.
> + * Assumes 'len' to be (PAGE_SIZE - offset_in_page(addr)) or below.
> + */
> +static int __do_patch_instructions(u32 *addr, void *code, size_t len, bool repeat_instr)
>  {
>      int err;
>      u32 *patch_addr;
> @@ -345,7 +382,11 @@ static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
>      if (radix_enabled())
>          asm volatile("ptesync": : :"memory");
>
> -    err = __patch_instruction(addr, instr, patch_addr);
> +    /* Single instruction case. */
> +    if (len == 0)
> +        err = __patch_instruction(addr, *(ppc_inst_t *)code, patch_addr);

Same, use ppc_inst_read() instead of this nasty casting.

> +    else
> +        err = __patch_instructions(patch_addr, code, len, repeat_instr);
>
>      pte_clear(&init_mm, text_poke_addr, pte);
>      flush_tlb_kernel_range(text_poke_addr, text_poke_addr + PAGE_SIZE);
> @@ -369,15 +410,49 @@ int patch_instruction(u32 *addr, ppc_inst_t instr)
>
>      local_irq_save(flags);
>      if (mm_patch_enabled())
> -        err = __do_patch_instruction_mm(addr, instr);
> +        err = __do_patch_instructions_mm(addr, &instr, 0, false);
>      else
> -        err = __do_patch_instruction(addr, instr);
> +        err = __do_patch_instructions(addr, &instr, 0, false);
>      local_irq_restore(flags);
>
>      return err;
>  }
>  NOKPROBE_SYMBOL(patch_instruction);
>
> +/*
> + * Patch 'addr' with 'len' bytes of instructions from 'code'.
> + *
> + * If repeat_instr is true, the same instruction is filled for
> + * 'len' bytes.
> + */
> +int patch_instructions(void *addr, void *code, size_t len, bool repeat_instr)

I'd like to see code as a u32 *

> +{
> +    unsigned long flags;
> +    size_t plen;
> +    int err;

Move those three variables inside the only block in which they are used.

> +
> +    while (len > 0) {
> +        plen = min_t(size_t, PAGE_SIZE - offset_in_page(addr), len);
> +
> +        local_irq_save(flags);
> +        if (mm_patch_enabled())
> +            err = __do_patch_instructions_mm(addr, code, plen, repeat_instr);
> +        else
> +            err = __do_patch_instructions(addr, code, plen, repeat_instr);
> +        local_irq_restore(flags);
> +        if (err)
> +            break;

replace by 'return err'

> +
> +        len -= plen;
> +        addr = addr + plen;
> +        if (!repeat_instr)
> +            code = code + plen;
> +    }
> +
> +    return err;

If len is 0, err will be undefined. Is that expected?

Replace by return 0;

> +}
> +NOKPROBE_SYMBOL(patch_instructions);
> +
>  int patch_branch(u32 *addr, unsigned long target, int flags)
>  {
>      ppc_inst_t instr;
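[Editor's note: folding just the mechanical comments above into the copy helper gives something like the following sketch (u64 in place of uint64_t, braces on both arms of the if/else, and a u32 * source pointer). It illustrates the requested style fixes; it is not the literal follow-up code.]

static int __patch_instructions(u32 *patch_addr, u32 *code, size_t len, bool repeat_instr)
{
    unsigned long start = (unsigned long)patch_addr;

    /* Repeat a single instruction across the whole range */
    if (repeat_instr) {
        ppc_inst_t instr = ppc_inst_read(code);

        if (ppc_inst_prefixed(instr)) {
            u64 val = ppc_inst_as_ulong(instr);

            memset64((u64 *)patch_addr, val, len / 8);
        } else {
            u32 val = ppc_inst_val(instr);

            memset32(patch_addr, val, len / 4);
        }
    } else {
        memcpy(patch_addr, code, len);
    }

    smp_wmb();    /* order the stores before the icache flush */
    flush_icache_range(start, start + len);
    return 0;
}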
Thanks for the review, Christophe.

On 10/10/23 11:16 pm, Christophe Leroy wrote:
>
>
> On 28/09/2023 at 21:48, Hari Bathini wrote:
>> patch_instruction() entails setting up pte, patching the instruction,
>> clearing the pte and flushing the tlb. If multiple instructions need
>> to be patched, every instruction would have to go through the above
>> drill unnecessarily. Instead, introduce function patch_instructions()
>> that sets up the pte, clears the pte and flushes the tlb only once per
>> page range of instructions to be patched. This adds a slight overhead
>> to patch_instruction() call while improving the patching time for
>> scenarios where more than one instruction needs to be patched.
>
> Not a "slight" but a "significant" overhead on PPC32.
>
> Thinking about it once more, I don't think it is a good idea to try and
> merge that into the existing code-patching logic, which is really
> oriented towards single-instruction performance.
>
> Anyway, comments below.
>
>>
>> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
>> ---
>>  arch/powerpc/include/asm/code-patching.h |  1 +
>>  arch/powerpc/lib/code-patching.c         | 93 +++++++++++++++++++++---
>>  2 files changed, 85 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
>> index 3f881548fb61..43a4aedfa703 100644
>> --- a/arch/powerpc/include/asm/code-patching.h
>> +++ b/arch/powerpc/include/asm/code-patching.h
>> @@ -74,6 +74,7 @@ int create_cond_branch(ppc_inst_t *instr, const u32 *addr,
>>  int patch_branch(u32 *addr, unsigned long target, int flags);
>>  int patch_instruction(u32 *addr, ppc_inst_t instr);
>>  int raw_patch_instruction(u32 *addr, ppc_inst_t instr);
>> +int patch_instructions(void *addr, void *code, size_t len, bool repeat_instr);
>
> I don't like void *, you can do too many nasty things with that.
> I think you want u32 *
>
>>
>>  static inline unsigned long patch_site_addr(s32 *site)
>>  {
>> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
>> index b00112d7ad46..4ff002bc41f6 100644
>> --- a/arch/powerpc/lib/code-patching.c
>> +++ b/arch/powerpc/lib/code-patching.c
>> @@ -278,7 +278,36 @@ static void unmap_patch_area(unsigned long addr)
>>      flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
>>  }
>>
>> -static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
>> +static int __patch_instructions(u32 *patch_addr, void *code, size_t len, bool repeat_instr)
>> +{
>> +    unsigned long start = (unsigned long)patch_addr;
>> +
>> +    /* Repeat instruction */
>> +    if (repeat_instr) {
>> +        ppc_inst_t instr = ppc_inst_read(code);
>> +
>> +        if (ppc_inst_prefixed(instr)) {
>> +            u64 val = ppc_inst_as_ulong(instr);
>> +
>> +            memset64((uint64_t *)patch_addr, val, len / 8);
>
> Use u64 instead of uint64_t.
>
>> +        } else {
>> +            u32 val = ppc_inst_val(instr);
>> +
>> +            memset32(patch_addr, val, len / 4);
>> +        }
>> +    } else
>> +        memcpy(patch_addr, code, len);
>
> Missing braces, see
> https://docs.kernel.org/process/coding-style.html#placing-braces-and-spaces
>
>> +
>> +    smp_wmb();    /* smp write barrier */
>> +    flush_icache_range(start, start + len);
>> +    return 0;
>> +}
>> +
>> +/*
>> + * A page is mapped and instructions that fit the page are patched.
>> + * Assumes 'len' to be (PAGE_SIZE - offset_in_page(addr)) or below.
>> + */
>> +static int __do_patch_instructions_mm(u32 *addr, void *code, size_t len, bool repeat_instr)
>>  {
>>      int err;
>>      u32 *patch_addr;
>> @@ -307,11 +336,15 @@ static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
>>
>>      orig_mm = start_using_temp_mm(patching_mm);
>>
>> -    err = __patch_instruction(addr, instr, patch_addr);
>> +    /* Single instruction case. */
>> +    if (len == 0) {
>> +        err = __patch_instruction(addr, *(ppc_inst_t *)code, patch_addr);
>
> Take care, you can't convert u32 * to ppc_inst_t that way, you have to
> use ppc_inst_read(), otherwise you'll get odd results with prefixed
> instructions depending on endianness.
>
>>
>> -    /* hwsync performed by __patch_instruction (sync) if successful */
>> -    if (err)
>> -        mb(); /* sync */
>> +        /* hwsync performed by __patch_instruction (sync) if successful */
>> +        if (err)
>> +            mb(); /* sync */
>
> Drop this, see my patch at
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/e88b154eaf2efd9ff177d472d3411dcdec8ff4f5.1696675567.git.christophe.leroy@csgroup.eu/
>
>> +    } else
>> +        err = __patch_instructions(patch_addr, code, len, repeat_instr);
>>
>>      /* context synchronisation performed by __patch_instruction (isync or exception) */
>>      stop_using_temp_mm(patching_mm, orig_mm);
>> @@ -328,7 +361,11 @@ static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
>>      return err;
>>  }
>>
>> -static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
>> +/*
>> + * A page is mapped and instructions that fit the page are patched.
>> + * Assumes 'len' to be (PAGE_SIZE - offset_in_page(addr)) or below.
>> + */
>> +static int __do_patch_instructions(u32 *addr, void *code, size_t len, bool repeat_instr)
>>  {
>>      int err;
>>      u32 *patch_addr;
>> @@ -345,7 +382,11 @@ static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
>>      if (radix_enabled())
>>          asm volatile("ptesync": : :"memory");
>>
>> -    err = __patch_instruction(addr, instr, patch_addr);
>> +    /* Single instruction case. */
>> +    if (len == 0)
>> +        err = __patch_instruction(addr, *(ppc_inst_t *)code, patch_addr);
>
> Same, use ppc_inst_read() instead of this nasty casting.
>
>> +    else
>> +        err = __patch_instructions(patch_addr, code, len, repeat_instr);
>>
>>      pte_clear(&init_mm, text_poke_addr, pte);
>>      flush_tlb_kernel_range(text_poke_addr, text_poke_addr + PAGE_SIZE);
>> @@ -369,15 +410,49 @@ int patch_instruction(u32 *addr, ppc_inst_t instr)
>>
>>      local_irq_save(flags);
>>      if (mm_patch_enabled())
>> -        err = __do_patch_instruction_mm(addr, instr);
>> +        err = __do_patch_instructions_mm(addr, &instr, 0, false);
>>      else
>> -        err = __do_patch_instruction(addr, instr);
>> +        err = __do_patch_instructions(addr, &instr, 0, false);
>>      local_irq_restore(flags);
>>
>>      return err;
>>  }
>>  NOKPROBE_SYMBOL(patch_instruction);
>>
>> +/*
>> + * Patch 'addr' with 'len' bytes of instructions from 'code'.
>> + *
>> + * If repeat_instr is true, the same instruction is filled for
>> + * 'len' bytes.
>> + */
>> +int patch_instructions(void *addr, void *code, size_t len, bool repeat_instr)
>
> I'd like to see code as a u32 *
>
>> +{
>> +    unsigned long flags;
>> +    size_t plen;
>> +    int err;
>
> Move those three variables inside the only block in which they are used.
>
>> +
>> +    while (len > 0) {
>> +        plen = min_t(size_t, PAGE_SIZE - offset_in_page(addr), len);
>> +
>> +        local_irq_save(flags);
>> +        if (mm_patch_enabled())
>> +            err = __do_patch_instructions_mm(addr, code, plen, repeat_instr);
>> +        else
>> +            err = __do_patch_instructions(addr, code, plen, repeat_instr);
>> +        local_irq_restore(flags);
>> +        if (err)
>> +            break;
>
> replace by 'return err'
>
>> +
>> +        len -= plen;
>> +        addr = addr + plen;
>> +        if (!repeat_instr)
>> +            code = code + plen;
>> +    }
>> +
>> +    return err;
>
> If len is 0, err will be undefined. Is that expected?
>
> Replace by return 0;

Posted v6
(https://lore.kernel.org/linuxppc-dev/20231012200310.235137-1-hbathini@linux.ibm.com/)
with the code paths for patch_instruction() & patch_instructions()
unmerged, to avoid the performance hit reported on ppc32. Also
addressed the other review comments.

Thanks
Hari
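[Editor's note: for reference, applying the loop-related review points (loop-local variables, early return on error, an explicit success return so err is never read uninitialized, u32 * parameters) would shape the loop roughly as in the sketch below. This illustrates the direction taken, not the literal v6 code.]

int patch_instructions(u32 *addr, u32 *code, size_t len, bool repeat_instr)
{
    while (len > 0) {
        unsigned long flags;
        size_t plen = min_t(size_t, PAGE_SIZE - offset_in_page(addr), len);
        int err;

        local_irq_save(flags);
        if (mm_patch_enabled())
            err = __do_patch_instructions_mm(addr, code, plen, repeat_instr);
        else
            err = __do_patch_instructions(addr, code, plen, repeat_instr);
        local_irq_restore(flags);
        if (err)
            return err;    /* fail fast instead of break + return err */

        len -= plen;
        addr = (void *)addr + plen;    /* plen is in bytes */
        if (!repeat_instr)
            code = (void *)code + plen;
    }

    return 0;    /* well-defined even when len was 0 on entry */
}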
diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index 3f881548fb61..43a4aedfa703 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -74,6 +74,7 @@ int create_cond_branch(ppc_inst_t *instr, const u32 *addr,
 int patch_branch(u32 *addr, unsigned long target, int flags);
 int patch_instruction(u32 *addr, ppc_inst_t instr);
 int raw_patch_instruction(u32 *addr, ppc_inst_t instr);
+int patch_instructions(void *addr, void *code, size_t len, bool repeat_instr);
 
 static inline unsigned long patch_site_addr(s32 *site)
 {
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index b00112d7ad46..4ff002bc41f6 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -278,7 +278,36 @@ static void unmap_patch_area(unsigned long addr)
     flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
 }
 
-static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
+static int __patch_instructions(u32 *patch_addr, void *code, size_t len, bool repeat_instr)
+{
+    unsigned long start = (unsigned long)patch_addr;
+
+    /* Repeat instruction */
+    if (repeat_instr) {
+        ppc_inst_t instr = ppc_inst_read(code);
+
+        if (ppc_inst_prefixed(instr)) {
+            u64 val = ppc_inst_as_ulong(instr);
+
+            memset64((uint64_t *)patch_addr, val, len / 8);
+        } else {
+            u32 val = ppc_inst_val(instr);
+
+            memset32(patch_addr, val, len / 4);
+        }
+    } else
+        memcpy(patch_addr, code, len);
+
+    smp_wmb();    /* smp write barrier */
+    flush_icache_range(start, start + len);
+    return 0;
+}
+
+/*
+ * A page is mapped and instructions that fit the page are patched.
+ * Assumes 'len' to be (PAGE_SIZE - offset_in_page(addr)) or below.
+ */
+static int __do_patch_instructions_mm(u32 *addr, void *code, size_t len, bool repeat_instr)
 {
     int err;
     u32 *patch_addr;
@@ -307,11 +336,15 @@ static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
 
     orig_mm = start_using_temp_mm(patching_mm);
 
-    err = __patch_instruction(addr, instr, patch_addr);
+    /* Single instruction case. */
+    if (len == 0) {
+        err = __patch_instruction(addr, *(ppc_inst_t *)code, patch_addr);
 
-    /* hwsync performed by __patch_instruction (sync) if successful */
-    if (err)
-        mb(); /* sync */
+        /* hwsync performed by __patch_instruction (sync) if successful */
+        if (err)
+            mb(); /* sync */
+    } else
+        err = __patch_instructions(patch_addr, code, len, repeat_instr);
 
     /* context synchronisation performed by __patch_instruction (isync or exception) */
     stop_using_temp_mm(patching_mm, orig_mm);
@@ -328,7 +361,11 @@ static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
     return err;
 }
 
-static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
+/*
+ * A page is mapped and instructions that fit the page are patched.
+ * Assumes 'len' to be (PAGE_SIZE - offset_in_page(addr)) or below.
+ */
+static int __do_patch_instructions(u32 *addr, void *code, size_t len, bool repeat_instr)
 {
     int err;
     u32 *patch_addr;
@@ -345,7 +382,11 @@ static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
     if (radix_enabled())
         asm volatile("ptesync": : :"memory");
 
-    err = __patch_instruction(addr, instr, patch_addr);
+    /* Single instruction case. */
+    if (len == 0)
+        err = __patch_instruction(addr, *(ppc_inst_t *)code, patch_addr);
+    else
+        err = __patch_instructions(patch_addr, code, len, repeat_instr);
 
     pte_clear(&init_mm, text_poke_addr, pte);
     flush_tlb_kernel_range(text_poke_addr, text_poke_addr + PAGE_SIZE);
@@ -369,15 +410,49 @@ int patch_instruction(u32 *addr, ppc_inst_t instr)
 
     local_irq_save(flags);
     if (mm_patch_enabled())
-        err = __do_patch_instruction_mm(addr, instr);
+        err = __do_patch_instructions_mm(addr, &instr, 0, false);
     else
-        err = __do_patch_instruction(addr, instr);
+        err = __do_patch_instructions(addr, &instr, 0, false);
     local_irq_restore(flags);
 
     return err;
 }
 NOKPROBE_SYMBOL(patch_instruction);
 
+/*
+ * Patch 'addr' with 'len' bytes of instructions from 'code'.
+ *
+ * If repeat_instr is true, the same instruction is filled for
+ * 'len' bytes.
+ */
+int patch_instructions(void *addr, void *code, size_t len, bool repeat_instr)
+{
+    unsigned long flags;
+    size_t plen;
+    int err;
+
+    while (len > 0) {
+        plen = min_t(size_t, PAGE_SIZE - offset_in_page(addr), len);
+
+        local_irq_save(flags);
+        if (mm_patch_enabled())
+            err = __do_patch_instructions_mm(addr, code, plen, repeat_instr);
+        else
+            err = __do_patch_instructions(addr, code, plen, repeat_instr);
+        local_irq_restore(flags);
+        if (err)
+            break;
+
+        len -= plen;
+        addr = addr + plen;
+        if (!repeat_instr)
+            code = code + plen;
+    }
+
+    return err;
+}
+NOKPROBE_SYMBOL(patch_instructions);
+
 int patch_branch(u32 *addr, unsigned long target, int flags)
 {
     ppc_inst_t instr;
patch_instruction() entails setting up pte, patching the instruction,
clearing the pte and flushing the tlb. If multiple instructions need
to be patched, every instruction would have to go through the above
drill unnecessarily. Instead, introduce function patch_instructions()
that sets up the pte, clears the pte and flushes the tlb only once per
page range of instructions to be patched. This adds a slight overhead
to patch_instruction() call while improving the patching time for
scenarios where more than one instruction needs to be patched.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---
 arch/powerpc/include/asm/code-patching.h |  1 +
 arch/powerpc/lib/code-patching.c         | 93 +++++++++++++++++++++---
 2 files changed, 85 insertions(+), 9 deletions(-)
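[Editor's note: to make the intended use concrete, a hypothetical caller, for example a BPF prog-pack style routine that fills a free region with trap instructions, could look like the sketch below. The helper name is illustrative and not taken from this series; PPC_RAW_TRAP() and ppc_inst() are existing powerpc helpers.]

/*
 * Illustrative only: fill 'len' bytes at 'dst' with the same trap
 * instruction. With patch_instruction() this would pay the PTE set-up,
 * patch, PTE clear and TLB flush once per 4-byte instruction; with
 * patch_instructions() and repeat_instr = true, each page range pays
 * that cost only once.
 */
static int fill_with_traps(u32 *dst, size_t len)
{
    ppc_inst_t insn = ppc_inst(PPC_RAW_TRAP());

    return patch_instructions(dst, &insn, len, true);
}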