Message ID | 3f6d302a2068d9e357efda2d92c8da99a0f2d0b2.1669278892.git.christophe.leroy@csgroup.eu (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Series | powerpc/bpf: Only update ldimm64 during extra pass when it is an address |
Christophe Leroy wrote:
> ldimm64 is not only used for loading function addresses, and

That's probably true today, but I worry that that can change upstream
and we may not notice at all.

> the NOPs added for padding are impacting performance, so avoid
> them when not necessary.
>
> On QEMU mac99, with the patch:
>
> test_bpf: #829 ALU64_MOV_K: all immediate value magnitudes jited:1 167436810 PASS
> test_bpf: #831 ALU64_OR_K: all immediate value magnitudes jited:1 170702940 PASS
>
> Without the patch:
>
> test_bpf: #829 ALU64_MOV_K: all immediate value magnitudes jited:1 173012360 PASS
> test_bpf: #831 ALU64_OR_K: all immediate value magnitudes jited:1 176424090 PASS
>
> That's a 3.5% performance improvement.

A better approach would be to do a full JIT during the extra pass.
That's what most other architectures do today. And, as long as we can
ensure that the JITed program size can never increase during the extra
pass, we should be ok to do a single extra pass.

- Naveen
On 24/11/2022 at 11:13, Naveen N. Rao wrote:
> Christophe Leroy wrote:
>> ldimm64 is not only used for loading function addresses, and
>
> That's probably true today, but I worry that that can change upstream
> and we may not notice at all.

Not sure what you mean.

Today POWERPC considers that ldimm64 is _always_ loading a function
address, whereas upstream BPF considers that ldimm64 is a function
address only when it is flagged BPF_PSEUDO_FUNC.

In what direction could that change in the future?

For me, if they change that, it becomes an API change.

Christophe
Christophe Leroy wrote:
> On 24/11/2022 at 11:13, Naveen N. Rao wrote:
>> Christophe Leroy wrote:
>>> ldimm64 is not only used for loading function addresses, and
>>
>> That's probably true today, but I worry that that can change upstream
>> and we may not notice at all.
>
> Not sure what you mean.
>
> Today POWERPC considers that ldimm64 is _always_ loading a function
> address, whereas upstream BPF considers that ldimm64 is a function
> address only when it is flagged BPF_PSEUDO_FUNC.

Not sure why you think we consider ldimm64 to always be loading a
function address. Perhaps it is due to the poorly chosen variable name
func_addr in bpf_jit_fixup_addresses(), or due to the fact that we
always update the JIT code for ldimm64. In any case, we simply overwrite
imm64 load instructions to ensure we are using the updated address.

> In what direction could that change in the future?
>
> For me, if they change that, it becomes an API change.

More of an extension, which is exactly what we had when BPF_PSEUDO_FUNC
was introduced. It took us nearly a year before we noticed.

Because we do not do a full JIT during the extra pass today like other
architectures, we are the exception - there is always the risk of BPF
core changes breaking our JIT. So, I still think it is better if we do
a full JIT during the extra pass.

- Naveen
On 24/11/2022 at 14:49, Naveen N. Rao wrote:
> Christophe Leroy wrote:
>> On 24/11/2022 at 11:13, Naveen N. Rao wrote:
>>> Christophe Leroy wrote:
>>>> ldimm64 is not only used for loading function addresses, and
>>>
>>> That's probably true today, but I worry that that can change upstream
>>> and we may not notice at all.
>>
>> Not sure what you mean.
>>
>> Today POWERPC considers that ldimm64 is _always_ loading a function
>> address, whereas upstream BPF considers that ldimm64 is a function
>> address only when it is flagged BPF_PSEUDO_FUNC.
>
> Not sure why you think we consider ldimm64 to always be loading a
> function address. Perhaps it is due to the poorly chosen variable name
> func_addr in bpf_jit_fixup_addresses(), or due to the fact that we
> always update the JIT code for ldimm64. In any case, we simply overwrite
> imm64 load instructions to ensure we are using the updated address.

Well, it is the padding that made me think that. When ldimm64 is used
with a plain immediate value, the value won't change from one pass to
the next. We only need the padding because the instruction may contain
an address that changes from one pass to another.

>> In what direction could that change in the future?
>>
>> For me, if they change that, it becomes an API change.
>
> More of an extension, which is exactly what we had when BPF_PSEUDO_FUNC
> was introduced. It took us nearly a year before we noticed.
>
> Because we do not do a full JIT during the extra pass today like other
> architectures, we are the exception - there is always the risk of BPF
> core changes breaking our JIT. So, I still think it is better if we do
> a full JIT during the extra pass.

I like the idea of a full JIT during extra passes and will start looking
at it.

Will it also allow us to revert your commit fab07611fb2e
("powerpc32/bpf: Fix codegen for bpf-to-bpf calls")?

Thanks
Christophe
Christophe Leroy wrote:
> On 24/11/2022 at 14:49, Naveen N. Rao wrote:
>> Christophe Leroy wrote:
>>> On 24/11/2022 at 11:13, Naveen N. Rao wrote:
>>>> Christophe Leroy wrote:
>>>
>>> In what direction could that change in the future?
>>>
>>> For me, if they change that, it becomes an API change.
>>
>> More of an extension, which is exactly what we had when BPF_PSEUDO_FUNC
>> was introduced. It took us nearly a year before we noticed.
>>
>> Because we do not do a full JIT during the extra pass today like other
>> architectures, we are the exception - there is always the risk of BPF
>> core changes breaking our JIT. So, I still think it is better if we do
>> a full JIT during the extra pass.
>
> I like the idea of a full JIT during extra passes and will start looking
> at it.
>
> Will it also allow us to revert your commit fab07611fb2e
> ("powerpc32/bpf: Fix codegen for bpf-to-bpf calls")?

Not entirely. We still need those extra NOPs during the initial JIT so
that we can estimate the maximum prog size. During the extra pass, we
can emit only the necessary instructions and skip the extra NOPs. We may
need to do two passes during extra_pass to adjust the branch targets,
though.

Thanks,
Naveen
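Naveen's last point - that dropping the padding NOPs shifts every later
instruction, so branch targets must be recomputed in a further pass -
can be illustrated with a toy address-recalculation sketch. This is
stand-alone C, not kernel code; recompute_addrs and sizes[] are
hypothetical names modeling the addrs[] bookkeeping in the JIT:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model, not kernel code: once the extra pass stops emitting
 * padding NOPs, every subsequent instruction moves, so the byte
 * offsets recorded for branch targets (addrs[]) must be recomputed.
 * sizes[] holds the number of 4-byte words emitted per BPF insn.
 */
static void recompute_addrs(const int *sizes, int *addrs, size_t n)
{
	int idx = 0;	/* running count of emitted words */

	for (size_t i = 0; i < n; i++) {
		addrs[i] = idx * 4;	/* byte offset of BPF insn i */
		idx += sizes[i];
	}
}
```

A second such pass over the program is enough once the per-instruction
sizes have stabilized, which is why a full JIT in extra_pass may need
two iterations to settle the branch targets.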
On 25/11/2022 at 06:38, Naveen N. Rao wrote:
> Christophe Leroy wrote:
>> On 24/11/2022 at 14:49, Naveen N. Rao wrote:
>>> Christophe Leroy wrote:
>>>> On 24/11/2022 at 11:13, Naveen N. Rao wrote:
>>>>> Christophe Leroy wrote:
>>>>
>>>> In what direction could that change in the future?
>>>>
>>>> For me, if they change that, it becomes an API change.
>>>
>>> More of an extension, which is exactly what we had when
>>> BPF_PSEUDO_FUNC was introduced. It took us nearly a year before we
>>> noticed.
>>>
>>> Because we do not do a full JIT during the extra pass today like
>>> other architectures, we are the exception - there is always the risk
>>> of BPF core changes breaking our JIT. So, I still think it is better
>>> if we do a full JIT during the extra pass.
>>
>> I like the idea of a full JIT during extra passes and will start
>> looking at it.
>>
>> Will it also allow us to revert your commit fab07611fb2e
>> ("powerpc32/bpf: Fix codegen for bpf-to-bpf calls")?
>
> Not entirely. We still need those extra NOPs during the initial JIT so
> that we can estimate the maximum prog size. During the extra pass, we
> can emit only the necessary instructions and skip the extra NOPs. We
> may need to do two passes during extra_pass to adjust the branch
> targets, though.

Before your change, the code was:

	if (image && rel < 0x2000000 && rel >= -0x2000000) {
		PPC_BL(func);
	} else {
		/* Load function address into r0 */
		EMIT(PPC_RAW_LIS(_R0, IMM_H(func)));
		EMIT(PPC_RAW_ORI(_R0, _R0, IMM_L(func)));
		EMIT(PPC_RAW_MTCTR(_R0));
		EMIT(PPC_RAW_BCTRL());
	}

During the initial pass, image is NULL, so the else branch is taken.

Christophe
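The rel bounds in the snippet above correspond to the reach of a
powerpc bl instruction, whose 24-bit signed word-offset field (shifted
left by 2) covers roughly +/- 32 MB around the call site. A minimal
stand-alone check mirroring that condition (bl_reaches is a
hypothetical helper name, not kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * A powerpc 'bl' encodes a 24-bit signed word offset, i.e. a 26-bit
 * signed byte displacement, so a direct call reaches targets within
 * +/- 0x2000000 bytes (32 MB) of the call site.  Hypothetical helper
 * mirroring the condition quoted in the message above.
 */
static bool bl_reaches(long rel)
{
	return rel < 0x2000000 && rel >= -0x2000000;
}
```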
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 43e634126514..206b698723a3 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -68,7 +68,8 @@ static int bpf_jit_fixup_addresses(struct bpf_prog *fp, u32 *image,
 			 * of the JITed sequence remains unchanged.
 			 */
 			ctx->idx = tmp_idx;
-		} else if (insn[i].code == (BPF_LD | BPF_IMM | BPF_DW)) {
+		} else if (insn[i].code == (BPF_LD | BPF_IMM | BPF_DW) &&
+			   insn[i].src_reg == BPF_PSEUDO_FUNC) {
 			tmp_idx = ctx->idx;
 			ctx->idx = addrs[i] / 4;
 #ifdef CONFIG_PPC32
diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index a379b0ce19ff..878f8a88d83e 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -960,8 +960,9 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 			PPC_LI32(dst_reg_h, (u32)insn[i + 1].imm);
 			PPC_LI32(dst_reg, (u32)insn[i].imm);
 			/* padding to allow full 4 instructions for later patching */
-			for (j = ctx->idx - tmp_idx; j < 4; j++)
-				EMIT(PPC_RAW_NOP());
+			if (insn[i].src_reg == BPF_PSEUDO_FUNC)
+				for (j = ctx->idx - tmp_idx; j < 4; j++)
+					EMIT(PPC_RAW_NOP());
 			/* Adjust for two bpf instructions */
 			addrs[++i] = ctx->idx * 4;
 			break;
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 29ee306d6302..af8bdb5553cd 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -938,8 +938,9 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 			tmp_idx = ctx->idx;
 			PPC_LI64(dst_reg, imm64);
 			/* padding to allow full 5 instructions for later patching */
-			for (j = ctx->idx - tmp_idx; j < 5; j++)
-				EMIT(PPC_RAW_NOP());
+			if (insn[i].src_reg == BPF_PSEUDO_FUNC)
+				for (j = ctx->idx - tmp_idx; j < 5; j++)
+					EMIT(PPC_RAW_NOP());
 			/* Adjust for two bpf instructions */
 			addrs[++i] = ctx->idx * 4;
 			break;
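The condition the diff adds can be read as a predicate: pad an ldimm64
only when its source register flags a function address. A stand-alone
sketch of that test (struct bpf_insn_stub is a reduced stand-in for the
kernel's struct bpf_insn; the macro values are believed to match
include/uapi/linux/bpf.h):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Opcode/flag values as in include/uapi/linux/bpf.h */
#define BPF_LD		0x00
#define BPF_IMM		0x00
#define BPF_DW		0x18
#define BPF_PSEUDO_FUNC	4

struct bpf_insn_stub {		/* reduced stand-in for struct bpf_insn */
	uint8_t code;
	uint8_t src_reg;
};

/*
 * After this patch, NOP padding is emitted for an ldimm64 only when
 * the immediate is a function address (BPF_PSEUDO_FUNC): plain
 * constants cannot change between JIT passes, so they need no room
 * for later patching.
 */
static bool ldimm64_needs_padding(const struct bpf_insn_stub *insn)
{
	return insn->code == (BPF_LD | BPF_IMM | BPF_DW) &&
	       insn->src_reg == BPF_PSEUDO_FUNC;
}
```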
ldimm64 is not only used for loading function addresses, and the NOPs
added for padding are impacting performance, so avoid them when not
necessary.

On QEMU mac99, with the patch:

test_bpf: #829 ALU64_MOV_K: all immediate value magnitudes jited:1 167436810 PASS
test_bpf: #831 ALU64_OR_K: all immediate value magnitudes jited:1 170702940 PASS

Without the patch:

test_bpf: #829 ALU64_MOV_K: all immediate value magnitudes jited:1 173012360 PASS
test_bpf: #831 ALU64_OR_K: all immediate value magnitudes jited:1 176424090 PASS

That's a 3.5% performance improvement.

Fixes: f9320c49993c ("powerpc/bpf: Update ldimm64 instructions during extra pass")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/net/bpf_jit_comp.c   | 3 ++-
 arch/powerpc/net/bpf_jit_comp32.c | 5 +++--
 arch/powerpc/net/bpf_jit_comp64.c | 5 +++--
 3 files changed, 8 insertions(+), 5 deletions(-)