Message ID | Yxhd4EMKyoFoH9y4@hirez.programming.kicks-ass.net (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | BPF |
Series | objtool,x86: Teach decode about LOOP* instructions |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
From: Peter Zijlstra
> Sent: 07 September 2022 10:01
>
> On Wed, Sep 07, 2022 at 09:06:45AM +0200, Peter Zijlstra wrote:
> > On Wed, Sep 07, 2022 at 09:55:21AM +0900, Masami Hiramatsu (Google) wrote:
> >
> > > +/* Return the jump target address or 0 */
> > > +static inline unsigned long insn_get_branch_addr(struct insn *insn)
> > > +{
> > > +	switch (insn->opcode.bytes[0]) {
> > > +	case 0xe0: /* loopne */
> > > +	case 0xe1: /* loope */
> > > +	case 0xe2: /* loop */
> >
> > Oh cute, objtool doesn't know about those, let me go add them.

Do they ever appear in the kernel?
They are so slow on Intel CPUs that finding one ought to be deemed a bug!

Have you got jcxz (0xe3) in there?
It is fast on both Intel and AMD CPUs, so it is usable.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Wed, Sep 07, 2022 at 09:06:12AM +0000, David Laight wrote:
> From: Peter Zijlstra
> > Sent: 07 September 2022 10:01
> >
> > On Wed, Sep 07, 2022 at 09:06:45AM +0200, Peter Zijlstra wrote:
> > > On Wed, Sep 07, 2022 at 09:55:21AM +0900, Masami Hiramatsu (Google) wrote:
> > >
> > > > +/* Return the jump target address or 0 */
> > > > +static inline unsigned long insn_get_branch_addr(struct insn *insn)
> > > > +{
> > > > +	switch (insn->opcode.bytes[0]) {
> > > > +	case 0xe0: /* loopne */
> > > > +	case 0xe1: /* loope */
> > > > +	case 0xe2: /* loop */
> > >
> > > Oh cute, objtool doesn't know about those, let me go add them.
>
> Do they ever appear in the kernel?

No; that is, not on any of the random vmlinux.o images I checked this
morning.

Still, best to properly decode them anyway.
From: Peter Zijlstra
> Sent: 07 September 2022 10:40
>
> On Wed, Sep 07, 2022 at 09:06:12AM +0000, David Laight wrote:
> > From: Peter Zijlstra
> > > Sent: 07 September 2022 10:01
> > >
> > > On Wed, Sep 07, 2022 at 09:06:45AM +0200, Peter Zijlstra wrote:
> > > > On Wed, Sep 07, 2022 at 09:55:21AM +0900, Masami Hiramatsu (Google) wrote:
> > > >
> > > > > +/* Return the jump target address or 0 */
> > > > > +static inline unsigned long insn_get_branch_addr(struct insn *insn)
> > > > > +{
> > > > > +	switch (insn->opcode.bytes[0]) {
> > > > > +	case 0xe0: /* loopne */
> > > > > +	case 0xe1: /* loope */
> > > > > +	case 0xe2: /* loop */
> > > >
> > > > Oh cute, objtool doesn't know about those, let me go add them.
> >
> > Do they ever appear in the kernel?
>
> No; that is, not on any of the random vmlinux.o images I checked this
> morning.
>
> Still, best to properly decode them anyway.

It is annoying that CPUs with adox/adcx have a slow loop instruction.
You really want to be able to do:
1:	adox ...
	adcx ...
	loop 1b
That would never run with one iteration/clock.
But unrolling once would probably be enough.

What you can do (and it gives the fastest IPcsum loop) is:
1:	jcxz 2f
	....
	lea %rcx,...
	jmp 1b
2:
The extra instructions mean that it needs unrolling 4 times.
I've got over 12 bytes/clock that way.

	David
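As background for the "IPcsum loop" mentioned above, here is a portable C sketch of what such a loop computes: the 16-bit ones' complement Internet checksum with end-around carry, as described in RFC 1071. This is only a reference for the arithmetic. It is not the adox/adcx or jcxz-based loop from the message, makes no performance claims, and `inet_csum_sketch()` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical reference implementation (an assumption for illustration):
 * sum the buffer as big-endian 16-bit words, fold the carries back into
 * the low 16 bits, and return the ones' complement of the result.
 */
static uint16_t inet_csum_sketch(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;

	while (len >= 2) {
		sum += ((uint32_t)buf[0] << 8) | buf[1];
		buf += 2;
		len -= 2;
	}

	/* An odd trailing byte is padded with a zero low byte. */
	if (len)
		sum += (uint32_t)buf[0] << 8;

	/* End-around carry: fold everything above bit 15 back in. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

The optimized assembly variants discussed in the thread accumulate much wider words per iteration; the folding step at the end is what makes that equivalent to the 16-bit sum.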
diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index c260006106be..1c253b4b7ce0 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -635,6 +635,12 @@ int arch_decode_instruction(struct objtool_file *file, const struct section *sec
 		*type = INSN_CONTEXT_SWITCH;
 		break;
 
+	case 0xe0: /* loopne */
+	case 0xe1: /* loope */
+	case 0xe2: /* loop */
+		*type = INSN_JUMP_CONDITIONAL;
+		break;
+
 	case 0xe8:
 		*type = INSN_CALL;
 		/*
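To illustrate the effect of the hunk above outside of objtool, the following standalone sketch classifies a handful of one-byte x86 opcodes the same way: LOOP* lands in the conditional-jump bucket alongside the short Jcc forms. The enum and `sketch_classify()` are hypothetical stand-ins written for this example; they are not objtool's types or its `arch_decode_instruction()`, and only a few opcodes are covered.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for objtool's instruction types. */
enum sketch_insn_type {
	SKETCH_OTHER,
	SKETCH_JUMP_CONDITIONAL,
	SKETCH_JUMP_UNCONDITIONAL,
	SKETCH_CALL,
};

/* Classify a primary opcode byte; prefixes and two-byte opcodes are ignored. */
static enum sketch_insn_type sketch_classify(uint8_t op)
{
	if (op >= 0x70 && op <= 0x7f)	/* jcc rel8 */
		return SKETCH_JUMP_CONDITIONAL;

	switch (op) {
	case 0xe0: /* loopne */
	case 0xe1: /* loope */
	case 0xe2: /* loop */
	case 0xe3: /* jcxz/jecxz/jrcxz */
		return SKETCH_JUMP_CONDITIONAL;
	case 0xe8: /* call rel32 */
		return SKETCH_CALL;
	case 0xe9: /* jmp rel32 */
	case 0xeb: /* jmp rel8 */
		return SKETCH_JUMP_UNCONDITIONAL;
	default:
		return SKETCH_OTHER;
	}
}
```

Treating LOOP* as a conditional jump matters to a control-flow checker because the instruction has two possible successors: the branch target and the fall-through instruction.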