Message ID | 20250208151347.89708-1-david.laight.linux@gmail.com (mailing list archive)
---|---
State | New
Series | [next,1/1] fs: Mark get_sigset_argpack() __always_inline
On Sat, 8 Feb 2025 at 07:14, David Laight <david.laight.linux@gmail.com> wrote:
>
> Since the function is 'hot enough' to worry about avoiding the
> overhead of copy_from_user() it must be worth forcing it to be
> inlined.

Hmm. gcc certainly inlines this one for me regardless.

So it's either a gcc version issue (I have gcc-14.2.1), or it's some
build configuration thing that makes the function big enough in your
case that gcc decides not to inline things. Do you perhaps have some
debugging options enabled? At that point, inlining is the least of all
problems.

> I'd guess that gcc is counting up the number of lines in the asm again.

If that's the case, then we should probably make sure that the
relevant inline asms are marked 'inline', which makes gcc estimate
it to be minimal:

    https://gcc.gnu.org/onlinedocs/gcc/Size-of-an-asm.html

but it turns out that with the compiler bug workarounds we have that
wrapper macro for this case:

    #define asm_goto_output(x...) asm volatile goto(x)

which I guess we could just add the inline to. Bah.

I think we should fix those things, but in the meantime your patch
certainly isn't wrong either. So ack.

And while looking at this, I note that I made get_sigset_argpack() do
the masked user access thing, but didn't do the same pattern for
get_compat_sigset_argpack()

             Linus
On Sat, 8 Feb 2025 10:53:38 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

...

> And while looking at this, I note that I made get_sigset_argpack() do
> the masked user access thing, but didn't do the same pattern for
> get_compat_sigset_argpack()

That's in the next patch.
I noticed you'd committed my cmov code.

	David
On Sat, 8 Feb 2025 10:53:38 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Sat, 8 Feb 2025 at 07:14, David Laight <david.laight.linux@gmail.com> wrote:
> >
> > Since the function is 'hot enough' to worry about avoiding the
> > overhead of copy_from_user() it must be worth forcing it to be
> > inlined.
>
> Hmm. gcc certainly inlines this one for me regardless.
>
> So it's either a gcc version issue (I have gcc-14.2.1), or it's some
> build configuration thing that makes the function big enough in your
> case that gcc decides not to inline things. Do you perhaps have some
> debugging options enabled? At that point, inlining is the least of all
> problems.

gcc 12.2.0 (from debian) - so not THAT old.
clang 18 does inline it.

I've turned off pretty much everything (except page table separation).
And there isn't much unexpected in the object code.

Can the 'alternatives' be flipped so the .o doesn't contain loads of nops?
It'd be nice to see the clac and lfence.

	David
On Sat, 8 Feb 2025 at 11:06, David Laight <david.laight.linux@gmail.com> wrote:
>
> Can the 'alternatives' be flipped so the .o doesn't contain loads of nops?

Sadly, no. The instructions generate #UD if the CPU doesn't support SMAP.

Now, arguably the alternatives *should* be fixed up before the first
user space access and thus it would be safe to switch, but honestly, I
don't want to risk some early code doing odd things. The potential
pain would be too damn high.

> It'd be nice to see the clac and lfence.

Heh. I agree 100%, which is why my personal tree has a patch like this:

  -#define LOCK_PREFIX_HERE \
  -		".pushsection .smp_locks,\"a\"\n"	\
  -		".balign 4\n"				\
  -		".long 671f - .\n" /* offset */		\
  -		".popsection\n"				\
  -		"671:"
  +#define LOCK_PREFIX_HERE
  ...
  -#define barrier_nospec() alternative("", "lfence", X86_FEATURE_LFENCE_RDTSC)
  +#define barrier_nospec() asm volatile("lfence":::"memory")
  ...
  -#define ASM_CLAC \
  -	ALTERNATIVE "", "clac", X86_FEATURE_SMAP
  -
  -#define ASM_STAC \
  -	ALTERNATIVE "", "stac", X86_FEATURE_SMAP
  +#define ASM_CLAC clac
  +#define ASM_STAC stac
  ...
  -	/* Note: a barrier is implicit in alternative() */
  -	alternative("", "clac", X86_FEATURE_SMAP);
  +	asm volatile("clac":::"memory");
  ...
  -	/* Note: a barrier is implicit in alternative() */
  -	alternative("", "stac", X86_FEATURE_SMAP);
  +	asm volatile("stac":::"memory");
  ...
  -#define ASM_BARRIER_NOSPEC ALTERNATIVE "", "lfence", X86_FEATURE_LFENCE_RDTSC
  +#define ASM_BARRIER_NOSPEC lfence

but I've never made it a real usable patch for others. It would have
to be some "modern CPU only" config variable, and nobody else has ever
complained about this until now.

             Linus
On Sat, 8 Feb 2025 12:03:09 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Sat, 8 Feb 2025 at 11:06, David Laight <david.laight.linux@gmail.com> wrote:
> >
> > Can the 'alternatives' be flipped so the .o doesn't contain loads of nops?
>
> Sadly, no. The instructions generate #UD if the CPU doesn't support SMAP.
>
> Now, arguably the alternatives *should* be fixed up before the first
> user space access and thus it would be safe to switch, but honestly, I
> don't want to risk some early code doing odd things. The potential
> pain would be too damn high.
>
> > It'd be nice to see the clac and lfence.
>
> Heh. I agree 100%, which is why my personal tree has a patch like this:
>
> -#define LOCK_PREFIX_HERE \
> -		".pushsection .smp_locks,\"a\"\n"	\
> -		".balign 4\n"				\
> -		".long 671f - .\n" /* offset */		\
> -		".popsection\n"				\
> -		"671:"
> +#define LOCK_PREFIX_HERE

That one is just hidden bloat.
40k on the kernel I'm building.
Is it really worth the effort for single-cpu systems.
I can't remember if you can still build a kernel with max-cpus of 1.
But a minor change of removing lock prefix at compile time might make sense.

> ...
> -#define barrier_nospec() alternative("", "lfence", X86_FEATURE_LFENCE_RDTSC)
> +#define barrier_nospec() asm volatile("lfence":::"memory")

Isn't that testing the wrong feature bit?
LFENCE_RDTSC indicates that lfence serialises rdtsc.
lfence itself is predicated by XMM2 (see higher up the file) and is
always present for 64bit.

I'm not sure whether old 32bit cpu need a speculation barrier.
But I'm sure I remember something from a very old (p-pro era?) document
that mentioned avoiding interleaving data and code because the data
might be something like atan and the cpu would have to wait for it to
finish (and that would have been after a ret/jmp as well).
So that might have to be rmb() even on old cpu.

In any case it doesn't need to be an alternative on 64bit.

> ...
> -#define ASM_CLAC \
> -	ALTERNATIVE "", "clac", X86_FEATURE_SMAP
> -
> -#define ASM_STAC \
> -	ALTERNATIVE "", "stac", X86_FEATURE_SMAP
> +#define ASM_CLAC clac
> +#define ASM_STAC stac
> ...
> -	/* Note: a barrier is implicit in alternative() */

s/barrier/memory clobber/

> -	alternative("", "clac", X86_FEATURE_SMAP);
> +	asm volatile("clac":::"memory");
> ...
> -	/* Note: a barrier is implicit in alternative() */
> -	alternative("", "stac", X86_FEATURE_SMAP);
> +	asm volatile("stac":::"memory");

Is there any reason why we get three 0x90 bytes instead of a three-byte nop?
Indeed, since the flags can be changed there are a lot of choices.
Compare %rsp imm8 (83:fc:xx) would probably not affect anything.
It might even decode faster than three nop bytes.

> ...
> -#define ASM_BARRIER_NOSPEC ALTERNATIVE "", "lfence", X86_FEATURE_LFENCE_RDTSC
> +#define ASM_BARRIER_NOSPEC lfence
>
> but I've never made it a real usable patch for others. It would have
> to be some "modern CPU only" config variable, and nobody else has ever
> complained about this until now.

Defaulting to SMAP just requires a bit of bravery!
After all (as you said) it ought to get patched out before userspace exists.
It is still fairly new (Broadwell ix-5xxx).
I'm sure a lot of people are still running Linux on Sandy bridge cpu.
I've not used mine for a few years (since I stopped being an active
NetBSD developer), but might dig it back out for testing if I retire.
(Or just 'borrow' a slightly newer system from work.)

	David

>
>              Linus
On Sat, Feb 08, 2025 at 11:05:00PM +0000, David Laight wrote:
> Defaulting to SMAP just requires a bit of bravery!
> After all (as you said) it ought to get patched out before userspace exists.
> It is still fairly new (Broadwell ix-5xxx).
> I'm sure a lot of people are still running Linux on Sandy bridge cpu.

The Intel folk posting over at https://lore.kernel.org/oe-lkp/ have a
bunch of old yellers they test on (and probably more they can dig up if
needed) and will likely be happy to boot test a patch of the sort, just
explicitly state you want a pre-SMAP CPU.
diff --git a/fs/select.c b/fs/select.c
index 7da531b1cf6b..9d508114add1 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -771,8 +771,8 @@ struct sigset_argpack {
 	size_t size;
 };
 
-static inline int get_sigset_argpack(struct sigset_argpack *to,
-				     struct sigset_argpack __user *from)
+static __always_inline int get_sigset_argpack(struct sigset_argpack *to,
+		struct sigset_argpack __user *from)
 {
 	// the path is hot enough for overhead of copy_from_user() to matter
 	if (from) {
@@ -1343,8 +1343,8 @@ struct compat_sigset_argpack {
 	compat_uptr_t p;
 	compat_size_t size;
 };
-static inline int get_compat_sigset_argpack(struct compat_sigset_argpack *to,
-					    struct compat_sigset_argpack __user *from)
+static __always_inline int get_compat_sigset_argpack(struct compat_sigset_argpack *to,
+		struct compat_sigset_argpack __user *from)
 {
 	if (from) {
 		if (!user_read_access_begin(from, sizeof(*from)))
Since the function is 'hot enough' to worry about avoiding the
overhead of copy_from_user() it must be worth forcing it to be
inlined.

Signed-off-by: David Laight <david.laight.linux@gmail.com>
---
I'd guess that gcc is counting up the number of lines in the asm again.

I'm not sure how to handle the indentation of the continuation lines.
Lining up with the ( gives overlong lines.
Elsewhere in this file one, two or four tabs are used.
I picked two tabs - one of my favourite schemes.

 fs/select.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)