Message ID | 20220816032846.2579217-1-imagedong@tencent.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net-next,v4] net: skb: prevent the split of kfree_skb_reason() by gcc | expand |
On Tue, Aug 16, 2022 at 5:29 AM <menglong8.dong@gmail.com> wrote: > > Reported-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Hmm... Why did you add me as a reporter? > + * Optional: not supported by clang. > + * Optional: not supported by icc. Much better, thank you! Please add the links to GCC's docs, like in most attributes (some newer attributes may have been added without them -- I will fix that). In any case, no need to send a new version just for this for the moment, I would recommend waiting until others comment on whether they want `__optimize__` used here as the workaround. Cheers, Miguel
On Tue, Aug 16, 2022 at 5:02 PM Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> wrote: > > On Tue, Aug 16, 2022 at 5:29 AM <menglong8.dong@gmail.com> wrote: > > > > Reported-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> > > Hmm... Why did you add me as a reporter? > Enn...because you pointed out my code's problem, just like what robot do. Shouldn't I? > > + * Optional: not supported by clang. > > + * Optional: not supported by icc. > > Much better, thank you! Please add the links to GCC's docs, like in > most attributes (some newer attributes may have been added without > them -- I will fix that). > > In any case, no need to send a new version just for this for the > moment, I would recommend waiting until others comment on whether they > want `__optimize__` used here as the workaround. > Okay, I'll wait longer before the next version. Thanks! Menglong Dong > Cheers, > Miguel
On Wed, Aug 17, 2022 at 4:20 AM Menglong Dong <menglong8.dong@gmail.com> wrote: > > Enn...because you pointed out my code's problem, just like what robot do. > Shouldn't I? Yeah, but that is just the usual review on a patch; I didn't report the original bug (that the commit message talks about). (I appreciate that you wanted to give credit, though :-) > Okay, I'll wait longer before the next version. > > Thanks! My pleasure! Cheers, Miguel
On Mon, Aug 15, 2022 at 8:29 PM <menglong8.dong@gmail.com> wrote: > > From: Menglong Dong <imagedong@tencent.com> > > Sometimes, gcc will optimize the function by spliting it to two or > more functions. In this case, kfree_skb_reason() is splited to > kfree_skb_reason and kfree_skb_reason.part.0. However, the > function/tracepoint trace_kfree_skb() in it needs the return address > of kfree_skb_reason(). Does the existing __noclone function attribute help at all here? If not, surely there's an attribute that's more precise than "disable most optimization outright." https://unix.stackexchange.com/questions/223013/function-symbol-gets-part-suffix-after-compilation https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-noclone-function-attribute Perhaps noipa might also work here? > > This split makes the call chains becomes: > kfree_skb_reason() -> kfree_skb_reason.part.0 -> trace_kfree_skb() > > which makes the return address that passed to trace_kfree_skb() be > kfree_skb(). > > Therefore, prevent this kind of optimization to kfree_skb_reason() by > making the optimize level to "O1". I think these should be better > method instead of this "O1", but I can't figure it out...... > > This optimization CAN happen, which depend on the behavior of gcc. > I'm not able to reproduce it in the latest kernel code, but it happens > in my kernel of version 5.4.119. Maybe the latest code already do someting > that prevent this happen? > > Signed-off-by: Menglong Dong <imagedong@tencent.com> > Reported-by: kernel test robot <lkp@intel.com> > Reported-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> > --- > v4: > - move the definition of __nofnsplit to compiler_attributes.h > > v3: > - define __nofnsplit only for GCC > - add some document > > v2: > - replace 'optimize' with '__optimize__' in __nofnsplit, as Miguel Ojeda > advised. > --- > include/linux/compiler_attributes.h | 19 +++++++++++++++++++ > net/core/skbuff.c | 3 ++- > 2 files changed, 21 insertions(+), 1 deletion(-) > > diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h > index 445e80517cab..968cbafa2421 100644 > --- a/include/linux/compiler_attributes.h > +++ b/include/linux/compiler_attributes.h > @@ -270,6 +270,25 @@ > */ > #define __noreturn __attribute__((__noreturn__)) > > +/* > + * Optional: not supported by clang. > + * Optional: not supported by icc. > + * > + * Prevent function from being splited to multiple part. As what the > + * document says in gcc/ipa-split.cc, single function will be splited > + * when necessary: > + * > + * https://github.com/gcc-mirror/gcc/blob/master/gcc/ipa-split.cc > + * > + * This optimization seems only take effect on O2 and O3 optimize level. > + * Therefore, make the optimize level to O1 to prevent this optimization. > + */ > +#if __has_attribute(__optimize__) > +# define __nofnsplit __attribute__((__optimize__("O1"))) > +#else > +# define __nofnsplit > +#endif > + > /* > * Optional: not supported by gcc. > * Optional: not supported by icc. > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index 974bbbbe7138..ff9ccbc032b9 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -777,7 +777,8 @@ EXPORT_SYMBOL(__kfree_skb); > * hit zero. Meanwhile, pass the drop reason to 'kfree_skb' > * tracepoint. > */ > -void kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason) > +void __nofnsplit > +kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason) > { > if (!skb_unref(skb)) > return; > -- > 2.36.1 >
Hello, On Wed, Aug 17, 2022 at 11:54 PM Nick Desaulniers <ndesaulniers@google.com> wrote: > > On Mon, Aug 15, 2022 at 8:29 PM <menglong8.dong@gmail.com> wrote: > > > > From: Menglong Dong <imagedong@tencent.com> > > > > Sometimes, gcc will optimize the function by spliting it to two or > > more functions. In this case, kfree_skb_reason() is splited to > > kfree_skb_reason and kfree_skb_reason.part.0. However, the > > function/tracepoint trace_kfree_skb() in it needs the return address > > of kfree_skb_reason(). > > Does the existing __noclone function attribute help at all here? > > If not, surely there's an attribute that's more precise than "disable > most optimization outright." > > https://unix.stackexchange.com/questions/223013/function-symbol-gets-part-suffix-after-compilation > https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-noclone-function-attribute > > Perhaps noipa might also work here? > In my testing, both 'noclone' and 'noipa' both work! As for the '-fdisable-ipa-fnsplit', it seems it's not supported by gcc, and I failed to find any documentation of it. I think that the '__noclone' is exactly what I needed! Just like what saied in this link: https://stackoverflow.com/questions/34086769/purpose-of-function-attribute-noclone I appreciate your advice, and it seems it's not needed to add new attributes to the compiler_attributes.h. Thanks! Menglong Dong
Hi! On Fri, Aug 19, 2022 at 12:31:44AM +0800, Menglong Dong wrote: > On Wed, Aug 17, 2022 at 11:54 PM Nick Desaulniers > <ndesaulniers@google.com> wrote: > > Perhaps noipa might also work here? > > In my testing, both 'noclone' and 'noipa' both work! As for the > '-fdisable-ipa-fnsplit', it seems it's not supported by gcc, and I > failed to find any documentation of it. noipa is noinline+noclone+no_icf plus assorted not separately enablable things. There is no reason you would want to disable all inter-procedural optimisations here, so you don't need noipa. You need both noinline and no_icf if you want all calls to this to be actual function calls, and using this specific function name. If you don't have noinline some calls may go missing (which may be fine for how you use it). If you don't have no_icf the compiler may replace the call with a call to another function, if that does the same thing semantically. You may want to prevent that as well, depending on exactly what you have this for. Segher
On Tue, 16 Aug 2022 11:28:46 +0800 menglong8.dong@gmail.com wrote: > From: Menglong Dong <imagedong@tencent.com> > > Sometimes, gcc will optimize the function by spliting it to two or > more functions. In this case, kfree_skb_reason() is splited to > kfree_skb_reason and kfree_skb_reason.part.0. However, the > function/tracepoint trace_kfree_skb() in it needs the return address > of kfree_skb_reason(). > > This split makes the call chains becomes: > kfree_skb_reason() -> kfree_skb_reason.part.0 -> trace_kfree_skb() > > which makes the return address that passed to trace_kfree_skb() be > kfree_skb(). > > Therefore, prevent this kind of optimization to kfree_skb_reason() by > making the optimize level to "O1". I think these should be better > method instead of this "O1", but I can't figure it out...... > > This optimization CAN happen, which depend on the behavior of gcc. > I'm not able to reproduce it in the latest kernel code, but it happens > in my kernel of version 5.4.119. Maybe the latest code already do someting > that prevent this happen? > > Signed-off-by: Menglong Dong <imagedong@tencent.com> > Reported-by: kernel test robot <lkp@intel.com> > Reported-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Sorry for a late and possibly off-topic chime in, is the compiler splitting it because it thinks that skb_unref() is going to return true? I don't think that's the likely case, so maybe we're better off wrapping that skb_unref() in unlikely()?
Hello, On Fri, Aug 19, 2022 at 1:00 AM Segher Boessenkool <segher@kernel.crashing.org> wrote: > > Hi! > > On Fri, Aug 19, 2022 at 12:31:44AM +0800, Menglong Dong wrote: > > On Wed, Aug 17, 2022 at 11:54 PM Nick Desaulniers > > <ndesaulniers@google.com> wrote: > > > Perhaps noipa might also work here? > > > > In my testing, both 'noclone' and 'noipa' both work! As for the > > '-fdisable-ipa-fnsplit', it seems it's not supported by gcc, and I > > failed to find any documentation of it. > > noipa is noinline+noclone+no_icf plus assorted not separately enablable > things. There is no reason you would want to disable all > inter-procedural optimisations here, so you don't need noipa. > > You need both noinline and no_icf if you want all calls to this to be > actual function calls, and using this specific function name. If you > don't have noinline some calls may go missing (which may be fine for > how you use it). If you don't have no_icf the compiler may replace the > call with a call to another function, if that does the same thing > semantically. You may want to prevent that as well, depending on > exactly what you have this for. > Thanks for your explanation about the usage of 'noinline' and 'no_icf'! I think 'noclone' seems enough in this case? As the function 'kfree_skb_reason' we talk about is a global function, I think that the compiler has no reason to make it inline, or be merged with another function. Meanwhile, I think that the functions which use '__builtin_return_address' should consider the optimization you mentioned above, and I'll have a check on them by the way. Thanks! Menglong Dong > > Segher
On Fri, Aug 19, 2022 at 1:09 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Tue, 16 Aug 2022 11:28:46 +0800 menglong8.dong@gmail.com wrote: > > From: Menglong Dong <imagedong@tencent.com> > > [...] > > Sorry for a late and possibly off-topic chime in, is the compiler > splitting it because it thinks that skb_unref() is going to return > true? I don't think that's the likely case, so maybe we're better > off wrapping that skb_unref() in unlikely()? I think your thought is totally right, considering the instruction that I disassembled: ffffffff819fea20 <kfree_skb_reason>: ffffffff819fea20: e8 cb 2c 40 00 call ffffffff81e016f0 <__fentry__> ffffffff819fea25: 48 85 ff test %rdi,%rdi ffffffff819fea28: 74 25 je ffffffff819fea4f <kfree_skb_reason+0x2f> ffffffff819fea2a: 8b 87 d4 00 00 00 mov 0xd4(%rdi),%eax /* this is just the instruction that compiled from skb_unref() */ ffffffff819fea30: 83 f8 01 cmp $0x1,%eax ffffffff819fea33: 75 0b jne ffffffff819fea40 <kfree_skb_reason+0x20> ffffffff819fea35: 55 push %rbp ffffffff819fea36: 48 89 e5 mov %rsp,%rbp ffffffff819fea39: e8 42 ff ff ff call ffffffff819fe980 <kfree_skb_reason.part.0> ffffffff819fea3e: 5d pop %rbp ffffffff819fea3f: c3 ret ffffffff819fea40: f0 ff 8f d4 00 00 00 lock decl 0xd4(%rdi) ffffffff819fea47: 0f 88 e5 44 27 00 js ffffffff81c72f32 <__noinstr_text_end+0x255d> ffffffff819fea4d: 74 e6 je ffffffff819fea35 <kfree_skb_reason+0x15> ffffffff819fea4f: c3 ret The compiler just splits the code after skb_unref() to another. After I warp the skb_unref() in unlinkly(), this function is not splitted any more. Yeah, I think we can make skb_unref() wrapped by unlikely() by the way. Thanks! Menglong Dong
Hi! On Fri, Aug 19, 2022 at 10:55:42PM +0800, Menglong Dong wrote: > Thanks for your explanation about the usage of 'noinline' and 'no_icf'! > I think 'noclone' seems enough in this case? As the function > 'kfree_skb_reason' we talk about is a global function, I think that the > compiler has no reason to make it inline, or be merged with another > function. Whether something is inlined is decided per instance (except for always_inline and noinline functions). Of course the function body has to be available for anything to be inlined, so barring LTO this can only happen for function uses in the same source file. Not very likely indeed, but not entirely impossible either. A function can be merged if there is another function that does exactly the same thing. This is unlikely with functions that do some serious work of course, but it is likely with stub-like functions. gl;hf, Segher
Hello, On Fri, Aug 19, 2022 at 11:24 PM Segher Boessenkool <segher@kernel.crashing.org> wrote: > > Hi! > > On Fri, Aug 19, 2022 at 10:55:42PM +0800, Menglong Dong wrote: > > Thanks for your explanation about the usage of 'noinline' and 'no_icf'! > > I think 'noclone' seems enough in this case? As the function > > 'kfree_skb_reason' we talk about is a global function, I think that the > > compiler has no reason to make it inline, or be merged with another > > function. > > Whether something is inlined is decided per instance (except for > always_inline and noinline functions). Of course the function body has > to be available for anything to be inlined, so barring LTO this can only > happen for function uses in the same source file. Not very likely > indeed, but not entirely impossible either. > I understand it now, the global function is indeed possible to be made inline by the compiler, and 'noinline' seems necessary here too. Maybe I can add a new compiler attribute like this: /* * Used by functions that use '__builtin_return_address'. These function * don't want to be splited or made inline, which can make * the '__builtin_return_address' got unexpected address. */ #define __fix_address noinline __noclone > A function can be merged if there is another function that does exactly > the same thing. This is unlikely with functions that do some serious > work of course, but it is likely with stub-like functions. > I understand how 'icf'(Identical Code Folding) works now. In the case we talk about, It seems fine even if the function is merged. The only effect of 'icf' is the change of function name, which doesn't affect the result of '__builtin_return_address(0)'. Thanks! Menglong Dong > gl;hf, > > > Segher
* Menglong Dong: > /* > * Used by functions that use '__builtin_return_address'. These function > * don't want to be splited or made inline, which can make > * the '__builtin_return_address' got unexpected address. > */ > #define __fix_address noinline __noclone You need something on the function *declaration* as well, to inhibit sibcalls. Thanks, Florian
Hello, On Mon, Aug 22, 2022 at 4:01 PM Florian Weimer <fweimer@redhat.com> wrote: > > * Menglong Dong: > > > /* > > * Used by functions that use '__builtin_return_address'. These function > > * don't want to be splited or made inline, which can make > > * the '__builtin_return_address' got unexpected address. > > */ > > #define __fix_address noinline __noclone > > You need something on the function *declaration* as well, to inhibit > sibcalls. > I did some research on the 'sibcalls' you mentioned above. Feel like It's a little similar to 'inline', and makes the callee use the same stack frame with the caller, which obviously will influence the result of '__builtin_return_address'. Hmm......but I'm not able to find any attribute to disable this optimization. Do you have any ideas? Thanks! Menglong Dong > Thanks, > Florian >
* Menglong Dong: > Hello, > > On Mon, Aug 22, 2022 at 4:01 PM Florian Weimer <fweimer@redhat.com> wrote: >> >> * Menglong Dong: >> >> > /* >> > * Used by functions that use '__builtin_return_address'. These function >> > * don't want to be splited or made inline, which can make >> > * the '__builtin_return_address' got unexpected address. >> > */ >> > #define __fix_address noinline __noclone >> >> You need something on the function *declaration* as well, to inhibit >> sibcalls. >> > > I did some research on the 'sibcalls' you mentioned above. Feel like > It's a little similar to 'inline', and makes the callee use the same stack > frame with the caller, which obviously will influence the result of > '__builtin_return_address'. > > Hmm......but I'm not able to find any attribute to disable this optimization. > Do you have any ideas? Unless something changed quite recently, GCC does not allow disabling the optimization with a simple attribute (which would have to apply to function pointers as well, not functions). asm ("") barriers that move out a call out of the tail position are supposed to prevent the optimization. Thanks, Florian
On Tue, Sep 06, 2022 at 02:37:47PM +0200, Florian Weimer wrote: > > On Mon, Aug 22, 2022 at 4:01 PM Florian Weimer <fweimer@redhat.com> wrote: > > I did some research on the 'sibcalls' you mentioned above. Feel like > > It's a little similar to 'inline', and makes the callee use the same stack > > frame with the caller, which obviously will influence the result of > > '__builtin_return_address'. Sibling calls are essentially calls that can be replaced by jumps (aka "tail call"), without needing a separate entry point to the callee. Different targets can have a slightly different implementation and definition of what exactly is a sibling call, but that's the gist. > > Hmm......but I'm not able to find any attribute to disable this optimization. > > Do you have any ideas? > > Unless something changed quite recently, GCC does not allow disabling > the optimization with a simple attribute (which would have to apply to > function pointers as well, not functions). It isn't specified what a sibling call exactly *is*, certainly not on C level (only in the generated machine code), and the details differs per target. > asm ("") barriers that move > out a call out of the tail position are supposed to prevent the > optimization. Not just "supposed": they work 100%. The asm has to stay after the function call by the fundamental rules of C (the function call having a sequence point, and the asm a side effect). void g(void); void f(void) { g(); // This can not be optimised to a jump... asm(""); // ... because it has to stay before this. } Segher
* Segher Boessenkool: > On Tue, Sep 06, 2022 at 02:37:47PM +0200, Florian Weimer wrote: >> > On Mon, Aug 22, 2022 at 4:01 PM Florian Weimer <fweimer@redhat.com> wrote: >> > I did some research on the 'sibcalls' you mentioned above. Feel like >> > It's a little similar to 'inline', and makes the callee use the same stack >> > frame with the caller, which obviously will influence the result of >> > '__builtin_return_address'. > > Sibling calls are essentially calls that can be replaced by jumps (aka > "tail call"), without needing a separate entry point to the callee. > > Different targets can have a slightly different implementation and > definition of what exactly is a sibling call, but that's the gist. > >> > Hmm......but I'm not able to find any attribute to disable this optimization. >> > Do you have any ideas? >> >> Unless something changed quite recently, GCC does not allow disabling >> the optimization with a simple attribute (which would have to apply to >> function pointers as well, not functions). > > It isn't specified what a sibling call exactly *is*, certainly not on C > level (only in the generated machine code), and the details differs per > target. Sure, but GCC already disables this optimization in a generic fashion for noreturn calls. It should be possible to do the same based another function attribute. Thanks, Florian
On Wed, Sep 07, 2022 at 08:59:26PM +0200, Florian Weimer wrote: > * Segher Boessenkool: > > > On Tue, Sep 06, 2022 at 02:37:47PM +0200, Florian Weimer wrote: > >> > On Mon, Aug 22, 2022 at 4:01 PM Florian Weimer <fweimer@redhat.com> wrote: > >> > I did some research on the 'sibcalls' you mentioned above. Feel like > >> > It's a little similar to 'inline', and makes the callee use the same stack > >> > frame with the caller, which obviously will influence the result of > >> > '__builtin_return_address'. > > > > Sibling calls are essentially calls that can be replaced by jumps (aka > > "tail call"), without needing a separate entry point to the callee. > > > > Different targets can have a slightly different implementation and > > definition of what exactly is a sibling call, but that's the gist. > > > >> > Hmm......but I'm not able to find any attribute to disable this optimization. > >> > Do you have any ideas? > >> > >> Unless something changed quite recently, GCC does not allow disabling > >> the optimization with a simple attribute (which would have to apply to > >> function pointers as well, not functions). > > > > It isn't specified what a sibling call exactly *is*, certainly not on C > > level (only in the generated machine code), and the details differs per > > target. > > Sure, but GCC already disables this optimization in a generic fashion > for noreturn calls. It should be possible to do the same based another > function attribute. My point is that disabling sibling calls does not necessarily do what you really want, certainly not on all targets. Many targets have their own frame optimisations and prologue/epilogue optimisations as well. But you can just do void f(void) __attribute__((optimize("no-optimize-sibling-calls"))); (in my previous example), and that works: it disables the (generic) sibling call optimisation. This may or may not do what you actually want though. Segher
diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h index 445e80517cab..968cbafa2421 100644 --- a/include/linux/compiler_attributes.h +++ b/include/linux/compiler_attributes.h @@ -270,6 +270,25 @@ */ #define __noreturn __attribute__((__noreturn__)) +/* + * Optional: not supported by clang. + * Optional: not supported by icc. + * + * Prevent function from being splited to multiple part. As what the + * document says in gcc/ipa-split.cc, single function will be splited + * when necessary: + * + * https://github.com/gcc-mirror/gcc/blob/master/gcc/ipa-split.cc + * + * This optimization seems only take effect on O2 and O3 optimize level. + * Therefore, make the optimize level to O1 to prevent this optimization. + */ +#if __has_attribute(__optimize__) +# define __nofnsplit __attribute__((__optimize__("O1"))) +#else +# define __nofnsplit +#endif + /* * Optional: not supported by gcc. * Optional: not supported by icc. diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 974bbbbe7138..ff9ccbc032b9 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -777,7 +777,8 @@ EXPORT_SYMBOL(__kfree_skb); * hit zero. Meanwhile, pass the drop reason to 'kfree_skb' * tracepoint. */ -void kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason) +void __nofnsplit +kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason) { if (!skb_unref(skb)) return;