Message ID | 20240620181736.1270455-1-yabinc@google.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | Fix initializing a static union variable | expand |
On Thu, Jun 20, 2024 at 11:17 AM Yabin Cui <yabinc@google.com> wrote: > > saddr_wildcard is a static union variable initialized with {}. > But c11 standard doesn't guarantee initializing all fields as > zero for this case. As in https://godbolt.org/z/rWvdv6aEx, Specifically, it sounds like C99+ is just the first member of the union, which is dumb since that may not necessarily be the largest variant. Can you find the specific relevant wording from a pre-c23 spec? > clang only initializes the first field as zero, but the bits > corresponding to other (larger) members are undefined. Oh, that sucks! Reading through the internal report on this is fascinating! Nice job tracking down the issue! It sounds like if we can aggressively inline the users of this partially initialized value, then the UB from control flow on the partially initialized value can result in Android's kernel network tests failing. It might be good to include more info on "why this is a problem" in the commit message. https://godbolt.org/z/hxnT1PTWo more clearly demonstrates the issue, IMO. TIL that C23 clarifies this, but clang still doesn't have the correct codegen then for -std=c23. Can you please find or file a bug about this, then add a link to it in the commit message? It might be interesting to link to the specific section of n3096 that clarifies this, or if there was a corresponding defect report sent to ISO about this. Maybe something from https://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_log.htm discusses this? Can you also please (find or) file a bug against clang about this? A compiler diagnostic would be very very helpful here, since `= {};` is such a common idiom. Patch LGTM, but I think more context can be provided in the commit message in a v2 that helps reviewers follow along with what's going on here. > > Signed-off-by: Yabin Cui <yabinc@google.com> > --- > net/xfrm/xfrm_state.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c > index 649bb739df0d..9bc69d703e5c 100644 > --- a/net/xfrm/xfrm_state.c > +++ b/net/xfrm/xfrm_state.c > @@ -1139,7 +1139,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, > struct xfrm_policy *pol, int *err, > unsigned short family, u32 if_id) > { > - static xfrm_address_t saddr_wildcard = { }; > + static const xfrm_address_t saddr_wildcard; > struct net *net = xp_net(pol); > unsigned int h, h_wildcard; > struct xfrm_state *x, *x0, *to_put; > -- > 2.45.2.741.gdbec12cfda-goog >
On Thu, Jun 20, 2024 at 12:31 PM Nick Desaulniers <ndesaulniers@google.com> wrote: > > On Thu, Jun 20, 2024 at 11:17 AM Yabin Cui <yabinc@google.com> wrote: > > > > saddr_wildcard is a static union variable initialized with {}. > > But c11 standard doesn't guarantee initializing all fields as > > zero for this case. As in https://godbolt.org/z/rWvdv6aEx, > > Specifically, it sounds like C99+ is just the first member of the > union, which is dumb since that may not necessarily be the largest > variant. Can you find the specific relevant wording from a pre-c23 > spec? > > > clang only initializes the first field as zero, but the bits > > corresponding to other (larger) members are undefined. > > Oh, that sucks! > > Reading through the internal report on this is fascinating! Nice job > tracking down the issue! It sounds like if we can aggressively inline > the users of this partially initialized value, then the UB from > control flow on the partially initialized value can result in > Android's kernel network tests failing. It might be good to include > more info on "why this is a problem" in the commit message. > > https://godbolt.org/z/hxnT1PTWo more clearly demonstrates the issue, IMO. > > TIL that C23 clarifies this, but clang still doesn't have the correct > codegen then for -std=c23. Can you please find or file a bug about > this, then add a link to it in the commit message? > > It might be interesting to link to the specific section of n3096 that > clarifies this, or if there was a corresponding defect report sent to > ISO about this. Maybe something from > https://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_log.htm > discusses this? https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3011.htm https://clang.llvm.org/c_status.html mentions that n3011 was addressed by clang-17, but based on my godbolt link above, it seems perhaps not? 6.7.10.2 of n3096 (c23) defines "empty initialization" (which wasn't defined in older standards). Ah, reading n2310 (c17) 6.7.9.10: ``` If an object that has static or thread storage duration is not initialized explicitly, then: ... — if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits; ``` Specifically, "the first named member" was a terrible mistake in the language. Yikes! Might want to quote that in the commit message. > > Can you also please (find or) file a bug against clang about this? A > compiler diagnostic would be very very helpful here, since `= {};` is > such a common idiom. > > Patch LGTM, but I think more context can be provided in the commit > message in a v2 that helps reviewers follow along with what's going on > here. > > > > > Signed-off-by: Yabin Cui <yabinc@google.com> > > --- > > net/xfrm/xfrm_state.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c > > index 649bb739df0d..9bc69d703e5c 100644 > > --- a/net/xfrm/xfrm_state.c > > +++ b/net/xfrm/xfrm_state.c > > @@ -1139,7 +1139,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, > > struct xfrm_policy *pol, int *err, > > unsigned short family, u32 if_id) > > { > > - static xfrm_address_t saddr_wildcard = { }; > > + static const xfrm_address_t saddr_wildcard; > > struct net *net = xp_net(pol); > > unsigned int h, h_wildcard; > > struct xfrm_state *x, *x0, *to_put; > > -- > > 2.45.2.741.gdbec12cfda-goog > > > > > -- > Thanks, > ~Nick Desaulniers
On Thu, Jun 20, 2024 at 12:47 PM Nick Desaulniers <ndesaulniers@google.com> wrote: > > On Thu, Jun 20, 2024 at 12:31 PM Nick Desaulniers > <ndesaulniers@google.com> wrote: > > > > On Thu, Jun 20, 2024 at 11:17 AM Yabin Cui <yabinc@google.com> wrote: > > > > > > saddr_wildcard is a static union variable initialized with {}. > > > But c11 standard doesn't guarantee initializing all fields as > > > zero for this case. As in https://godbolt.org/z/rWvdv6aEx, > > > > Specifically, it sounds like C99+ is just the first member of the > > union, which is dumb since that may not necessarily be the largest > > variant. Can you find the specific relevant wording from a pre-c23 > > spec? > > > > > clang only initializes the first field as zero, but the bits > > > corresponding to other (larger) members are undefined. > > > > Oh, that sucks! > > > > Reading through the internal report on this is fascinating! Nice job > > tracking down the issue! It sounds like if we can aggressively inline > > the users of this partially initialized value, then the UB from > > control flow on the partially initialized value can result in > > Android's kernel network tests failing. It might be good to include > > more info on "why this is a problem" in the commit message. > > > > https://godbolt.org/z/hxnT1PTWo more clearly demonstrates the issue, IMO. > > > > TIL that C23 clarifies this, but clang still doesn't have the correct > > codegen then for -std=c23. Can you please find or file a bug about > > this, then add a link to it in the commit message? > > > > It might be interesting to link to the specific section of n3096 that > > clarifies this, or if there was a corresponding defect report sent to > > ISO about this. Maybe something from > > https://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_log.htm > > discusses this? > > https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3011.htm > > https://clang.llvm.org/c_status.html mentions that n3011 was addressed > by clang-17, but based on my godbolt link above, it seems perhaps not? Sorry, n3011 was a minor revision to https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2900.htm which is a better citation for this bug, IMO. I still think the clang status page is wrong (for n2900) and that is a bug against clang that should be fixed (for c23), but this kernel patch still has merit (since the issue I'm referring to in clang is not what's leading to the test case failures). > > 6.7.10.2 of n3096 (c23) defines "empty initialization" (which wasn't > defined in older standards). > > Ah, reading > > n2310 (c17) 6.7.9.10: > > ``` > If an object that has static or thread storage duration is not > initialized explicitly, then: > ... > — if it is a union, the first named member is initialized > (recursively) according to these rules, and > any padding is initialized to zero bits; > ``` > > Specifically, "the first named member" was a terrible mistake in the language. > > Yikes! Might want to quote that in the commit message. > > > > > Can you also please (find or) file a bug against clang about this? A > > compiler diagnostic would be very very helpful here, since `= {};` is > > such a common idiom. > > > > Patch LGTM, but I think more context can be provided in the commit > > message in a v2 that helps reviewers follow along with what's going on > > here. > > > > > > > > Signed-off-by: Yabin Cui <yabinc@google.com> > > > --- > > > net/xfrm/xfrm_state.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c > > > index 649bb739df0d..9bc69d703e5c 100644 > > > --- a/net/xfrm/xfrm_state.c > > > +++ b/net/xfrm/xfrm_state.c > > > @@ -1139,7 +1139,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, > > > struct xfrm_policy *pol, int *err, > > > unsigned short family, u32 if_id) > > > { > > > - static xfrm_address_t saddr_wildcard = { }; > > > + static const xfrm_address_t saddr_wildcard; > > > struct net *net = xp_net(pol); > > > unsigned int h, h_wildcard; > > > struct xfrm_state *x, *x0, *to_put; > > > -- > > > 2.45.2.741.gdbec12cfda-goog > > > > > > > > > -- > > Thanks, > > ~Nick Desaulniers > > > > -- > Thanks, > ~Nick Desaulniers
On 6/20/24 20:17, Yabin Cui wrote: > saddr_wildcard is a static union variable initialized with {}. > But c11 standard doesn't guarantee initializing all fields as > zero for this case. As in https://godbolt.org/z/rWvdv6aEx, > clang only initializes the first field as zero, but the bits > corresponding to other (larger) members are undefined. Nice finding! You have to add a Fixes tag, perhaps: Fixes: 08ec9af1c062 ("xfrm: Fix xfrm_state_find() wrt. wildcard source address.") And the other thing is that I would change the order in the union, to have largest element as the first. Would be best to also add such check into static analysis tool/s. > > Signed-off-by: Yabin Cui <yabinc@google.com> > --- > net/xfrm/xfrm_state.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c > index 649bb739df0d..9bc69d703e5c 100644 > --- a/net/xfrm/xfrm_state.c > +++ b/net/xfrm/xfrm_state.c > @@ -1139,7 +1139,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, > struct xfrm_policy *pol, int *err, > unsigned short family, u32 if_id) > { > - static xfrm_address_t saddr_wildcard = { }; > + static const xfrm_address_t saddr_wildcard; > struct net *net = xp_net(pol); > unsigned int h, h_wildcard; > struct xfrm_state *x, *x0, *to_put;
On Thu, Jun 20, 2024 at 12:31:46PM -0700, Nick Desaulniers wrote: > > Can you also please (find or) file a bug against clang about this? A > compiler diagnostic would be very very helpful here, since `= {};` is > such a common idiom. This idiom is used throughout the kernel. If we decide that it isn't safe to use then we should change the kernel as a whole rather than the one spot that happens to have been identified. Alternatively the buggy compiler should be banned until it's fixed. Cheers,
Sadly, this is not a bug in Clang but is the terrible way C still works in C23. ☹ See C23 6.7.11p11, the last bullet: if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits. So the padding gets initialized as does the first member, but if the first member is not the largest member in the union, that leaves other bits uninitialized. This happened late during NB comment stages, which is why N2900 and N3011 give the impression that the largest union member should be initialized. That said, maybe there's appetite within the community to initialize the largest member as a conforming extension. The downside is that people not playing tricks with their unions may end up paying additional initialization cost that's useless, in pathological cases. e.g., int main() { union U { int i; struct { long double huge[1000]; } s; } u = {}; // This is not a brilliant use of unions return u.i; } But how often does this matter for performance -- I have to imagine that most unions make use of most of the memory needed for their members instead of being lopsided like that. If we find some important use case that has significantly worse performance, we could always add a driver flag to control the behavior to let people opt into/out of the extension. ~Aaron -----Original Message----- From: Nick Desaulniers <ndesaulniers@google.com> Sent: Thursday, June 20, 2024 3:54 PM To: Yabin Cui <yabinc@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com>; Herbert Xu <herbert@gondor.apana.org.au>; David S. Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; Nathan Chancellor <nathan@kernel.org>; Bill Wendling <morbo@google.com>; Justin Stitt <justinstitt@google.com>; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; llvm@lists.linux.dev; Ballman, Aaron <aaron.ballman@intel.com> Subject: Re: [PATCH] Fix initializing a static union variable On Thu, Jun 20, 2024 at 12:47 PM Nick Desaulniers <ndesaulniers@google.com> wrote: > > On Thu, Jun 20, 2024 at 12:31 PM Nick Desaulniers > <ndesaulniers@google.com> wrote: > > > > On Thu, Jun 20, 2024 at 11:17 AM Yabin Cui <yabinc@google.com> wrote: > > > > > > saddr_wildcard is a static union variable initialized with {}. > > > But c11 standard doesn't guarantee initializing all fields as zero > > > for this case. As in https://godbolt.org/z/rWvdv6aEx, > > > > Specifically, it sounds like C99+ is just the first member of the > > union, which is dumb since that may not necessarily be the largest > > variant. Can you find the specific relevant wording from a pre-c23 > > spec? > > > > > clang only initializes the first field as zero, but the bits > > > corresponding to other (larger) members are undefined. > > > > Oh, that sucks! > > > > Reading through the internal report on this is fascinating! Nice > > job tracking down the issue! It sounds like if we can aggressively > > inline the users of this partially initialized value, then the UB > > from control flow on the partially initialized value can result in > > Android's kernel network tests failing. It might be good to include > > more info on "why this is a problem" in the commit message. > > > > https://godbolt.org/z/hxnT1PTWo more clearly demonstrates the issue, IMO. > > > > TIL that C23 clarifies this, but clang still doesn't have the > > correct codegen then for -std=c23. Can you please find or file a > > bug about this, then add a link to it in the commit message? > > > > It might be interesting to link to the specific section of n3096 > > that clarifies this, or if there was a corresponding defect report > > sent to ISO about this. Maybe something from > > https://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_log.htm > > discusses this? > > https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3011.htm > > https://clang.llvm.org/c_status.html mentions that n3011 was addressed > by clang-17, but based on my godbolt link above, it seems perhaps not? Sorry, n3011 was a minor revision to https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2900.htm which is a better citation for this bug, IMO. I still think the clang status page is wrong (for n2900) and that is a bug against clang that should be fixed (for c23), but this kernel patch still has merit (since the issue I'm referring to in clang is not what's leading to the test case failures). > > 6.7.10.2 of n3096 (c23) defines "empty initialization" (which wasn't > defined in older standards). > > Ah, reading > > n2310 (c17) 6.7.9.10: > > ``` > If an object that has static or thread storage duration is not > initialized explicitly, then: > ... > — if it is a union, the first named member is initialized > (recursively) according to these rules, and any padding is initialized > to zero bits; ``` > > Specifically, "the first named member" was a terrible mistake in the language. > > Yikes! Might want to quote that in the commit message. > > > > > Can you also please (find or) file a bug against clang about this? A > > compiler diagnostic would be very very helpful here, since `= {};` > > is such a common idiom. > > > > Patch LGTM, but I think more context can be provided in the commit > > message in a v2 that helps reviewers follow along with what's going > > on here. > > > > > > > > Signed-off-by: Yabin Cui <yabinc@google.com> > > > --- > > > net/xfrm/xfrm_state.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index > > > 649bb739df0d..9bc69d703e5c 100644 > > > --- a/net/xfrm/xfrm_state.c > > > +++ b/net/xfrm/xfrm_state.c > > > @@ -1139,7 +1139,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, > > > struct xfrm_policy *pol, int *err, > > > unsigned short family, u32 if_id) { > > > - static xfrm_address_t saddr_wildcard = { }; > > > + static const xfrm_address_t saddr_wildcard; > > > struct net *net = xp_net(pol); > > > unsigned int h, h_wildcard; > > > struct xfrm_state *x, *x0, *to_put; > > > -- > > > 2.45.2.741.gdbec12cfda-goog > > > > > > > > > -- > > Thanks, > > ~Nick Desaulniers > > > > -- > Thanks, > ~Nick Desaulniers -- Thanks, ~Nick Desaulniers
> You have to add a Fixes tag, perhaps: > Fixes: 08ec9af1c062 ("xfrm: Fix xfrm_state_find() wrt. wildcard source address.") Thanks! I will add it in the v2 patch. > And the other thing is that I would change the order in the union, to > have largest element as the first. Would be best to also add such check > into static analysis tool/s. xfrm_address_t is defined in include/uapi. So I am a little hesitant to change it now. Currently clang doesn't even have a check/warning to detect this. I am not sure how clang will fix it in the future, in https://github.com/llvm/llvm-project/issues/78034. > This idiom is used throughout the kernel. If we decide that it > isn't safe to use then we should change the kernel as a whole rather > than the one spot that happens to have been identified. > Alternatively the buggy compiler should be banned until it's fixed. The problem probably happens in all clang versions, depending on the optimization flags and inline attributes used. I found this problem when trying to add -fdebug-info-for-profiling flag when compiling the code. This patch is to fix kernel code where the problem happens, before clang can give a better solution in https://github.com/llvm/llvm-project/issues/78034. On Fri, Jun 21, 2024 at 4:25 AM Ballman, Aaron <aaron.ballman@intel.com> wrote: > > Sadly, this is not a bug in Clang but is the terrible way C still works in C23. ☹ See C23 6.7.11p11, the last bullet: > > if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits. > > So the padding gets initialized as does the first member, but if the first member is not the largest member in the union, that leaves other bits uninitialized. This happened late during NB comment stages, which is why N2900 and N3011 give the impression that the largest union member should be initialized. > > That said, maybe there's appetite within the community to initialize the largest member as a conforming extension. The downside is that people not playing tricks with their unions may end up paying additional initialization cost that's useless, in pathological cases. e.g., > > int main() { > union U { > int i; > struct { > long double huge[1000]; > } s; > } u = {}; // This is not a brilliant use of unions > return u.i; > } > > But how often does this matter for performance -- I have to imagine that most unions make use of most of the memory needed for their members instead of being lopsided like that. If we find some important use case that has significantly worse performance, we could always add a driver flag to control the behavior to let people opt into/out of the extension. I guess if people don't want the cost of zero initializing a union variable, they will not use "= {}". And the situation that using "= {}" only zero initializes the first member of a union variable isn't very intuitive. > > ~Aaron > > -----Original Message----- > From: Nick Desaulniers <ndesaulniers@google.com> > Sent: Thursday, June 20, 2024 3:54 PM > To: Yabin Cui <yabinc@google.com> > Cc: Steffen Klassert <steffen.klassert@secunet.com>; Herbert Xu <herbert@gondor.apana.org.au>; David S. Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; Nathan Chancellor <nathan@kernel.org>; Bill Wendling <morbo@google.com>; Justin Stitt <justinstitt@google.com>; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; llvm@lists.linux.dev; Ballman, Aaron <aaron.ballman@intel.com> > Subject: Re: [PATCH] Fix initializing a static union variable > > On Thu, Jun 20, 2024 at 12:47 PM Nick Desaulniers <ndesaulniers@google.com> wrote: > > > > On Thu, Jun 20, 2024 at 12:31 PM Nick Desaulniers > > <ndesaulniers@google.com> wrote: > > > > > > On Thu, Jun 20, 2024 at 11:17 AM Yabin Cui <yabinc@google.com> wrote: > > > > > > > > saddr_wildcard is a static union variable initialized with {}. > > > > But c11 standard doesn't guarantee initializing all fields as zero > > > > for this case. As in https://godbolt.org/z/rWvdv6aEx, > > > > > > Specifically, it sounds like C99+ is just the first member of the > > > union, which is dumb since that may not necessarily be the largest > > > variant. Can you find the specific relevant wording from a pre-c23 > > > spec? > > > I feel it's unspecified in pre-c23 spec. From https://en.cppreference.com/w/c/language/struct_initialization, When initializing an object of struct or union type, the initializer must be a non-empty,(until C23). > > > > clang only initializes the first field as zero, but the bits > > > > corresponding to other (larger) members are undefined. > > > > > > Oh, that sucks! > > > > > > Reading through the internal report on this is fascinating! Nice > > > job tracking down the issue! It sounds like if we can aggressively > > > inline the users of this partially initialized value, then the UB > > > from control flow on the partially initialized value can result in > > > Android's kernel network tests failing. It might be good to include > > > more info on "why this is a problem" in the commit message. > > > > > > https://godbolt.org/z/hxnT1PTWo more clearly demonstrates the issue, IMO. Great, I will use it in the v2 patch. > > > > > > TIL that C23 clarifies this, but clang still doesn't have the > > > correct codegen then for -std=c23. Can you please find or file a > > > bug about this, then add a link to it in the commit message? > > > > > > It might be interesting to link to the specific section of n3096 > > > that clarifies this, or if there was a corresponding defect report > > > sent to ISO about this. Maybe something from > > > https://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_log.htm > > > discusses this? > > > > https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3011.htm > > > > https://clang.llvm.org/c_status.html mentions that n3011 was addressed > > by clang-17, but based on my godbolt link above, it seems perhaps not? > > Sorry, n3011 was a minor revision to > https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2900.htm > which is a better citation for this bug, IMO. I still think the clang status page is wrong (for n2900) and that is a bug against clang that should be fixed (for c23), but this kernel patch still has merit (since the issue I'm referring to in clang is not what's leading to the test case failures). > I totally agree with you. And if clang fixes the support, I guess it will not be limited to c23. > > > > 6.7.10.2 of n3096 (c23) defines "empty initialization" (which wasn't > > defined in older standards). > > > > Ah, reading > > > > n2310 (c17) 6.7.9.10: > > > > ``` > > If an object that has static or thread storage duration is not > > initialized explicitly, then: > > ... > > — if it is a union, the first named member is initialized > > (recursively) according to these rules, and any padding is initialized > > to zero bits; ``` > > > > Specifically, "the first named member" was a terrible mistake in the language. > > > > Yikes! Might want to quote that in the commit message. > > > > > > > > Can you also please (find or) file a bug against clang about this? A > > > compiler diagnostic would be very very helpful here, since `= {};` > > > is such a common idiom. > > > Reported to clang upstream in https://github.com/llvm/llvm-project/issues/78034#issuecomment-2183233517. It's a problem that still happens in the c23 standard. > > > Patch LGTM, but I think more context can be provided in the commit > > > message in a v2 that helps reviewers follow along with what's going > > > on here. > > > > > > > > > > > Signed-off-by: Yabin Cui <yabinc@google.com> > > > > --- > > > > net/xfrm/xfrm_state.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index > > > > 649bb739df0d..9bc69d703e5c 100644 > > > > --- a/net/xfrm/xfrm_state.c > > > > +++ b/net/xfrm/xfrm_state.c > > > > @@ -1139,7 +1139,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, > > > > struct xfrm_policy *pol, int *err, > > > > unsigned short family, u32 if_id) { > > > > - static xfrm_address_t saddr_wildcard = { }; > > > > + static const xfrm_address_t saddr_wildcard; > > > > struct net *net = xp_net(pol); > > > > unsigned int h, h_wildcard; > > > > struct xfrm_state *x, *x0, *to_put; > > > > -- > > > > 2.45.2.741.gdbec12cfda-goog > > > > > > > > > > > > > -- > > > Thanks, > > > ~Nick Desaulniers > > > > > > > > -- > > Thanks, > > ~Nick Desaulniers > > > > -- > Thanks, > ~Nick Desaulniers
On Fri, 21 Jun 2024 at 07:07, Herbert Xu <herbert@gondor.apana.org.au> wrote: > > On Thu, Jun 20, 2024 at 12:31:46PM -0700, Nick Desaulniers wrote: > > > > Can you also please (find or) file a bug against clang about this? A > > compiler diagnostic would be very very helpful here, since `= {};` is > > such a common idiom. > > This idiom is used throughout the kernel. If we decide that it > isn't safe to use then we should change the kernel as a whole rather > than the one spot that happens to have been identified. Again - the commit message and the whole discussion about the C23 standard is actively misleading, as shown byu this whole thread. The bug IS NOT ABOUT THE EMPTY INITIALIZER. The exact same problem happens with "{ 0 }" as happens with "{ }". The bug is literally that some versions of clang seem to implement BBOTH of these as "initialize the first member of the union", which then means that if the first member isn't the largest member, the rest of the union is entirely undefined. And by "undefined" I don't mean in the "C standard sense". It may be that *too* in some versions of the C standards, but that's not really the issue any more. In the kernel, we do expect initializers that always initialize the whole variable fully. We have actively moved away from doing manual variable initialization because they've generated both worse code and sometimes triggered other problems. See for example 390001d648ff ("drm/i915/mtl: avoid stringop-overflow warning"), but also things like 75dc03453122 ("xfs: fix xfs_btree_query_range callers to initialize btree rec fully") or e3a69496a1cd ("media: Prefer designated initializers over memset for subdev pad ops") In other words, in the kernel we simply depend on initializers working reliably and fully. Partly because we've literally been told by compiler people that that is what we should do. So no, this is not about empty initializers. And this is not about some C standard verbiage. This is literally about "the linux kernel expects initializers to FULLY initialize variables". Padding, other union members, you name it. If clang doesn't do that, then clang is buggy as far as the kernel is concerned, and no amount of standards reading is relevant. And in particular, no amount of "but empty initializer" is relevant. I *hope* this is some unreleased clang version that has this bug. Because at least the clang version I have access to (clang 17.0.6) does not seem to exhibit this issue. Linus
On Sun, 23 Jun 2024 at 12:51, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > I *hope* this is some unreleased clang version that has this bug. > Because at least the clang version I have access to (clang 17.0.6) > does not seem to exhibit this issue. Hmm. Strange. godbolt says that it happens with clang 17.0.1 (and earlier) with a plain -O2. It just doesn't happen for me. Either this got fixed already and my 17.0.6 has the fix, or there's some subtle flag that my test-case uses (some config file - I just use "-O2" for local testing like the godbolt page did). But clearly godbolt does think this happens with released clang versions too. Linus
> In other words, in the kernel we simply depend on initializers working > reliably and fully. Partly because we've literally been told by > compiler people that that is what we should do. > > So no, this is not about empty initializers. And this is not about > some C standard verbiage. > > This is literally about "the linux kernel expects initializers to > FULLY initialize variables". Padding, other union members, you name > it. > > If clang doesn't do that, then clang is buggy as far as the kernel is > concerned, and no amount of standards reading is relevant. > > And in particular, no amount of "but empty initializer" is relevant. > Thanks for the detailed explanation! Sorry for limiting the problem to the empty initializer. I didn't realize the linux kernel also depends on zero initializing extra bytes when explicitly initializing one field of a union type. > And when the union is embedded in a struct, the struct initialization > seems to be ok from a quick test, but I might have screwed that test > up. I also think so. But probably we need to add tests in clang to make sure it continues to work. > Hmm. Strange. godbolt says that it happens with clang 17.0.1 (and > earlier) with a plain -O2. > > It just doesn't happen for me. Either this got fixed already and my > 17.0.6 has the fix, or there's some subtle flag that my test-case uses > (some config file - I just use "-O2" for local testing like the > godbolt page did). > > But clearly godbolt does think this happens with released clang versions too. > Yes, I also think it happens in both clang trunk and past releases, as tested in https://godbolt.org/z/vnGqKK43z. Gladly in the clang bug in https://github.com/llvm/llvm-project/issues/78034, no one is against zero initializing the whole union type. I feel now it's no longer important to have this patch in the kernel. But we need to fix this problem in clang and backport the fix to releases we care about. Thanks, Yabin
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index 649bb739df0d..9bc69d703e5c 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -1139,7 +1139,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, struct xfrm_policy *pol, int *err, unsigned short family, u32 if_id) { - static xfrm_address_t saddr_wildcard = { }; + static const xfrm_address_t saddr_wildcard; struct net *net = xp_net(pol); unsigned int h, h_wildcard; struct xfrm_state *x, *x0, *to_put;
saddr_wildcard is a static union variable initialized with {}. But c11 standard doesn't guarantee initializing all fields as zero for this case. As in https://godbolt.org/z/rWvdv6aEx, clang only initializes the first field as zero, but the bits corresponding to other (larger) members are undefined. Signed-off-by: Yabin Cui <yabinc@google.com> --- net/xfrm/xfrm_state.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)