diff mbox series

Fix initializing a static union variable

Message ID 20240620181736.1270455-1-yabinc@google.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series Fix initializing a static union variable | expand

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 842 this patch: 842
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 11 of 11 maintainers
netdev/build_clang success Errors and warnings before: 849 this patch: 849
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 950 this patch: 950
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-06-21--03-00 (tests: 659)

Commit Message

Yabin Cui June 20, 2024, 6:17 p.m. UTC
saddr_wildcard is a static union variable initialized with {}.
But c11 standard doesn't guarantee initializing all fields as
zero for this case. As in https://godbolt.org/z/rWvdv6aEx,
clang only initializes the first field as zero, but the bits
corresponding to other (larger) members are undefined.

Signed-off-by: Yabin Cui <yabinc@google.com>
---
 net/xfrm/xfrm_state.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Nick Desaulniers June 20, 2024, 7:31 p.m. UTC | #1
On Thu, Jun 20, 2024 at 11:17 AM Yabin Cui <yabinc@google.com> wrote:
>
> saddr_wildcard is a static union variable initialized with {}.
> But c11 standard doesn't guarantee initializing all fields as
> zero for this case. As in https://godbolt.org/z/rWvdv6aEx,

Specifically, it sounds like C99+ is just the first member of the
union, which is dumb since that may not necessarily be the largest
variant.  Can you find the specific relevant wording from a pre-c23
spec?

> clang only initializes the first field as zero, but the bits
> corresponding to other (larger) members are undefined.

Oh, that sucks!

Reading through the internal report on this is fascinating!  Nice job
tracking down the issue!  It sounds like if we can aggressively inline
the users of this partially initialized value, then the UB from
control flow on the partially initialized value can result in
Android's kernel network tests failing.  It might be good to include
more info on "why this is a problem" in the commit message.

https://godbolt.org/z/hxnT1PTWo more clearly demonstrates the issue, IMO.

TIL that C23 clarifies this, but clang still doesn't have the correct
codegen then for -std=c23.  Can you please find or file a bug about
this, then add a link to it in the commit message?

It might be interesting to link to the specific section of n3096 that
clarifies this, or if there was a corresponding defect report sent to
ISO about this.  Maybe something from
https://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_log.htm
discusses this?

Can you also please (find or) file a bug against clang about this? A
compiler diagnostic would be very very helpful here, since `= {};` is
such a common idiom.

Patch LGTM, but I think more context can be provided in the commit
message in a v2 that helps reviewers follow along with what's going on
here.

>
> Signed-off-by: Yabin Cui <yabinc@google.com>
> ---
>  net/xfrm/xfrm_state.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
> index 649bb739df0d..9bc69d703e5c 100644
> --- a/net/xfrm/xfrm_state.c
> +++ b/net/xfrm/xfrm_state.c
> @@ -1139,7 +1139,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
>                 struct xfrm_policy *pol, int *err,
>                 unsigned short family, u32 if_id)
>  {
> -       static xfrm_address_t saddr_wildcard = { };
> +       static const xfrm_address_t saddr_wildcard;
>         struct net *net = xp_net(pol);
>         unsigned int h, h_wildcard;
>         struct xfrm_state *x, *x0, *to_put;
> --
> 2.45.2.741.gdbec12cfda-goog
>
Nick Desaulniers June 20, 2024, 7:47 p.m. UTC | #2
On Thu, Jun 20, 2024 at 12:31 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> On Thu, Jun 20, 2024 at 11:17 AM Yabin Cui <yabinc@google.com> wrote:
> >
> > saddr_wildcard is a static union variable initialized with {}.
> > But c11 standard doesn't guarantee initializing all fields as
> > zero for this case. As in https://godbolt.org/z/rWvdv6aEx,
>
> Specifically, it sounds like C99+ is just the first member of the
> union, which is dumb since that may not necessarily be the largest
> variant.  Can you find the specific relevant wording from a pre-c23
> spec?
>
> > clang only initializes the first field as zero, but the bits
> > corresponding to other (larger) members are undefined.
>
> Oh, that sucks!
>
> Reading through the internal report on this is fascinating!  Nice job
> tracking down the issue!  It sounds like if we can aggressively inline
> the users of this partially initialized value, then the UB from
> control flow on the partially initialized value can result in
> Android's kernel network tests failing.  It might be good to include
> more info on "why this is a problem" in the commit message.
>
> https://godbolt.org/z/hxnT1PTWo more clearly demonstrates the issue, IMO.
>
> TIL that C23 clarifies this, but clang still doesn't have the correct
> codegen then for -std=c23.  Can you please find or file a bug about
> this, then add a link to it in the commit message?
>
> It might be interesting to link to the specific section of n3096 that
> clarifies this, or if there was a corresponding defect report sent to
> ISO about this.  Maybe something from
> https://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_log.htm
> discusses this?

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3011.htm

https://clang.llvm.org/c_status.html mentions that n3011 was addressed
by clang-17, but based on my godbolt link above, it seems perhaps not?

6.7.10.2 of n3096 (c23) defines "empty initialization" (which wasn't
defined in older standards).

Ah, reading

n2310 (c17) 6.7.9.10:

```
If an object that has static or thread storage duration is not
initialized explicitly, then:
...
— if it is a union, the first named member is initialized
(recursively) according to these rules, and
any padding is initialized to zero bits;
```

Specifically, "the first named member" was a terrible mistake in the language.

Yikes! Might want to quote that in the commit message.

>
> Can you also please (find or) file a bug against clang about this? A
> compiler diagnostic would be very very helpful here, since `= {};` is
> such a common idiom.
>
> Patch LGTM, but I think more context can be provided in the commit
> message in a v2 that helps reviewers follow along with what's going on
> here.
>
> >
> > Signed-off-by: Yabin Cui <yabinc@google.com>
> > ---
> >  net/xfrm/xfrm_state.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
> > index 649bb739df0d..9bc69d703e5c 100644
> > --- a/net/xfrm/xfrm_state.c
> > +++ b/net/xfrm/xfrm_state.c
> > @@ -1139,7 +1139,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
> >                 struct xfrm_policy *pol, int *err,
> >                 unsigned short family, u32 if_id)
> >  {
> > -       static xfrm_address_t saddr_wildcard = { };
> > +       static const xfrm_address_t saddr_wildcard;
> >         struct net *net = xp_net(pol);
> >         unsigned int h, h_wildcard;
> >         struct xfrm_state *x, *x0, *to_put;
> > --
> > 2.45.2.741.gdbec12cfda-goog
> >
>
>
> --
> Thanks,
> ~Nick Desaulniers
Nick Desaulniers June 20, 2024, 7:54 p.m. UTC | #3
On Thu, Jun 20, 2024 at 12:47 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> On Thu, Jun 20, 2024 at 12:31 PM Nick Desaulniers
> <ndesaulniers@google.com> wrote:
> >
> > On Thu, Jun 20, 2024 at 11:17 AM Yabin Cui <yabinc@google.com> wrote:
> > >
> > > saddr_wildcard is a static union variable initialized with {}.
> > > But c11 standard doesn't guarantee initializing all fields as
> > > zero for this case. As in https://godbolt.org/z/rWvdv6aEx,
> >
> > Specifically, it sounds like C99+ is just the first member of the
> > union, which is dumb since that may not necessarily be the largest
> > variant.  Can you find the specific relevant wording from a pre-c23
> > spec?
> >
> > > clang only initializes the first field as zero, but the bits
> > > corresponding to other (larger) members are undefined.
> >
> > Oh, that sucks!
> >
> > Reading through the internal report on this is fascinating!  Nice job
> > tracking down the issue!  It sounds like if we can aggressively inline
> > the users of this partially initialized value, then the UB from
> > control flow on the partially initialized value can result in
> > Android's kernel network tests failing.  It might be good to include
> > more info on "why this is a problem" in the commit message.
> >
> > https://godbolt.org/z/hxnT1PTWo more clearly demonstrates the issue, IMO.
> >
> > TIL that C23 clarifies this, but clang still doesn't have the correct
> > codegen then for -std=c23.  Can you please find or file a bug about
> > this, then add a link to it in the commit message?
> >
> > It might be interesting to link to the specific section of n3096 that
> > clarifies this, or if there was a corresponding defect report sent to
> > ISO about this.  Maybe something from
> > https://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_log.htm
> > discusses this?
>
> https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3011.htm
>
> https://clang.llvm.org/c_status.html mentions that n3011 was addressed
> by clang-17, but based on my godbolt link above, it seems perhaps not?

Sorry, n3011 was a minor revision to
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2900.htm
which is a better citation for this bug, IMO.  I still think the clang
status page is wrong (for n2900) and that is a bug against clang that
should be fixed (for c23), but this kernel patch still has merit
(since the issue I'm referring to in clang is not what's leading to
the test case failures).

>
> 6.7.10.2 of n3096 (c23) defines "empty initialization" (which wasn't
> defined in older standards).
>
> Ah, reading
>
> n2310 (c17) 6.7.9.10:
>
> ```
> If an object that has static or thread storage duration is not
> initialized explicitly, then:
> ...
> — if it is a union, the first named member is initialized
> (recursively) according to these rules, and
> any padding is initialized to zero bits;
> ```
>
> Specifically, "the first named member" was a terrible mistake in the language.
>
> Yikes! Might want to quote that in the commit message.
>
> >
> > Can you also please (find or) file a bug against clang about this? A
> > compiler diagnostic would be very very helpful here, since `= {};` is
> > such a common idiom.
> >
> > Patch LGTM, but I think more context can be provided in the commit
> > message in a v2 that helps reviewers follow along with what's going on
> > here.
> >
> > >
> > > Signed-off-by: Yabin Cui <yabinc@google.com>
> > > ---
> > >  net/xfrm/xfrm_state.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
> > > index 649bb739df0d..9bc69d703e5c 100644
> > > --- a/net/xfrm/xfrm_state.c
> > > +++ b/net/xfrm/xfrm_state.c
> > > @@ -1139,7 +1139,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
> > >                 struct xfrm_policy *pol, int *err,
> > >                 unsigned short family, u32 if_id)
> > >  {
> > > -       static xfrm_address_t saddr_wildcard = { };
> > > +       static const xfrm_address_t saddr_wildcard;
> > >         struct net *net = xp_net(pol);
> > >         unsigned int h, h_wildcard;
> > >         struct xfrm_state *x, *x0, *to_put;
> > > --
> > > 2.45.2.741.gdbec12cfda-goog
> > >
> >
> >
> > --
> > Thanks,
> > ~Nick Desaulniers
>
>
>
> --
> Thanks,
> ~Nick Desaulniers
Przemek Kitszel June 21, 2024, 11:01 a.m. UTC | #4
On 6/20/24 20:17, Yabin Cui wrote:
> saddr_wildcard is a static union variable initialized with {}.
> But c11 standard doesn't guarantee initializing all fields as
> zero for this case. As in https://godbolt.org/z/rWvdv6aEx,
> clang only initializes the first field as zero, but the bits
> corresponding to other (larger) members are undefined.

Nice finding!

You have to add a Fixes tag, perhaps:
Fixes: 08ec9af1c062 ("xfrm: Fix xfrm_state_find() wrt. wildcard source 
address.")

And the other thing is that I would change the order in the union, to
have largest element as the first. Would be best to also add such check
into static analysis tool/s.

> 
> Signed-off-by: Yabin Cui <yabinc@google.com>
> ---
>   net/xfrm/xfrm_state.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
> index 649bb739df0d..9bc69d703e5c 100644
> --- a/net/xfrm/xfrm_state.c
> +++ b/net/xfrm/xfrm_state.c
> @@ -1139,7 +1139,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
>   		struct xfrm_policy *pol, int *err,
>   		unsigned short family, u32 if_id)
>   {
> -	static xfrm_address_t saddr_wildcard = { };
> +	static const xfrm_address_t saddr_wildcard;
>   	struct net *net = xp_net(pol);
>   	unsigned int h, h_wildcard;
>   	struct xfrm_state *x, *x0, *to_put;
Herbert Xu June 21, 2024, 11:07 a.m. UTC | #5
On Thu, Jun 20, 2024 at 12:31:46PM -0700, Nick Desaulniers wrote:
>
> Can you also please (find or) file a bug against clang about this? A
> compiler diagnostic would be very very helpful here, since `= {};` is
> such a common idiom.

This idiom is used throughout the kernel.  If we decide that it
isn't safe to use then we should change the kernel as a whole rather
than the one spot that happens to have been identified.

Alternatively the buggy compiler should be banned until it's fixed.

Cheers,
Ballman, Aaron June 21, 2024, 11:24 a.m. UTC | #6
Sadly, this is not a bug in Clang but is the terrible way C still works in C23. ☹ See C23 6.7.11p11, the last bullet:

if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits.

So the padding gets initialized as does the first member, but if the first member is not the largest member in the union, that leaves other bits uninitialized. This happened late during NB comment stages, which is why N2900 and N3011 give the impression that the largest union member should be initialized.

That said, maybe there's appetite within the community to initialize the largest member as a conforming extension. The downside is that people not playing tricks with their unions may end up paying additional initialization cost that's useless, in pathological cases. e.g.,

int main() {
  union U {
    int i;
    struct {
      long double huge[1000];
    } s;
  } u = {}; // This is not a brilliant use of unions
  return u.i;
}

But how often does this matter for performance -- I have to imagine that most unions make use of most of the memory needed for their members instead of being lopsided like that. If we find some important use case that has significantly worse performance, we could always add a driver flag to control the behavior to let people opt into/out of the extension.

~Aaron

-----Original Message-----
From: Nick Desaulniers <ndesaulniers@google.com> 
Sent: Thursday, June 20, 2024 3:54 PM
To: Yabin Cui <yabinc@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>; Herbert Xu <herbert@gondor.apana.org.au>; David S. Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; Nathan Chancellor <nathan@kernel.org>; Bill Wendling <morbo@google.com>; Justin Stitt <justinstitt@google.com>; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; llvm@lists.linux.dev; Ballman, Aaron <aaron.ballman@intel.com>
Subject: Re: [PATCH] Fix initializing a static union variable

On Thu, Jun 20, 2024 at 12:47 PM Nick Desaulniers <ndesaulniers@google.com> wrote:
>
> On Thu, Jun 20, 2024 at 12:31 PM Nick Desaulniers 
> <ndesaulniers@google.com> wrote:
> >
> > On Thu, Jun 20, 2024 at 11:17 AM Yabin Cui <yabinc@google.com> wrote:
> > >
> > > saddr_wildcard is a static union variable initialized with {}.
> > > But c11 standard doesn't guarantee initializing all fields as zero 
> > > for this case. As in https://godbolt.org/z/rWvdv6aEx,
> >
> > Specifically, it sounds like C99+ is just the first member of the 
> > union, which is dumb since that may not necessarily be the largest 
> > variant.  Can you find the specific relevant wording from a pre-c23 
> > spec?
> >
> > > clang only initializes the first field as zero, but the bits 
> > > corresponding to other (larger) members are undefined.
> >
> > Oh, that sucks!
> >
> > Reading through the internal report on this is fascinating!  Nice 
> > job tracking down the issue!  It sounds like if we can aggressively 
> > inline the users of this partially initialized value, then the UB 
> > from control flow on the partially initialized value can result in 
> > Android's kernel network tests failing.  It might be good to include 
> > more info on "why this is a problem" in the commit message.
> >
> > https://godbolt.org/z/hxnT1PTWo more clearly demonstrates the issue, IMO.
> >
> > TIL that C23 clarifies this, but clang still doesn't have the 
> > correct codegen then for -std=c23.  Can you please find or file a 
> > bug about this, then add a link to it in the commit message?
> >
> > It might be interesting to link to the specific section of n3096 
> > that clarifies this, or if there was a corresponding defect report 
> > sent to ISO about this.  Maybe something from 
> > https://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_log.htm
> > discusses this?
>
> https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3011.htm
>
> https://clang.llvm.org/c_status.html mentions that n3011 was addressed 
> by clang-17, but based on my godbolt link above, it seems perhaps not?

Sorry, n3011 was a minor revision to
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2900.htm
which is a better citation for this bug, IMO.  I still think the clang status page is wrong (for n2900) and that is a bug against clang that should be fixed (for c23), but this kernel patch still has merit (since the issue I'm referring to in clang is not what's leading to the test case failures).

>
> 6.7.10.2 of n3096 (c23) defines "empty initialization" (which wasn't 
> defined in older standards).
>
> Ah, reading
>
> n2310 (c17) 6.7.9.10:
>
> ```
> If an object that has static or thread storage duration is not 
> initialized explicitly, then:
> ...
> — if it is a union, the first named member is initialized
> (recursively) according to these rules, and any padding is initialized 
> to zero bits; ```
>
> Specifically, "the first named member" was a terrible mistake in the language.
>
> Yikes! Might want to quote that in the commit message.
>
> >
> > Can you also please (find or) file a bug against clang about this? A 
> > compiler diagnostic would be very very helpful here, since `= {};` 
> > is such a common idiom.
> >
> > Patch LGTM, but I think more context can be provided in the commit 
> > message in a v2 that helps reviewers follow along with what's going 
> > on here.
> >
> > >
> > > Signed-off-by: Yabin Cui <yabinc@google.com>
> > > ---
> > >  net/xfrm/xfrm_state.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index 
> > > 649bb739df0d..9bc69d703e5c 100644
> > > --- a/net/xfrm/xfrm_state.c
> > > +++ b/net/xfrm/xfrm_state.c
> > > @@ -1139,7 +1139,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
> > >                 struct xfrm_policy *pol, int *err,
> > >                 unsigned short family, u32 if_id)  {
> > > -       static xfrm_address_t saddr_wildcard = { };
> > > +       static const xfrm_address_t saddr_wildcard;
> > >         struct net *net = xp_net(pol);
> > >         unsigned int h, h_wildcard;
> > >         struct xfrm_state *x, *x0, *to_put;
> > > --
> > > 2.45.2.741.gdbec12cfda-goog
> > >
> >
> >
> > --
> > Thanks,
> > ~Nick Desaulniers
>
>
>
> --
> Thanks,
> ~Nick Desaulniers



--
Thanks,
~Nick Desaulniers
Yabin Cui June 21, 2024, 8:49 p.m. UTC | #7
> You have to add a Fixes tag, perhaps:
> Fixes: 08ec9af1c062 ("xfrm: Fix xfrm_state_find() wrt. wildcard source
address.")

Thanks! I will add it in the v2 patch.

> And the other thing is that I would change the order in the union, to
> have largest element as the first. Would be best to also add such check
> into static analysis tool/s.

xfrm_address_t is defined in include/uapi. So I am a little hesitant
to change it now.
Currently clang doesn't even have a check/warning to detect this.
I am not sure how clang will fix it in the future, in
https://github.com/llvm/llvm-project/issues/78034.

> This idiom is used throughout the kernel.  If we decide that it
> isn't safe to use then we should change the kernel as a whole rather
> than the one spot that happens to have been identified.

> Alternatively the buggy compiler should be banned until it's fixed.

The problem probably happens in all clang versions, depending on the
optimization flags and inline attributes
used. I found this problem when trying to add
-fdebug-info-for-profiling flag when compiling the code.
This patch is to fix kernel code where the problem happens, before
clang can give a better
solution in https://github.com/llvm/llvm-project/issues/78034.


On Fri, Jun 21, 2024 at 4:25 AM Ballman, Aaron <aaron.ballman@intel.com> wrote:
>
> Sadly, this is not a bug in Clang but is the terrible way C still works in C23. ☹ See C23 6.7.11p11, the last bullet:
>
> if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits.
>
> So the padding gets initialized as does the first member, but if the first member is not the largest member in the union, that leaves other bits uninitialized. This happened late during NB comment stages, which is why N2900 and N3011 give the impression that the largest union member should be initialized.
>
> That said, maybe there's appetite within the community to initialize the largest member as a conforming extension. The downside is that people not playing tricks with their unions may end up paying additional initialization cost that's useless, in pathological cases. e.g.,
>
> int main() {
>   union U {
>     int i;
>     struct {
>       long double huge[1000];
>     } s;
>   } u = {}; // This is not a brilliant use of unions
>   return u.i;
> }
>
> But how often does this matter for performance -- I have to imagine that most unions make use of most of the memory needed for their members instead of being lopsided like that. If we find some important use case that has significantly worse performance, we could always add a driver flag to control the behavior to let people opt into/out of the extension.

I guess if people don't want the cost of zero initializing a union
variable, they will not use "= {}". And the situation that using "=
{}" only zero initializes the first member of a union variable isn't
very intuitive.

>
> ~Aaron
>
> -----Original Message-----
> From: Nick Desaulniers <ndesaulniers@google.com>
> Sent: Thursday, June 20, 2024 3:54 PM
> To: Yabin Cui <yabinc@google.com>
> Cc: Steffen Klassert <steffen.klassert@secunet.com>; Herbert Xu <herbert@gondor.apana.org.au>; David S. Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; Nathan Chancellor <nathan@kernel.org>; Bill Wendling <morbo@google.com>; Justin Stitt <justinstitt@google.com>; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; llvm@lists.linux.dev; Ballman, Aaron <aaron.ballman@intel.com>
> Subject: Re: [PATCH] Fix initializing a static union variable
>
> On Thu, Jun 20, 2024 at 12:47 PM Nick Desaulniers <ndesaulniers@google.com> wrote:
> >
> > On Thu, Jun 20, 2024 at 12:31 PM Nick Desaulniers
> > <ndesaulniers@google.com> wrote:
> > >
> > > On Thu, Jun 20, 2024 at 11:17 AM Yabin Cui <yabinc@google.com> wrote:
> > > >
> > > > saddr_wildcard is a static union variable initialized with {}.
> > > > But c11 standard doesn't guarantee initializing all fields as zero
> > > > for this case. As in https://godbolt.org/z/rWvdv6aEx,
> > >
> > > Specifically, it sounds like C99+ is just the first member of the
> > > union, which is dumb since that may not necessarily be the largest
> > > variant.  Can you find the specific relevant wording from a pre-c23
> > > spec?
> > >

I feel it's unspecified in pre-c23 spec.
From https://en.cppreference.com/w/c/language/struct_initialization,
When initializing an object of struct or union type, the initializer
must be a non-empty,(until C23).

> > > > clang only initializes the first field as zero, but the bits
> > > > corresponding to other (larger) members are undefined.
> > >
> > > Oh, that sucks!
> > >
> > > Reading through the internal report on this is fascinating!  Nice
> > > job tracking down the issue!  It sounds like if we can aggressively
> > > inline the users of this partially initialized value, then the UB
> > > from control flow on the partially initialized value can result in
> > > Android's kernel network tests failing.  It might be good to include
> > > more info on "why this is a problem" in the commit message.
> > >
> > > https://godbolt.org/z/hxnT1PTWo more clearly demonstrates the issue, IMO.

Great, I will use it in the v2 patch.

> > >
> > > TIL that C23 clarifies this, but clang still doesn't have the
> > > correct codegen then for -std=c23.  Can you please find or file a
> > > bug about this, then add a link to it in the commit message?
> > >
> > > It might be interesting to link to the specific section of n3096
> > > that clarifies this, or if there was a corresponding defect report
> > > sent to ISO about this.  Maybe something from
> > > https://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_log.htm
> > > discusses this?
> >
> > https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3011.htm
> >
> > https://clang.llvm.org/c_status.html mentions that n3011 was addressed
> > by clang-17, but based on my godbolt link above, it seems perhaps not?
>
> Sorry, n3011 was a minor revision to
> https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2900.htm
> which is a better citation for this bug, IMO.  I still think the clang status page is wrong (for n2900) and that is a bug against clang that should be fixed (for c23), but this kernel patch still has merit (since the issue I'm referring to in clang is not what's leading to the test case failures).
>

I totally agree with you. And if clang fixes the support, I guess it
will not be limited to c23.

> >
> > 6.7.10.2 of n3096 (c23) defines "empty initialization" (which wasn't
> > defined in older standards).
> >
> > Ah, reading
> >
> > n2310 (c17) 6.7.9.10:
> >
> > ```
> > If an object that has static or thread storage duration is not
> > initialized explicitly, then:
> > ...
> > — if it is a union, the first named member is initialized
> > (recursively) according to these rules, and any padding is initialized
> > to zero bits; ```
> >
> > Specifically, "the first named member" was a terrible mistake in the language.
> >
> > Yikes! Might want to quote that in the commit message.
> >
> > >
> > > Can you also please (find or) file a bug against clang about this? A
> > > compiler diagnostic would be very very helpful here, since `= {};`
> > > is such a common idiom.
> > >

Reported to clang upstream in
https://github.com/llvm/llvm-project/issues/78034#issuecomment-2183233517.
It's a problem that still happens in the c23 standard.


> > > Patch LGTM, but I think more context can be provided in the commit
> > > message in a v2 that helps reviewers follow along with what's going
> > > on here.
> > >
> > > >
> > > > Signed-off-by: Yabin Cui <yabinc@google.com>
> > > > ---
> > > >  net/xfrm/xfrm_state.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index
> > > > 649bb739df0d..9bc69d703e5c 100644
> > > > --- a/net/xfrm/xfrm_state.c
> > > > +++ b/net/xfrm/xfrm_state.c
> > > > @@ -1139,7 +1139,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
> > > >                 struct xfrm_policy *pol, int *err,
> > > >                 unsigned short family, u32 if_id)  {
> > > > -       static xfrm_address_t saddr_wildcard = { };
> > > > +       static const xfrm_address_t saddr_wildcard;
> > > >         struct net *net = xp_net(pol);
> > > >         unsigned int h, h_wildcard;
> > > >         struct xfrm_state *x, *x0, *to_put;
> > > > --
> > > > 2.45.2.741.gdbec12cfda-goog
> > > >
> > >
> > >
> > > --
> > > Thanks,
> > > ~Nick Desaulniers
> >
> >
> >
> > --
> > Thanks,
> > ~Nick Desaulniers
>
>
>
> --
> Thanks,
> ~Nick Desaulniers
Linus Torvalds June 23, 2024, 4:51 p.m. UTC | #8
On Fri, 21 Jun 2024 at 07:07, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Thu, Jun 20, 2024 at 12:31:46PM -0700, Nick Desaulniers wrote:
> >
> > Can you also please (find or) file a bug against clang about this? A
> > compiler diagnostic would be very very helpful here, since `= {};` is
> > such a common idiom.
>
> This idiom is used throughout the kernel.  If we decide that it
> isn't safe to use then we should change the kernel as a whole rather
> than the one spot that happens to have been identified.

Again - the commit message and the whole discussion about the C23
standard is actively misleading, as shown byu this whole thread.

The bug IS NOT ABOUT THE EMPTY INITIALIZER.

The exact same problem happens with "{ 0 }" as happens with "{ }".

The bug is literally that some versions of clang seem to implement
BBOTH of these as "initialize the first member of the union", which
then means that if the first member isn't the largest member, the rest
of the union is entirely undefined.

And by "undefined" I don't mean in the "C standard sense". It may be
that *too* in some versions of the C standards, but that's not really
the issue any more.

In the kernel, we do expect initializers that always initialize the
whole variable fully.

We have actively moved away from doing manual variable initialization
because they've generated both worse code and sometimes triggered
other problems. See for example 390001d648ff ("drm/i915/mtl: avoid
stringop-overflow warning"), but also things like 75dc03453122 ("xfs:
fix xfs_btree_query_range callers to initialize btree rec fully") or
e3a69496a1cd ("media: Prefer designated initializers over memset for
subdev pad ops")

In other words, in the kernel we simply depend on initializers working
reliably and fully. Partly because we've literally been told by
compiler people that that is what we should do.

So no, this is not about empty initializers. And this is not about
some C standard verbiage.

This is literally about "the linux kernel expects initializers to
FULLY initialize variables". Padding, other union members, you name
it.

If clang doesn't do that, then clang is buggy as far as the kernel is
concerned, and no amount of standards reading is relevant.

And in particular, no amount of "but empty initializer" is relevant.

I *hope* this is some unreleased clang version that has this bug.
Because at least the clang version I have access to (clang 17.0.6)
does not seem to exhibit this issue.

              Linus
Linus Torvalds June 23, 2024, 5:05 p.m. UTC | #9
On Sun, 23 Jun 2024 at 12:51, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I *hope* this is some unreleased clang version that has this bug.
> Because at least the clang version I have access to (clang 17.0.6)
> does not seem to exhibit this issue.

Hmm. Strange. godbolt says that it happens with clang 17.0.1 (and
earlier) with a plain -O2.

It just doesn't happen for me. Either this got fixed already and my
17.0.6 has the fix, or there's some subtle flag that my test-case uses
(some config file - I just use "-O2" for local testing like the
godbolt page did).

But clearly godbolt does  think this happens with released clang versions too.

            Linus
Yabin Cui June 24, 2024, 6:04 p.m. UTC | #10
> In other words, in the kernel we simply depend on initializers working
> reliably and fully. Partly because we've literally been told by
> compiler people that that is what we should do.
>
> So no, this is not about empty initializers. And this is not about
> some C standard verbiage.
>
> This is literally about "the linux kernel expects initializers to
> FULLY initialize variables". Padding, other union members, you name
> it.
>
> If clang doesn't do that, then clang is buggy as far as the kernel is
> concerned, and no amount of standards reading is relevant.
>
> And in particular, no amount of "but empty initializer" is relevant.
>

Thanks for the detailed explanation!
Sorry for limiting the problem to the empty initializer. I didn't
realize the linux
kernel also depends on zero initializing extra bytes when explicitly
initializing
one field of a union type.

> And when the union is embedded in a struct, the struct initialization
> seems to be ok from a quick test, but I might have screwed that test
> up.

I also think so. But probably we need to add tests in clang to make sure
it continues to work.

> Hmm. Strange. godbolt says that it happens with clang 17.0.1 (and
> earlier) with a plain -O2.
>
> It just doesn't happen for me. Either this got fixed already and my
> 17.0.6 has the fix, or there's some subtle flag that my test-case uses
> (some config file - I just use "-O2" for local testing like the
> godbolt page did).
>
> But clearly godbolt does  think this happens with released clang versions too.
>

Yes, I also think it happens in both clang trunk and past releases, as tested in
https://godbolt.org/z/vnGqKK43z.
Gladly in the clang bug in
https://github.com/llvm/llvm-project/issues/78034, no one
is against zero initializing the whole union type.
I feel now it's no longer important to have this patch in the kernel.
But we need to
fix this problem in clang and backport the fix to releases we care about.

Thanks,
Yabin
diff mbox series

Patch

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 649bb739df0d..9bc69d703e5c 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1139,7 +1139,7 @@  xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
 		struct xfrm_policy *pol, int *err,
 		unsigned short family, u32 if_id)
 {
-	static xfrm_address_t saddr_wildcard = { };
+	static const xfrm_address_t saddr_wildcard;
 	struct net *net = xp_net(pol);
 	unsigned int h, h_wildcard;
 	struct xfrm_state *x, *x0, *to_put;