Message ID | 20210727205855.411487-5-keescook@chromium.org (mailing list archive) |
---|---|
State | Not Applicable |
Series | Introduce strict memcpy() bounds checking |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Guessing tree name failed - patch did not apply |
On Tue, Jul 27, 2021 at 01:57:55PM -0700, Kees Cook wrote:
> Kernel code has a regular need to describe groups of members within a
> structure usually when they need to be copied or initialized separately
> from the rest of the surrounding structure. The generally accepted design
> pattern in C is to use a named sub-struct:
>
> 	struct foo {
> 		int one;
> 		struct {
> 			int two;
> 			int three;
> 		} thing;
> 		int four;
> 	};
>
> This would allow for traditional references and sizing:
>
> 	memcpy(&dst.thing, &src.thing, sizeof(dst.thing));
>
> However, doing this would mean that referencing struct members enclosed
> by such named structs would always require including the sub-struct name
> in identifiers:
>
> 	do_something(dst.thing.three);
>
> This has tended to be quite inflexible, especially when such groupings
> need to be added to established code which causes huge naming churn.
> Three workarounds exist in the kernel for this problem, and each have
> other negative properties.
>
> To avoid the naming churn, there is a design pattern of adding macro
> aliases for the named struct:
>
> 	#define f_three thing.three
>
> This ends up polluting the global namespace, and makes it difficult to
> search for identifiers.
>
> Another common work-around in kernel code avoids the pollution by avoiding
> the named struct entirely, instead identifying the group's boundaries using
> either a pair of empty anonymous structs of a pair of zero-element arrays:
>
> 	struct foo {
> 		int one;
> 		struct { } start;
> 		int two;
> 		int three;
> 		struct { } finish;
> 		int four;
> 	};
>
> 	struct foo {
> 		int one;
> 		int start[0];
> 		int two;
> 		int three;
> 		int finish[0];
> 		int four;
> 	};
>
> This allows code to avoid needing to use a sub-struct name for member
> references within the surrounding structure, but loses the benefits of
> being able to actually use such a struct, making it rather fragile. Using
> these requires open-coded calculation of sizes and offsets. The efforts
> made to avoid common mistakes include lots of comments, or adding various
> BUILD_BUG_ON()s. Such code is left with no way for the compiler to reason
> about the boundaries (e.g. the "start" object looks like it's 0 bytes
> in length and is not structurally associated with "finish"), making bounds
> checking depend on open-coded calculations:
>
> 	if (length > offsetof(struct foo, finish) -
> 		     offsetof(struct foo, start))
> 		return -EINVAL;
> 	memcpy(&dst.start, &src.start, length);
>
> However, the vast majority of places in the kernel that operate on
> groups of members do so without any identification of the grouping,
> relying either on comments or implicit knowledge of the struct contents,
> which is even harder for the compiler to reason about, and results in
> even more fragile manual sizing, usually depending on member locations
> outside of the region (e.g. to copy "two" and "three", use the start of
> "four" to find the size):
>
> 	BUILD_BUG_ON((offsetof(struct foo, four) <
> 		      offsetof(struct foo, two)) ||
> 		     (offsetof(struct foo, four) <
> 		      offsetof(struct foo, three));
> 	if (length > offsetof(struct foo, four) -
> 		     offsetof(struct foo, two))
> 		return -EINVAL;
> 	memcpy(&dst.two, &src.two, length);
>
> And both of the prior two idioms additionally appear to write beyond the
> end of the referenced struct member, forcing the compiler to ignore any
> attempt to perform bounds checking.
>
> In order to have a regular programmatic way to describe a struct
> region that can be used for references and sizing, can be examined for
> bounds checking, avoids forcing the use of intermediate identifiers,
> and avoids polluting the global namespace, introduce the struct_group()
> macro. This macro wraps the member declarations to create an anonymous
> union of an anonymous struct (no intermediate name) and a named struct
> (for references and sizing):
>
> 	struct foo {
> 		int one;
> 		struct_group(thing,
> 			int two,
> 			int three,
> 		);
> 		int four;
> 	};
>
> 	if (length > sizeof(src.thing))
> 		return -EINVAL;
> 	memcpy(&dst.thing, &src.thing, length);
> 	do_something(dst.three);
>
> There are some rare cases where the resulting struct_group() needs
> attributes added, so struct_group_attr() is also introduced to allow
> for specifying struct attributes (e.g. __align(x) or __packed).
>
> Co-developed-by: Keith Packard <keithpac@amazon.com>
> Signed-off-by: Keith Packard <keithpac@amazon.com>
> Signed-off-by: Kees Cook <keescook@chromium.org>

Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>

Love it! :)

Thanks
--
Gustavo

> ---
>  include/linux/stddef.h | 34 ++++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
>
> diff --git a/include/linux/stddef.h b/include/linux/stddef.h
> index 998a4ba28eba..cf7f866944f9 100644
> --- a/include/linux/stddef.h
> +++ b/include/linux/stddef.h
> @@ -36,4 +36,38 @@ enum {
>  #define offsetofend(TYPE, MEMBER) \
>  	(offsetof(TYPE, MEMBER) + sizeof_field(TYPE, MEMBER))
>
> +/**
> + * struct_group_attr(NAME, ATTRS, MEMBERS)
> + *
> + * Used to create an anonymous union of two structs with identical
> + * layout and size: one anonymous and one named. The former can be
> + * used normally without sub-struct naming, and the latter can be
> + * used to reason about the start, end, and size of the group of
> + * struct members. Includes structure attributes argument.
> + *
> + * @NAME: The name of the mirrored sub-struct
> + * @ATTRS: Any struct attributes (normally empty)
> + * @MEMBERS: The member declarations for the mirrored structs
> + */
> +#define struct_group_attr(NAME, ATTRS, MEMBERS) \
> +	union { \
> +		struct { MEMBERS } ATTRS; \
> +		struct { MEMBERS } ATTRS NAME; \
> +	}
> +
> +/**
> + * struct_group(NAME, MEMBERS)
> + *
> + * Used to create an anonymous union of two structs with identical
> + * layout and size: one anonymous and one named. The former can be
> + * used normally without sub-struct naming, and the latter can be
> + * used to reason about the start, end, and size of the group of
> + * struct members.
> + *
> + * @NAME: The name of the mirrored sub-struct
> + * @MEMBERS: The member declarations for the mirrored structs
> + */
> +#define struct_group(NAME, MEMBERS) \
> +	struct_group_attr(NAME, /* no attrs */, MEMBERS)
> +
>  #endif
> --
> 2.30.2
>
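For readers who want to try the idea outside the kernel tree, here is a minimal user-space sketch of the macro as posted in the patch (non-variadic form), with the members separated by semicolons. It assumes a C11 compiler such as GCC or Clang, since it relies on anonymous structs and unions. The helpers copy_thing() and struct_group_demo() are hypothetical names invented for this illustration, and -1 stands in for -EINVAL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* User-space copy of the two macros from the posted patch. */
#define struct_group_attr(NAME, ATTRS, MEMBERS) \
	union { \
		struct { MEMBERS } ATTRS; \
		struct { MEMBERS } ATTRS NAME; \
	}
#define struct_group(NAME, MEMBERS) \
	struct_group_attr(NAME, /* no attrs */, MEMBERS)

struct foo {
	int one;
	struct_group(thing,
		int two;
		int three;
	);
	int four;
};

/* Hypothetical helper: copy at most the grouped region, -1 mimics -EINVAL. */
static int copy_thing(struct foo *dst, const struct foo *src, size_t length)
{
	if (length > sizeof(src->thing))
		return -1;
	memcpy(&dst->thing, &src->thing, length);
	return 0;
}

/* Hypothetical demo: returns dst.three after a group copy, or -1 on surprise. */
static int struct_group_demo(void)
{
	struct foo src, dst;

	memset(&src, 0, sizeof(src));
	memset(&dst, 0, sizeof(dst));
	src.one = 1;
	src.two = 2;
	src.three = 3;
	src.four = 4;

	/* The named struct covers exactly the two grouped members. */
	assert(sizeof(src.thing) == 2 * sizeof(int));

	/* An oversized length is rejected before any copy happens. */
	if (copy_thing(&dst, &src, sizeof(src.thing) + 1) != -1)
		return -1;
	if (copy_thing(&dst, &src, sizeof(src.thing)) != 0)
		return -1;

	/* Only the group was copied; "one" and "four" are still zero. */
	if (dst.one != 0 || dst.four != 0)
		return -1;

	/* Members stay reachable without the "thing." prefix. */
	return dst.three;
}
```

The point of the sketch is that sizeof(src.thing) and &dst.thing give the compiler a real object to reason about, while dst.three still works without an intermediate name.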
On 27/07/2021 22.57, Kees Cook wrote:

> In order to have a regular programmatic way to describe a struct
> region that can be used for references and sizing, can be examined for
> bounds checking, avoids forcing the use of intermediate identifiers,
> and avoids polluting the global namespace, introduce the struct_group()
> macro. This macro wraps the member declarations to create an anonymous
> union of an anonymous struct (no intermediate name) and a named struct
> (for references and sizing):
>
> 	struct foo {
> 		int one;
> 		struct_group(thing,
> 			int two,
> 			int three,
> 		);
> 		int four;
> 	};

That example won't compile, the commas after two and three should be
semicolons.

And your implementation relies on MEMBERS not containing any comma
tokens, but as

  int a, b, c, d;

is a valid way to declare multiple members, consider making MEMBERS
variadic

#define struct_group(NAME, MEMBERS...)

to have it slurp up every subsequent argument and make that work.

>
> Co-developed-by: Keith Packard <keithpac@amazon.com>
> Signed-off-by: Keith Packard <keithpac@amazon.com>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  include/linux/stddef.h | 34 ++++++++++++++++++++++++++++++++++

Bikeshedding a bit, but do we need to add 34 lines that need to be
preprocessed to virtually each and every translation unit [as opposed to
adding a struct_group.h header]? Oh well, you need it for struct
skbuff.h, so it would be pulled in by a lot regardless :(

Rasmus
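The comma problem described above can be sketched in user space as follows. This is only an illustration under stated assumptions: it spells the variadic parameter with ISO C `...` and `__VA_ARGS__` rather than the GNU named form `MEMBERS...` suggested in the mail, so that it builds without GNU macro extensions, and it assumes a C11 compiler for the anonymous struct and union. The struct quad and quad_demo() names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Variadic variant: every argument after ATTRS (or NAME) is pasted back
 * together, commas included, so "int a, b, c, d;" survives as one
 * member declaration instead of being split into four macro arguments.
 */
#define struct_group_attr(NAME, ATTRS, ...) \
	union { \
		struct { __VA_ARGS__ } ATTRS; \
		struct { __VA_ARGS__ } ATTRS NAME; \
	}
#define struct_group(NAME, ...) \
	struct_group_attr(NAME, /* no attrs */, __VA_ARGS__)

struct quad {
	int head;
	struct_group(body,
		int a, b, c, d;
	);
	int tail;
};

/* Hypothetical demo: write through one view, read through the other. */
static int quad_demo(void)
{
	struct quad q;

	/* All four comma-declared members landed inside the group. */
	assert(sizeof(q.body) == 4 * sizeof(int));

	q.body.a = 10;	/* via the named struct */
	q.d = 40;	/* via the anonymous struct */

	return q.a + q.body.d;
}
```

With the non-variadic definition from the original patch, the same struct_group(body, int a, b, c, d;) invocation would be rejected by the preprocessor for passing too many arguments.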
On Wed, Jul 28, 2021 at 12:54:18PM +0200, Rasmus Villemoes wrote:
> On 27/07/2021 22.57, Kees Cook wrote:
>
> > In order to have a regular programmatic way to describe a struct
> > region that can be used for references and sizing, can be examined for
> > bounds checking, avoids forcing the use of intermediate identifiers,
> > and avoids polluting the global namespace, introduce the struct_group()
> > macro. This macro wraps the member declarations to create an anonymous
> > union of an anonymous struct (no intermediate name) and a named struct
> > (for references and sizing):
> >
> > 	struct foo {
> > 		int one;
> > 		struct_group(thing,
> > 			int two,
> > 			int three,
> > 		);
> > 		int four;
> > 	};
>
> That example won't compile, the commas after two and three should be
> semicolons.

Oops, yes, thanks. This is why I shouldn't write code that doesn't first
go through a compiler. ;)

> And your implementation relies on MEMBERS not containing any comma
> tokens, but as
>
>   int a, b, c, d;
>
> is a valid way to declare multiple members, consider making MEMBERS
> variadic
>
> #define struct_group(NAME, MEMBERS...)
>
> to have it slurp up every subsequent argument and make that work.

Ah! Perfect, thank you. I totally forgot I could do it that way.

> >
> > Co-developed-by: Keith Packard <keithpac@amazon.com>
> > Signed-off-by: Keith Packard <keithpac@amazon.com>
> > Signed-off-by: Kees Cook <keescook@chromium.org>
> > ---
> >  include/linux/stddef.h | 34 ++++++++++++++++++++++++++++++++++
>
> Bikeshedding a bit, but do we need to add 34 lines that need to be
> preprocessed to virtually each and every translation unit [as opposed to
> adding a struct_group.h header]? Oh well, you need it for struct
> skbuff.h, so it would be pulled in by a lot regardless :(

My instinct is to make these kinds of helpers "always available" (like
sizeof_field(), etc), but I have no strong opinion on where it should
live. If the consensus is to move it, I certainly can! :)

-Kees
On Wed, 2021-07-28 at 14:59 -0700, Kees Cook wrote:
> On Wed, Jul 28, 2021 at 12:54:18PM +0200, Rasmus Villemoes wrote:
> > On 27/07/2021 22.57, Kees Cook wrote:
> >
> > > In order to have a regular programmatic way to describe a struct
> > > region that can be used for references and sizing, can be examined for
> > > bounds checking, avoids forcing the use of intermediate identifiers,
> > > and avoids polluting the global namespace, introduce the struct_group()
> > > macro. This macro wraps the member declarations to create an anonymous
> > > union of an anonymous struct (no intermediate name) and a named struct
> > > (for references and sizing):
> > >
> > > 	struct foo {
> > > 		int one;
> > > 		struct_group(thing,
> > > 			int two,
> > > 			int three,
> > > 		);
> > > 		int four;
> > > 	};
> >
> > That example won't compile, the commas after two and three should be
> > semicolons.
>
> Oops, yes, thanks. This is why I shouldn't write code that doesn't first
> go through a compiler. ;)
>
> > And your implementation relies on MEMBERS not containing any comma
> > tokens, but as
> >
> >   int a, b, c, d;
> >
> > is a valid way to declare multiple members, consider making MEMBERS
> > variadic
> >
> > #define struct_group(NAME, MEMBERS...)
> >
> > to have it slurp up every subsequent argument and make that work.
>
> Ah! Perfect, thank you. I totally forgot I could do it that way.

This is great Kees. It just so happens it would clean-up what we are
already doing in drivers/cxl/cxl.h for anonymous + named register block
pointers. However in the cxl case it also needs the named structure to
be typed. Any appetite for a typed version of this?

Here is a rough idea of the cleanup it would induce in drivers/cxl/:

diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 53927f9fa77e..a2308c995654 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -75,52 +75,19 @@ static inline int cxl_hdm_decoder_count(u32 cap_hdr)
 #define CXLDEV_MBOX_BG_CMD_STATUS_OFFSET 0x18
 #define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20

-#define CXL_COMPONENT_REGS() \
-	void __iomem *hdm_decoder
-
-#define CXL_DEVICE_REGS() \
-	void __iomem *status; \
-	void __iomem *mbox; \
-	void __iomem *memdev
-
-/* See note for 'struct cxl_regs' for the rationale of this organization */
 /*
- * CXL_COMPONENT_REGS - Common set of CXL Component register block base pointers
  * @hdm_decoder: CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure
- */
-struct cxl_component_regs {
-	CXL_COMPONENT_REGS();
-};
-
-/* See note for 'struct cxl_regs' for the rationale of this organization */
-/*
- * CXL_DEVICE_REGS - Common set of CXL Device register block base pointers
  * @status: CXL 2.0 8.2.8.3 Device Status Registers
  * @mbox: CXL 2.0 8.2.8.4 Mailbox Registers
  * @memdev: CXL 2.0 8.2.8.5 Memory Device Registers
  */
-struct cxl_device_regs {
-	CXL_DEVICE_REGS();
-};
-
-/*
- * Note, the anonymous union organization allows for per
- * register-block-type helper routines, without requiring block-type
- * agnostic code to include the prefix.
- */
 struct cxl_regs {
-	union {
-		struct {
-			CXL_COMPONENT_REGS();
-		};
-		struct cxl_component_regs component;
-	};
-	union {
-		struct {
-			CXL_DEVICE_REGS();
-		};
-		struct cxl_device_regs device_regs;
-	};
+	struct_group_typed(cxl_component_regs, component,
+		void __iomem *hdm_decoder;
+	);
+	struct_group_typed(cxl_device_regs, device_regs,
+		void __iomem *status, *mbox, *memdev;
+	);
 };

 struct cxl_reg_map {
diff --git a/include/linux/stddef.h b/include/linux/stddef.h
index cf7f866944f9..84b7de24ffb5 100644
--- a/include/linux/stddef.h
+++ b/include/linux/stddef.h
@@ -49,12 +49,18 @@ enum {
  * @ATTRS: Any struct attributes (normally empty)
  * @MEMBERS: The member declarations for the mirrored structs
  */
-#define struct_group_attr(NAME, ATTRS, MEMBERS) \
+#define struct_group_attr(NAME, ATTRS, MEMBERS...) \
 	union { \
 		struct { MEMBERS } ATTRS; \
 		struct { MEMBERS } ATTRS NAME; \
 	}

+#define struct_group_attr_typed(TYPE, NAME, ATTRS, MEMBERS...) \
+	union { \
+		struct { MEMBERS } ATTRS; \
+		struct TYPE { MEMBERS } ATTRS NAME; \
+	}
+
 /**
  * struct_group(NAME, MEMBERS)
  *
@@ -67,7 +73,10 @@ enum {
  * @NAME: The name of the mirrored sub-struct
  * @MEMBERS: The member declarations for the mirrored structs
  */
-#define struct_group(NAME, MEMBERS) \
+#define struct_group(NAME, MEMBERS...) \
 	struct_group_attr(NAME, /* no attrs */, MEMBERS)

+#define struct_group_typed(TYPE, NAME, MEMBERS...) \
+	struct_group_attr_typed(TYPE, NAME, /* no attrs */, MEMBERS)
+
 #endif
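To illustrate why the typed variant matters, here is a rough user-space sketch of the proposal above. It is not the cxl code: the __iomem annotation is dropped, the struct names regs, component_regs and device_regs are stand-ins, and device_regs_mapped() and typed_demo() are hypothetical helpers. The macros use ISO C `__VA_ARGS__` in place of the GNU named variadic form, and a C11 compiler is assumed. Because a struct tag declared inside another struct is visible at file scope in C, the mirrored struct can be passed to a helper by type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Like struct_group_attr(), but the named mirror also gets a struct tag. */
#define struct_group_attr_typed(TYPE, NAME, ATTRS, ...) \
	union { \
		struct { __VA_ARGS__ } ATTRS; \
		struct TYPE { __VA_ARGS__ } ATTRS NAME; \
	}
#define struct_group_typed(TYPE, NAME, ...) \
	struct_group_attr_typed(TYPE, NAME, /* no attrs */, __VA_ARGS__)

/* Stand-in for the register-block layout in the proposal (names assumed). */
struct regs {
	struct_group_typed(component_regs, component,
		void *hdm_decoder;
	);
	struct_group_typed(device_regs, device_regs,
		void *status, *mbox, *memdev;
	);
};

/* Hypothetical block-specific helper: it only sees the device group. */
static int device_regs_mapped(const struct device_regs *d)
{
	return d->status && d->mbox && d->memdev;
}

/* Hypothetical demo: fill members by short name, pass the group by type. */
static int typed_demo(void)
{
	static int s, m, d;
	struct regs r;

	memset(&r, 0, sizeof(r));
	if (device_regs_mapped(&r.device_regs))
		return -1;	/* nothing mapped yet, so this must be false */

	/* Block-agnostic code uses the members without any prefix. */
	r.status = &s;
	r.mbox = &m;
	r.memdev = &d;

	return device_regs_mapped(&r.device_regs);
}
```

An untagged mirror would still allow sizeof() and memcpy() on the group, but there would be no type name to use in the helper's parameter list.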
On Fri, Jul 30, 2021 at 10:19:20PM +0000, Williams, Dan J wrote:
> On Wed, 2021-07-28 at 14:59 -0700, Kees Cook wrote:
> > On Wed, Jul 28, 2021 at 12:54:18PM +0200, Rasmus Villemoes wrote:
> > > On 27/07/2021 22.57, Kees Cook wrote:
> > >
> > > > In order to have a regular programmatic way to describe a struct
> > > > region that can be used for references and sizing, can be examined for
> > > > bounds checking, avoids forcing the use of intermediate identifiers,
> > > > and avoids polluting the global namespace, introduce the struct_group()
> > > > macro. This macro wraps the member declarations to create an anonymous
> > > > union of an anonymous struct (no intermediate name) and a named struct
> > > > (for references and sizing):
> > > >
> > > > 	struct foo {
> > > > 		int one;
> > > > 		struct_group(thing,
> > > > 			int two,
> > > > 			int three,
> > > > 		);
> > > > 		int four;
> > > > 	};
> > >
> > > That example won't compile, the commas after two and three should be
> > > semicolons.
> >
> > Oops, yes, thanks. This is why I shouldn't write code that doesn't first
> > go through a compiler. ;)
> >
> > > And your implementation relies on MEMBERS not containing any comma
> > > tokens, but as
> > >
> > >   int a, b, c, d;
> > >
> > > is a valid way to declare multiple members, consider making MEMBERS
> > > variadic
> > >
> > > #define struct_group(NAME, MEMBERS...)
> > >
> > > to have it slurp up every subsequent argument and make that work.
> >
> > Ah! Perfect, thank you. I totally forgot I could do it that way.
>
> This is great Kees. It just so happens it would clean-up what we are
> already doing in drivers/cxl/cxl.h for anonymous + named register block
> pointers. However in the cxl case it also needs the named structure to
> be typed. Any appetite for a typed version of this?

Oh cool! Yeah, totally I can expand it. Thanks for the suggestion!

>
> Here is a rough idea of the cleanup it would induce in drivers/cxl/:
>
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 53927f9fa77e..a2308c995654 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -75,52 +75,19 @@ static inline int cxl_hdm_decoder_count(u32 cap_hdr)
>  #define CXLDEV_MBOX_BG_CMD_STATUS_OFFSET 0x18
>  #define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20
>
> -#define CXL_COMPONENT_REGS() \
> -	void __iomem *hdm_decoder
> -
> -#define CXL_DEVICE_REGS() \
> -	void __iomem *status; \
> -	void __iomem *mbox; \
> -	void __iomem *memdev
> -
> -/* See note for 'struct cxl_regs' for the rationale of this organization */
>  /*
> - * CXL_COMPONENT_REGS - Common set of CXL Component register block base pointers
>   * @hdm_decoder: CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure
> - */
> -struct cxl_component_regs {
> -	CXL_COMPONENT_REGS();
> -};
> -
> -/* See note for 'struct cxl_regs' for the rationale of this organization */
> -/*
> - * CXL_DEVICE_REGS - Common set of CXL Device register block base pointers
>   * @status: CXL 2.0 8.2.8.3 Device Status Registers
>   * @mbox: CXL 2.0 8.2.8.4 Mailbox Registers
>   * @memdev: CXL 2.0 8.2.8.5 Memory Device Registers
>   */
> -struct cxl_device_regs {
> -	CXL_DEVICE_REGS();
> -};
> -
> -/*
> - * Note, the anonymous union organization allows for per
> - * register-block-type helper routines, without requiring block-type
> - * agnostic code to include the prefix.
> - */
>  struct cxl_regs {
> -	union {
> -		struct {
> -			CXL_COMPONENT_REGS();
> -		};
> -		struct cxl_component_regs component;
> -	};
> -	union {
> -		struct {
> -			CXL_DEVICE_REGS();
> -		};
> -		struct cxl_device_regs device_regs;
> -	};
> +	struct_group_typed(cxl_component_regs, component,
> +		void __iomem *hdm_decoder;
> +	);
> +	struct_group_typed(cxl_device_regs, device_regs,
> +		void __iomem *status, *mbox, *memdev;
> +	);
>  };
>
>  struct cxl_reg_map {
> diff --git a/include/linux/stddef.h b/include/linux/stddef.h
> index cf7f866944f9..84b7de24ffb5 100644
> --- a/include/linux/stddef.h
> +++ b/include/linux/stddef.h
> @@ -49,12 +49,18 @@ enum {
>   * @ATTRS: Any struct attributes (normally empty)
>   * @MEMBERS: The member declarations for the mirrored structs
>   */
> -#define struct_group_attr(NAME, ATTRS, MEMBERS) \
> +#define struct_group_attr(NAME, ATTRS, MEMBERS...) \
>  	union { \
>  		struct { MEMBERS } ATTRS; \
>  		struct { MEMBERS } ATTRS NAME; \
>  	}
>
> +#define struct_group_attr_typed(TYPE, NAME, ATTRS, MEMBERS...) \
> +	union { \
> +		struct { MEMBERS } ATTRS; \
> +		struct TYPE { MEMBERS } ATTRS NAME; \
> +	}
> +
>  /**
>   * struct_group(NAME, MEMBERS)
>   *
> @@ -67,7 +73,10 @@ enum {
>   * @NAME: The name of the mirrored sub-struct
>   * @MEMBERS: The member declarations for the mirrored structs
>   */
> -#define struct_group(NAME, MEMBERS) \
> +#define struct_group(NAME, MEMBERS...) \
>  	struct_group_attr(NAME, /* no attrs */, MEMBERS)
>
> +#define struct_group_typed(TYPE, NAME, MEMBERS...) \
> +	struct_group_attr_typed(TYPE, NAME, /* no attrs */, MEMBERS)
> +
>  #endif

Awesome! My instinct is to expose the resulting API as:

__struct_group(type, name, attrs, members...)

struct_group(name, members...)
struct_group_attr(name, attrs, members...)
struct_group_typed(type, name, members...)
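One possible way to layer the four macros listed above is sketched here: a single worker macro takes every parameter and the three public wrappers leave the unused ones empty. This is a guess at the shape, not the code that was eventually posted. It assumes GCC or Clang (for `__attribute__((packed))` and anonymous members), uses ISO C `__VA_ARGS__`, and the names pkt, pkt_tail, tail_sum() and api_demo() are hypothetical. Note that identifiers starting with a double underscore are reserved in user-space C; the name is kept only to mirror the mail.

```c
#include <assert.h>
#include <stddef.h>

/* Worker macro: TAG and ATTRS may each be passed as an empty argument. */
#define __struct_group(TAG, NAME, ATTRS, ...) \
	union { \
		struct { __VA_ARGS__ } ATTRS; \
		struct TAG { __VA_ARGS__ } ATTRS NAME; \
	}

/* Public wrappers, each filling in only the parameters it exposes. */
#define struct_group(NAME, ...) \
	__struct_group(/* no tag */, NAME, /* no attrs */, __VA_ARGS__)
#define struct_group_attr(NAME, ATTRS, ...) \
	__struct_group(/* no tag */, NAME, ATTRS, __VA_ARGS__)
#define struct_group_typed(TAG, NAME, ...) \
	__struct_group(TAG, NAME, /* no attrs */, __VA_ARGS__)

/* Hypothetical layout exercising the attribute and tagged wrappers. */
struct pkt {
	char kind;
	struct_group_attr(hdr, __attribute__((packed)),
		char flag;
		int len;
	);
	struct_group_typed(pkt_tail, tail,
		short a, b;
	);
};

/* Hypothetical helper that takes the tagged group by type. */
static int tail_sum(const struct pkt_tail *t)
{
	return t->a + t->b;
}

/* Hypothetical demo: packed group has no padding; tagged group is passable. */
static int api_demo(void)
{
	struct pkt p;

	/* With the packed attribute, "flag" and "len" sit back to back. */
	assert(sizeof(p.hdr) == sizeof(char) + sizeof(int));

	p.a = 1;
	p.b = 2;
	return tail_sum(&p.tail);
}
```

Funnelling everything through one worker keeps the anonymous and named structs in lockstep, since both copies of the member list come from the same expansion.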
On Sat, Jul 31, 2021 at 07:24:44AM +0200, Rasmus Villemoes wrote:
> On Sat, Jul 31, 2021, 04:59 Kees Cook <keescook@chromium.org> wrote:
>
> > On Fri, Jul 30, 2021 at 10:19:20PM +0000, Williams, Dan J wrote:
> > > On Wed, 2021-07-28 at 14:59 -0700, Kees Cook wrote:
> >
> > >  /**
> > >   * struct_group(NAME, MEMBERS)
> > >   *
> > > @@ -67,7 +73,10 @@ enum {
> > >   * @NAME: The name of the mirrored sub-struct
> > >   * @MEMBERS: The member declarations for the mirrored structs
> > >   */
> > > -#define struct_group(NAME, MEMBERS) \
> > > +#define struct_group(NAME, MEMBERS...) \
> > >  	struct_group_attr(NAME, /* no attrs */, MEMBERS)
> > >
> > > +#define struct_group_typed(TYPE, NAME, MEMBERS...) \
> > > +	struct_group_attr_typed(TYPE, NAME, /* no attrs */, MEMBERS)
> > > +
> > >  #endif
> >
> > Awesome! My instinct is to expose the resulting API as:
> >
> > __struct_group(type, name, attrs, members...)
> >
> > struct_group(name, members...)
> > struct_group_attr(name, attrs, members...)
> > struct_group_typed(type, name, members...)
>
> Bikeshed: can we use proper nomenclature please. s/type/tag/,
> s/typed/tagged.

Ah! Thank you. I went looking for the spec on what these are called and
couldn't find it.

"struct $tag" is the type, then, yes? So IIUC now:

|    type    |   members    | name
|        tag
 struct   foo  { int bar; }   baz;
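The nomenclature in that last diagram can be checked with a few lines of plain C. This is just an illustration of the terms: tag_demo() is a hypothetical helper, and the value 7 is arbitrary.

```c
#include <assert.h>

/*
 * tag:     foo
 * type:    struct foo
 * members: int bar;
 * name:    baz
 */
struct foo { int bar; } baz;

/* Hypothetical demo: the tag lets the same type be named again later. */
static int tag_demo(void)
{
	struct foo copy;	/* possible only because the struct has a tag */

	baz.bar = 7;
	copy = baz;		/* same type, so plain assignment works */

	assert(sizeof(copy) == sizeof(baz));
	return copy.bar;
}
```

Without the tag, baz would still be a usable object, but no second declaration could refer to its type by name, which is the gap the tagged variant of struct_group() is meant to close.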
diff --git a/include/linux/stddef.h b/include/linux/stddef.h
index 998a4ba28eba..cf7f866944f9 100644
--- a/include/linux/stddef.h
+++ b/include/linux/stddef.h
@@ -36,4 +36,38 @@ enum {
 #define offsetofend(TYPE, MEMBER) \
 	(offsetof(TYPE, MEMBER) + sizeof_field(TYPE, MEMBER))

+/**
+ * struct_group_attr(NAME, ATTRS, MEMBERS)
+ *
+ * Used to create an anonymous union of two structs with identical
+ * layout and size: one anonymous and one named. The former can be
+ * used normally without sub-struct naming, and the latter can be
+ * used to reason about the start, end, and size of the group of
+ * struct members. Includes structure attributes argument.
+ *
+ * @NAME: The name of the mirrored sub-struct
+ * @ATTRS: Any struct attributes (normally empty)
+ * @MEMBERS: The member declarations for the mirrored structs
+ */
+#define struct_group_attr(NAME, ATTRS, MEMBERS) \
+	union { \
+		struct { MEMBERS } ATTRS; \
+		struct { MEMBERS } ATTRS NAME; \
+	}
+
+/**
+ * struct_group(NAME, MEMBERS)
+ *
+ * Used to create an anonymous union of two structs with identical
+ * layout and size: one anonymous and one named. The former can be
+ * used normally without sub-struct naming, and the latter can be
+ * used to reason about the start, end, and size of the group of
+ * struct members.
+ *
+ * @NAME: The name of the mirrored sub-struct
+ * @MEMBERS: The member declarations for the mirrored structs
+ */
+#define struct_group(NAME, MEMBERS) \
+	struct_group_attr(NAME, /* no attrs */, MEMBERS)
+
 #endif