Message ID | 20180309144613.GA48965@beast (mailing list archive)
---|---
State | New, archived
On Fri, 9 Mar 2018, Kees Cook wrote:

> Avoid VLAs[1] by always allocating the upper bound of stack space
> needed. The existing users of rslib appear to max out at 32 roots,
> so use that as the upper bound.

I think 32 is plenty. Do we actually have a user with 32?

> Alternative: make init_rs() a true caller-instance and pre-allocate
> the workspaces. Will this need locking or are the callers already
> single-threaded in their use of librs?

init_rs() is an init function which needs to be invoked _before_ the
decoder/encoder can be used.

The way it works today is that it can share the rs_control between users
to avoid duplicating the polynomial arrays and their setup.

So we might change how rs_control works and allocate rs_control for each
invocation of init_rs(). That means we need two data structures:

Rename rs_control to rs_poly and use that internally for sharing the
polynomial arrays.

rs_control then becomes:

struct rs_control {
	struct rs_poly	*poly;
	uint16_t	lambda[MAX_ROOTS + 1];
	....
	uint16_t	loc[MAX_ROOTS];
};

But as you said, that requires serialization or separation at the usage
sites.

drivers/mtd/nand/* would either need a mutex or allocate one rs_control
per instance. Simple enough to do.

drivers/md/dm-verity-fec.c looks like it's allocating a dm control struct
for each worker thread, so that should just require allocating one
rs_control per worker.

pstore only has an issue in the OOPS case. A simple solution would be to
allocate two rs_control structs, one for regular usage and one for the
OOPS case. Not sure if that covers all possible problems, so that needs
more thought.

Thanks,

	tglx
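Spelled out a bit more, the split might look like this. This is only a
sketch: `struct rs_poly` and `MAX_ROOTS` are the placeholders from the
mail above, not existing kernel structures, and the field list is
borrowed from today's rs_control for illustration.

```c
#include <linux/list.h>
#include <linux/types.h>

#define MAX_ROOTS 24	/* placeholder cap, per the discussion below */

/* Shared and refcounted, as rs_control is today. */
struct rs_poly {
	struct list_head list;
	int mm, nn, nroots;
	int fcr, prim, iprim, gfpoly;
	uint16_t *alpha_to;	/* log/antilog tables */
	uint16_t *index_of;
	uint16_t *genpoly;	/* generator polynomial */
	int users;
};

/* One instance per init_rs() caller; holds the decoder scratch
 * buffers that are currently VLAs on the decode_rs() stack. */
struct rs_control {
	struct rs_poly *poly;	/* shared polynomial data */
	uint16_t lambda[MAX_ROOTS + 1];
	uint16_t syn[MAX_ROOTS];
	uint16_t omega[MAX_ROOTS + 1];
	uint16_t root[MAX_ROOTS];
	uint16_t loc[MAX_ROOTS];
};
```

As noted in the mail, each rs_control instance would then need either a
per-instance owner or external serialization, since the scratch buffers
are no longer per-call.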
On Fri, Mar 9, 2018 at 7:49 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, 9 Mar 2018, Kees Cook wrote:
>
>> Avoid VLAs[1] by always allocating the upper bound of stack space
>> needed. The existing users of rslib appear to max out at 32 roots,
>> so use that as the upper bound.
>
> I think 32 is plenty. Do we actually have a user with 32?

I found 24 as the max, but thought maybe 32 would be better?

drivers/md/dm-verity-fec.h:#define DM_VERITY_FEC_RSM 255
drivers/md/dm-verity-fec.h:#define DM_VERITY_FEC_MAX_RSN 253
drivers/md/dm-verity-fec.h:#define DM_VERITY_FEC_MIN_RSN 231 /* ~10% space overhead */
drivers/md/dm-verity-fec.c:
	if (sscanf(arg_value, "%hhu%c", &num_c, &dummy) != 1 || !num_c ||
	    num_c < (DM_VERITY_FEC_RSM - DM_VERITY_FEC_MAX_RSN) ||
	    num_c > (DM_VERITY_FEC_RSM - DM_VERITY_FEC_MIN_RSN)) {
		ti->error = "Invalid " DM_VERITY_OPT_FEC_ROOTS;
		return -EINVAL;
	}
	v->fec->roots = num_c;
...
drivers/md/dm-verity-fec.c:	return init_rs(8, 0x11d, 0, 1, v->fec->roots);

So this can be as much as 24.

drivers/mtd/nand/diskonchip.c:#define NROOTS 4
drivers/mtd/nand/diskonchip.c:	rs_decoder = init_rs(10, 0x409, FCR, 1, NROOTS);

4.

fs/pstore/ram.c:static int ramoops_ecc;
fs/pstore/ram.c:module_param_named(ecc, ramoops_ecc, int, 0600);
fs/pstore/ram.c:MODULE_PARM_DESC(ramoops_ecc,
fs/pstore/ram.c:	dummy_data->ecc_info.ecc_size = ramoops_ecc == 1 ? 16 : ramoops_ecc;
...
fs/pstore/ram.c:	cxt->ecc_info = pdata->ecc_info;
...
fs/pstore/ram_core.c:	prz->rs_decoder = init_rs(prz->ecc_info.symsize, prz->ecc_info.poly,
fs/pstore/ram_core.c-				  0, 1, prz->ecc_info.ecc_size);

The default "ecc enabled" mode for pstore is 16, but it was made dynamic
a while ago. However, I've only ever seen people use a smaller number of
roots.

>> Alternative: make init_rs() a true caller-instance and pre-allocate
>> the workspaces. Will this need locking or are the callers already
>> single-threaded in their use of librs?
>
> init_rs() is an init function which needs to be invoked _before_ the
> decoder/encoder can be used.
>
> The way it works today is that it can share the rs_control between users
> to avoid duplicating the polynomial arrays and their setup.
>
> So we might change how rs_control works and allocate rs_control for each
> invocation of init_rs(). That means we need two data structures:
>
> Rename rs_control to rs_poly and use that internally for sharing the
> polynomial arrays.
>
> rs_control then becomes:
>
> struct rs_control {
>	struct rs_poly	*poly;
>	uint16_t	lambda[MAX_ROOTS + 1];
>	....
>	uint16_t	loc[MAX_ROOTS];
> };
>
> But as you said, that requires serialization or separation at the usage
> sites.

Right. Not my favorite idea. :P

> drivers/mtd/nand/* would either need a mutex or allocate one rs_control
> per instance. Simple enough to do.
>
> drivers/md/dm-verity-fec.c looks like it's allocating a dm control struct
> for each worker thread, so that should just require allocating one
> rs_control per worker.
>
> pstore only has an issue in the OOPS case. A simple solution would be to
> allocate two rs_control structs, one for regular usage and one for the
> OOPS case. Not sure if that covers all possible problems, so that needs
> more thought.

Maybe I should just go with 24 as the max, and if we have a case where
we need more, address it then?

-Kees
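To keep whatever cap is chosen honest against the widest existing user,
a compile-time check along these lines could sit next to dm-verity's
root validation. This is purely a sketch: no such check exists in the
patch, and it assumes RS_MAX_ROOTS were exported in a shared header
rather than kept private to reed_solomon.c.

```c
#include <linux/bug.h>

static inline void fec_roots_bound_check(void)
{
	/*
	 * dm-verity's widest configuration is
	 * DM_VERITY_FEC_RSM - DM_VERITY_FEC_MIN_RSN = 255 - 231 = 24 roots,
	 * which must stay within the rslib cap.
	 */
	BUILD_BUG_ON(DM_VERITY_FEC_RSM - DM_VERITY_FEC_MIN_RSN > RS_MAX_ROOTS);
}
```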
On Fri, 9 Mar 2018, Kees Cook wrote:
> On Fri, Mar 9, 2018 at 7:49 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Fri, 9 Mar 2018, Kees Cook wrote:
>
> Maybe I should just go with 24 as the max, and if we have a case where
> we need more, address it then?

Works for me.

Thanks,

	tglx
diff --git a/lib/reed_solomon/decode_rs.c b/lib/reed_solomon/decode_rs.c
index 0ec3f257ffdf..3e3becb836a6 100644
--- a/lib/reed_solomon/decode_rs.c
+++ b/lib/reed_solomon/decode_rs.c
@@ -31,9 +31,10 @@
 	 * of nroots is 8. So the necessary stack size will be about
 	 * 220 bytes max.
 	 */
-	uint16_t lambda[nroots + 1], syn[nroots];
-	uint16_t b[nroots + 1], t[nroots + 1], omega[nroots + 1];
-	uint16_t root[nroots], reg[nroots + 1], loc[nroots];
+	uint16_t lambda[RS_MAX_ROOTS + 1], syn[RS_MAX_ROOTS];
+	uint16_t b[RS_MAX_ROOTS + 1], t[RS_MAX_ROOTS + 1];
+	uint16_t omega[RS_MAX_ROOTS + 1], root[RS_MAX_ROOTS];
+	uint16_t reg[RS_MAX_ROOTS + 1], loc[RS_MAX_ROOTS];
 	int count = 0;
 	uint16_t msk = (uint16_t) rs->nn;
 
diff --git a/lib/reed_solomon/reed_solomon.c b/lib/reed_solomon/reed_solomon.c
index 06d04cfa9339..1ad9094ddf66 100644
--- a/lib/reed_solomon/reed_solomon.c
+++ b/lib/reed_solomon/reed_solomon.c
@@ -51,6 +51,9 @@ static LIST_HEAD (rslist);
 /* Protection for the list */
 static DEFINE_MUTEX(rslistlock);
 
+/* Ultimately controls the upper bounds of the on-stack buffers. */
+#define RS_MAX_ROOTS 32
+
 /**
  * rs_init - Initialize a Reed-Solomon codec
  * @symsize: symbol size, bits (1-8)
@@ -210,7 +213,7 @@ static struct rs_control *init_rs_internal(int symsize, int gfpoly,
 		return NULL;
 	if (prim <= 0 || prim >= (1<<symsize))
 		return NULL;
-	if (nroots < 0 || nroots >= (1<<symsize))
+	if (nroots < 0 || nroots >= (1<<symsize) || nroots > RS_MAX_ROOTS)
 		return NULL;
 	mutex_lock(&rslistlock);
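For context, the fixed-size buffers above give a worst-case stack
footprint that is easy to bound by hand:

```c
/*
 * Worst-case stack usage of the decode_rs() buffers with
 * RS_MAX_ROOTS = 32, all of type uint16_t:
 *
 *   lambda, b, t, omega, reg : 5 arrays * 33 entries = 165
 *   syn, root, loc           : 3 arrays * 32 entries =  96
 *                                                      ---
 *                              261 entries * 2 bytes = 522 bytes
 */
```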
Avoid VLAs[1] by always allocating the upper bound of stack space
needed. The existing users of rslib appear to max out at 32 roots,
so use that as the upper bound.

Alternative: make init_rs() a true caller-instance and pre-allocate
the workspaces. Will this need locking or are the callers already
single-threaded in their use of librs?

Using kmalloc in this path doesn't look great, especially since at
least one caller (pstore) is sensitive to allocations during rslib
usage (it expects to run it during an Oops, for example).

[1] https://lkml.org/lkml/2018/3/7/621

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 lib/reed_solomon/decode_rs.c    | 7 ++++---
 lib/reed_solomon/reed_solomon.c | 5 ++++-
 2 files changed, 8 insertions(+), 4 deletions(-)
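For reference, a minimal caller under the new bound might look like the
following. This is a hypothetical sketch, not code from the patch: the
rs_demo_* and DEMO_* names are invented, and the init_rs() parameters
mirror what pstore passes (GF(2^8), polynomial 0x11d, fcr 0, prim 1).

```c
#include <linux/errno.h>
#include <linux/init.h>
#include <linux/rslib.h>

#define DEMO_ROOTS 16	/* parity symbols; must stay <= RS_MAX_ROOTS */

static struct rs_control *demo_rs;
static uint8_t demo_data[255 - DEMO_ROOTS];	/* payload for GF(2^8) */
static uint16_t demo_par[DEMO_ROOTS];		/* parity, zero-initialized */

static int __init rs_demo_init(void)
{
	demo_rs = init_rs(8, 0x11d, 0, 1, DEMO_ROOTS);
	if (!demo_rs)
		return -ENOMEM;

	/* Compute parity over the data block... */
	encode_rs8(demo_rs, demo_data, sizeof(demo_data), demo_par, 0);

	/* ...and later verify/correct it; returns corrected symbols or < 0. */
	if (decode_rs8(demo_rs, demo_data, demo_par, sizeof(demo_data),
		       NULL, 0, NULL, 0, NULL) < 0)
		return -EIO;

	return 0;
}
```

A real user would pair this with free_rs() on teardown; with the cap in
place, an init_rs() request for more than RS_MAX_ROOTS roots now fails
with NULL instead of overrunning the fixed-size decoder buffers.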