Message ID | 20180309144613.GA48965@beast (mailing list archive)
---|---
State | New, archived
On Fri, 9 Mar 2018, Kees Cook wrote:

> Avoid VLAs[1] by always allocating the upper bound of stack space
> needed. The existing users of rslib appear to max out at 32 roots,
> so use that as the upper bound.

I think 32 is plenty. Do we actually have a user with 32?

> Alternative: make init_rs() a true caller-instance and pre-allocate
> the workspaces. Will this need locking or are the callers already
> single-threaded in their use of librs?

init_rs() is an init function which needs to be invoked _before_ the
decoder/encoder can be used.

The way it works today is that it can share the rs_control between users
to avoid duplicating the polynomial arrays and their setup.

So we might change how rs_control works and allocate rs_control for each
invocation of init_rs(). That means we need two data structures:

Rename rs_control to rs_poly and use that internally for sharing the
polynomial arrays.

rs_control then becomes:

struct rs_control {
	struct rs_poly	*poly;
	uint16_t	lambda[MAX_ROOTS + 1];
	....
	uint16_t	loc[MAX_ROOTS];
};

But as you said, that requires serialization or separation at the usage
sites.

drivers/mtd/nand/* would either need a mutex or allocate one rs_control
per instance. Simple enough to do.

drivers/md/dm-verity-fec.c looks like it's allocating a dm control struct
for each worker thread, so that should just require allocating one
rs_control per worker.

pstore only has an issue in the OOPS case. A simple solution would be to
allocate two rs_control structs, one for regular usage and one for the
OOPS case. Not sure if that covers all possible problems, so that needs
more thought.

Thanks,

	tglx
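Spelled out a bit more, the split might look like this. This is only a
sketch: `struct rs_poly` and `MAX_ROOTS` are the placeholders from the
mail above, not existing kernel structures, and the field list is
borrowed from today's rs_control for illustration.

```c
#include <linux/list.h>
#include <linux/types.h>

#define MAX_ROOTS 24	/* placeholder cap, per the discussion below */

/* Shared and refcounted, as rs_control is today. */
struct rs_poly {
	struct list_head list;
	int mm, nn, nroots;
	int fcr, prim, iprim, gfpoly;
	uint16_t *alpha_to;	/* log/antilog tables */
	uint16_t *index_of;
	uint16_t *genpoly;	/* generator polynomial */
	int users;
};

/* One instance per init_rs() caller; holds the decoder scratch
 * buffers that are currently VLAs on the decode_rs() stack. */
struct rs_control {
	struct rs_poly *poly;	/* shared polynomial data */
	uint16_t lambda[MAX_ROOTS + 1];
	uint16_t syn[MAX_ROOTS];
	uint16_t omega[MAX_ROOTS + 1];
	uint16_t root[MAX_ROOTS];
	uint16_t loc[MAX_ROOTS];
};
```

As noted in the mail, each rs_control instance would then need either a
per-instance owner or external serialization, since the scratch buffers
are no longer per-call.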
On Fri, Mar 9, 2018 at 7:49 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, 9 Mar 2018, Kees Cook wrote:
>
>> Avoid VLAs[1] by always allocating the upper bound of stack space
>> needed. The existing users of rslib appear to max out at 32 roots,
>> so use that as the upper bound.
>
> I think 32 is plenty. Do we actually have a user with 32?

I found 24 as the max, but thought maybe 32 would be better?

drivers/md/dm-verity-fec.h:#define DM_VERITY_FEC_RSM 255
drivers/md/dm-verity-fec.h:#define DM_VERITY_FEC_MAX_RSN 253
drivers/md/dm-verity-fec.h:#define DM_VERITY_FEC_MIN_RSN 231 /* ~10% space overhead */
drivers/md/dm-verity-fec.c:
	if (sscanf(arg_value, "%hhu%c", &num_c, &dummy) != 1 || !num_c ||
	    num_c < (DM_VERITY_FEC_RSM - DM_VERITY_FEC_MAX_RSN) ||
	    num_c > (DM_VERITY_FEC_RSM - DM_VERITY_FEC_MIN_RSN)) {
		ti->error = "Invalid " DM_VERITY_OPT_FEC_ROOTS;
		return -EINVAL;
	}
	v->fec->roots = num_c;
...
drivers/md/dm-verity-fec.c:	return init_rs(8, 0x11d, 0, 1, v->fec->roots);

So this can be as much as 24.

drivers/mtd/nand/diskonchip.c:#define NROOTS 4
drivers/mtd/nand/diskonchip.c:	rs_decoder = init_rs(10, 0x409, FCR, 1, NROOTS);

4.

fs/pstore/ram.c:static int ramoops_ecc;
fs/pstore/ram.c:module_param_named(ecc, ramoops_ecc, int, 0600);
fs/pstore/ram.c:MODULE_PARM_DESC(ramoops_ecc,
fs/pstore/ram.c:	dummy_data->ecc_info.ecc_size = ramoops_ecc == 1 ? 16 : ramoops_ecc;
...
fs/pstore/ram.c:	cxt->ecc_info = pdata->ecc_info;
...
fs/pstore/ram_core.c:	prz->rs_decoder = init_rs(prz->ecc_info.symsize, prz->ecc_info.poly,
fs/pstore/ram_core.c-				  0, 1, prz->ecc_info.ecc_size);

The default "ecc enabled" mode for pstore is 16, but it was made dynamic
a while ago. However, I've only ever seen people use a smaller number of
roots.

>> Alternative: make init_rs() a true caller-instance and pre-allocate
>> the workspaces. Will this need locking or are the callers already
>> single-threaded in their use of librs?
>
> init_rs() is an init function which needs to be invoked _before_ the
> decoder/encoder can be used.
>
> The way it works today is that it can share the rs_control between users
> to avoid duplicating the polynomial arrays and their setup.
>
> So we might change how rs_control works and allocate rs_control for each
> invocation of init_rs(). That means we need two data structures:
>
> Rename rs_control to rs_poly and use that internally for sharing the
> polynomial arrays.
>
> rs_control then becomes:
>
> struct rs_control {
>	struct rs_poly	*poly;
>	uint16_t	lambda[MAX_ROOTS + 1];
>	....
>	uint16_t	loc[MAX_ROOTS];
> };
>
> But as you said, that requires serialization or separation at the usage
> sites.

Right. Not my favorite idea. :P

> drivers/mtd/nand/* would either need a mutex or allocate one rs_control
> per instance. Simple enough to do.
>
> drivers/md/dm-verity-fec.c looks like it's allocating a dm control struct
> for each worker thread, so that should just require allocating one
> rs_control per worker.
>
> pstore only has an issue in the OOPS case. A simple solution would be to
> allocate two rs_control structs, one for regular usage and one for the
> OOPS case. Not sure if that covers all possible problems, so that needs
> more thought.

Maybe I should just go with 24 as the max, and if we have a case where
we need more, address it then?

-Kees
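To keep whatever cap is chosen honest against the widest existing user,
a compile-time check along these lines could sit next to dm-verity's
root validation. This is purely a sketch: no such check exists in the
patch, and it assumes RS_MAX_ROOTS were exported in a shared header
rather than kept private to reed_solomon.c.

```c
#include <linux/bug.h>

static inline void fec_roots_bound_check(void)
{
	/*
	 * dm-verity's widest configuration is
	 * DM_VERITY_FEC_RSM - DM_VERITY_FEC_MIN_RSN = 255 - 231 = 24 roots,
	 * which must stay within the rslib cap.
	 */
	BUILD_BUG_ON(DM_VERITY_FEC_RSM - DM_VERITY_FEC_MIN_RSN > RS_MAX_ROOTS);
}
```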
On Fri, 9 Mar 2018, Kees Cook wrote:
> On Fri, Mar 9, 2018 at 7:49 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Fri, 9 Mar 2018, Kees Cook wrote:
>
> Maybe I should just go with 24 as the max, and if we have a case where
> we need more, address it then?

Works for me.

Thanks,

	tglx
diff --git a/lib/reed_solomon/decode_rs.c b/lib/reed_solomon/decode_rs.c
index 0ec3f257ffdf..3e3becb836a6 100644
--- a/lib/reed_solomon/decode_rs.c
+++ b/lib/reed_solomon/decode_rs.c
@@ -31,9 +31,10 @@
 	 * of nroots is 8. So the necessary stack size will be about
 	 * 220 bytes max.
 	 */
-	uint16_t lambda[nroots + 1], syn[nroots];
-	uint16_t b[nroots + 1], t[nroots + 1], omega[nroots + 1];
-	uint16_t root[nroots], reg[nroots + 1], loc[nroots];
+	uint16_t lambda[RS_MAX_ROOTS + 1], syn[RS_MAX_ROOTS];
+	uint16_t b[RS_MAX_ROOTS + 1], t[RS_MAX_ROOTS + 1];
+	uint16_t omega[RS_MAX_ROOTS + 1], root[RS_MAX_ROOTS];
+	uint16_t reg[RS_MAX_ROOTS + 1], loc[RS_MAX_ROOTS];
 	int count = 0;
 	uint16_t msk = (uint16_t) rs->nn;
 
diff --git a/lib/reed_solomon/reed_solomon.c b/lib/reed_solomon/reed_solomon.c
index 06d04cfa9339..1ad9094ddf66 100644
--- a/lib/reed_solomon/reed_solomon.c
+++ b/lib/reed_solomon/reed_solomon.c
@@ -51,6 +51,9 @@ static LIST_HEAD (rslist);
 /* Protection for the list */
 static DEFINE_MUTEX(rslistlock);
 
+/* Ultimately controls the upper bounds of the on-stack buffers. */
+#define RS_MAX_ROOTS 32
+
 /**
  * rs_init - Initialize a Reed-Solomon codec
  * @symsize: symbol size, bits (1-8)
@@ -210,7 +213,7 @@ static struct rs_control *init_rs_internal(int symsize, int gfpoly,
 		return NULL;
 	if (prim <= 0 || prim >= (1<<symsize))
 		return NULL;
-	if (nroots < 0 || nroots >= (1<<symsize))
+	if (nroots < 0 || nroots >= (1<<symsize) || nroots > RS_MAX_ROOTS)
 		return NULL;
 	mutex_lock(&rslistlock);
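For context, the fixed-size buffers above give a worst-case stack
footprint that is easy to bound by hand:

```c
/*
 * Worst-case stack usage of the decode_rs() buffers with
 * RS_MAX_ROOTS = 32, all of type uint16_t:
 *
 *   lambda, b, t, omega, reg : 5 arrays * 33 entries = 165
 *   syn, root, loc           : 3 arrays * 32 entries =  96
 *                                                      ---
 *                              261 entries * 2 bytes = 522 bytes
 */
```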
Avoid VLAs[1] by always allocating the upper bound of stack space
needed. The existing users of rslib appear to max out at 32 roots,
so use that as the upper bound.

Alternative: make init_rs() a true caller-instance and pre-allocate
the workspaces. Will this need locking or are the callers already
single-threaded in their use of librs?

Using kmalloc in this path doesn't look great, especially since at
least one caller (pstore) is sensitive to allocations during rslib
usage (it expects to run it during an Oops, for example).

[1] https://lkml.org/lkml/2018/3/7/621

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 lib/reed_solomon/decode_rs.c    | 7 ++++---
 lib/reed_solomon/reed_solomon.c | 5 ++++-
 2 files changed, 8 insertions(+), 4 deletions(-)
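For reference, a minimal caller under the new bound might look like the
following. This is a hypothetical sketch, not code from the patch: the
rs_demo_* and DEMO_* names are invented, and the init_rs() parameters
mirror what pstore passes (GF(2^8), polynomial 0x11d, fcr 0, prim 1).

```c
#include <linux/errno.h>
#include <linux/init.h>
#include <linux/rslib.h>

#define DEMO_ROOTS 16	/* parity symbols; must stay <= RS_MAX_ROOTS */

static struct rs_control *demo_rs;
static uint8_t demo_data[255 - DEMO_ROOTS];	/* payload for GF(2^8) */
static uint16_t demo_par[DEMO_ROOTS];		/* parity, zero-initialized */

static int __init rs_demo_init(void)
{
	demo_rs = init_rs(8, 0x11d, 0, 1, DEMO_ROOTS);
	if (!demo_rs)
		return -ENOMEM;

	/* Compute parity over the data block... */
	encode_rs8(demo_rs, demo_data, sizeof(demo_data), demo_par, 0);

	/* ...and later verify/correct it; returns corrected symbols or < 0. */
	if (decode_rs8(demo_rs, demo_data, demo_par, sizeof(demo_data),
		       NULL, 0, NULL, 0, NULL) < 0)
		return -EIO;

	return 0;
}
```

A real user would pair this with free_rs() on teardown; with the cap in
place, an init_rs() request for more than RS_MAX_ROOTS roots now fails
with NULL instead of overrunning the fixed-size decoder buffers.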