diff mbox series

[v2] gcc-plugins: latent_entropy: use /dev/urandom

Message ID 20220403204036.1269562-1-Jason@zx2c4.com (mailing list archive)
State Superseded
Headers show
Series [v2] gcc-plugins: latent_entropy: use /dev/urandom | expand

Commit Message

Jason A. Donenfeld April 3, 2022, 8:40 p.m. UTC
While the latent entropy plugin mostly doesn't derive entropy from
get_random_const() for measuring the call graph, when __latent_entropy is
applied to a constant, then it's initialized statically to output from
get_random_const(). In that case, this data is derived from a 64-bit
seed, which means a buffer of 512 bits doesn't really have that amount
of compile-time entropy.

This patch fixes that shortcoming by just buffering chunks of
/dev/urandom output and doling it out as requested.

At the same time, it's important that we don't break the use of
-frandom-seed, for people who want the runtime benefits of the latent
entropy plugin, while still having compile-time determinism. In that
case, we detect whether gcc's set_random_seed() has been called by
making a call to get_random_seed(noinit=true) in the plugin init
function, which is called after set_random_seed() is called but before
anything that calls get_random_seed(noinit=false), and seeing if it's
zero or not. If it's not zero, we're in deterministic mode, and so we
just generate numbers with a basic xorshift prng.

Fixes: 38addce8b600 ("gcc-plugins: Add latent_entropy plugin")
Cc: stable@vger.kernel.org
Cc: PaX Team <pageexec@freemail.hu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
Changes v1->v2:
- Pipacs pointed out that using /dev/urandom unconditionally would break
  the use of -frandom-seed, so now we check for that and keep with
  something deterministic in that case.

I'm not super familiar with this plugin or its conventions, so pointers
would be most welcome if something here looks amiss. The decision to
buffer 2k at a time is pretty arbitrary too; I haven't measured usage.

 scripts/gcc-plugins/latent_entropy_plugin.c | 48 +++++++++++++--------
 1 file changed, 30 insertions(+), 18 deletions(-)

Comments

Kees Cook April 4, 2022, 6:49 p.m. UTC | #1
On Sun, Apr 03, 2022 at 10:40:36PM +0200, Jason A. Donenfeld wrote:
> While the latent entropy plugin mostly doesn't derive entropy from
> get_random_const() for measuring the call graph, when __latent_entropy is
> applied to a constant, then it's initialized statically to output from
> get_random_const(). In that case, this data is derived from a 64-bit
> seed, which means a buffer of 512 bits doesn't really have that amount
> of compile-time entropy.
> 
> This patch fixes that shortcoming by just buffering chunks of
> /dev/urandom output and doling it out as requested.
> 
> At the same time, it's important that we don't break the use of
> -frandom-seed, for people who want the runtime benefits of the latent
> entropy plugin, while still having compile-time determinism. In that
> case, we detect whether gcc's set_random_seed() has been called by
> making a call to get_random_seed(noinit=true) in the plugin init
> function, which is called after set_random_seed() is called but before
> anything that calls get_random_seed(noinit=false), and seeing if it's
> zero or not. If it's not zero, we're in deterministic mode, and so we
> just generate numbers with a basic xorshift prng.

This mixes two changes: the pRNG change and the "use urandom if
non-deterministic" change. I think these should be split, so the pRNG
change can be explicitly justified.

More notes below...

> 
> Fixes: 38addce8b600 ("gcc-plugins: Add latent_entropy plugin")
> Cc: stable@vger.kernel.org
> Cc: PaX Team <pageexec@freemail.hu>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
> Changes v1->v2:
> - Pipacs pointed out that using /dev/urandom unconditionally would break
>   the use of -frandom-seed, so now we check for that and keep with
>   something deterministic in that case.
> 
> I'm not super familiar with this plugin or its conventions, so pointers
> would be most welcome if something here looks amiss. The decision to
> buffer 2k at a time is pretty arbitrary too; I haven't measured usage.
> 
>  scripts/gcc-plugins/latent_entropy_plugin.c | 48 +++++++++++++--------
>  1 file changed, 30 insertions(+), 18 deletions(-)
> 
> diff --git a/scripts/gcc-plugins/latent_entropy_plugin.c b/scripts/gcc-plugins/latent_entropy_plugin.c
> index 589454bce930..042442013ae1 100644
> --- a/scripts/gcc-plugins/latent_entropy_plugin.c
> +++ b/scripts/gcc-plugins/latent_entropy_plugin.c
> @@ -82,29 +82,37 @@ __visible int plugin_is_GPL_compatible;
>  static GTY(()) tree latent_entropy_decl;
>  
>  static struct plugin_info latent_entropy_plugin_info = {
> -	.version	= "201606141920vanilla",
> +	.version	= "202203311920vanilla",

This doesn't really need to be versioned. We can change this to just
"vanilla", IMO.

>  	.help		= "disable\tturn off latent entropy instrumentation\n",
>  };
>  
> -static unsigned HOST_WIDE_INT seed;
> -/*
> - * get_random_seed() (this is a GCC function) generates the seed.
> - * This is a simple random generator without any cryptographic security because
> - * the entropy doesn't come from here.
> - */
> +static unsigned HOST_WIDE_INT deterministic_seed;
> +static unsigned HOST_WIDE_INT rnd_buf[256];
> +static size_t rnd_idx = ARRAY_SIZE(rnd_buf);
> +static int urandom_fd = -1;
> +
>  static unsigned HOST_WIDE_INT get_random_const(void)
>  {
> -	unsigned int i;
> -	unsigned HOST_WIDE_INT ret = 0;
> -
> -	for (i = 0; i < 8 * sizeof(ret); i++) {
> -		ret = (ret << 1) | (seed & 1);
> -		seed >>= 1;
> -		if (ret & 1)
> -			seed ^= 0xD800000000000000ULL;
> +	if (deterministic_seed) {
> +		unsigned HOST_WIDE_INT w = deterministic_seed;
> +		w ^= w << 13;
> +		w ^= w >> 7;
> +		w ^= w << 17;
> +		deterministic_seed = w;
> +		return deterministic_seed;

While seemingly impossible, perhaps don't reset "deterministic_seed",
and just continue to use "seed", so that it can never become "0" again.

>  	}
>  
> -	return ret;
> +	if (urandom_fd < 0) {
> +		urandom_fd = open("/dev/urandom", O_RDONLY);
> +		if (urandom_fd < 0)
> +			abort();
> +	}
> +	if (rnd_idx >= ARRAY_SIZE(rnd_buf)) {
> +		if (read(urandom_fd, rnd_buf, sizeof(rnd_buf)) != sizeof(rnd_buf))
> +			abort();
> +		rnd_idx = 0;
> +	}
> +	return rnd_buf[rnd_idx++];
>  }
>  
>  static tree tree_get_random_const(tree type)
> @@ -537,8 +545,6 @@ static void latent_entropy_start_unit(void *gcc_data __unused,
>  	tree type, id;
>  	int quals;
>  
> -	seed = get_random_seed(false);
> -
>  	if (in_lto_p)
>  		return;
>  
> @@ -573,6 +579,12 @@ __visible int plugin_init(struct plugin_name_args *plugin_info,
>  	const struct plugin_argument * const argv = plugin_info->argv;
>  	int i;
>  
> +	/*
> +	 * Call get_random_seed() with noinit=true, so that this returns
> +	 * 0 in the case where no seed has been passed via -frandom-seed.
> +	 */
> +	deterministic_seed = get_random_seed(true);

i.e. have this be:

	deterministic_seed = get_random_seed(true);
	if (deterministic_seed)
		seed = get_random_seed(false);

> +
>  	static const struct ggc_root_tab gt_ggc_r_gt_latent_entropy[] = {
>  		{
>  			.base = &latent_entropy_decl,
> -- 
> 2.35.1
>
Jason A. Donenfeld April 4, 2022, 10:47 p.m. UTC | #2
Hi Kees,

On Mon, Apr 4, 2022 at 8:49 PM Kees Cook <keescook@chromium.org> wrote:
> This mixes two changes: the pRNG change and the "use urandom if
> non-deterministic" change. I think these should be split, so the pRNG
> change can be explicitly justified.

Alright, I'll split those. Or, more probably, just drop the xorshift
thing. There's not actually a strong reason for preferring xorshift. I
did it because it produces more uniformity and is faster to compute and
all that. But none of that stuff actually matters here. It was just a
sort of "well I'm at it..." thing.

> >  static struct plugin_info latent_entropy_plugin_info = {
> > -     .version        = "201606141920vanilla",
> > +     .version        = "202203311920vanilla",
>
> This doesn't really need to be versioned. We can change this to just
> "vanilla", IMO.

Okay. I suppose you want it to be in a different patch too, right? In
which case I'll leave it out and maybe get to it later. (I suppose one
probably needs to double check whether it's used for anything
interesting like dwarf debug info or whatever, where maybe it's
helpful?)

> > +     if (deterministic_seed) {
> > +             unsigned HOST_WIDE_INT w = deterministic_seed;
> > +             w ^= w << 13;
> > +             w ^= w >> 7;
> > +             w ^= w << 17;
> > +             deterministic_seed = w;
> > +             return deterministic_seed;
>
> While seemingly impossible, perhaps don't reset "deterministic_seed",
> and just continue to use "seed", so that it can never become "0" again.

Not sure I follow. It's an LFSR. The "L" is important. It'll never become
zero. It's not "seemingly". We can prove it trivially in Magma:

    > w := 64;
    > K := GF(2);
    > I := IdentityMatrix(K, w);
    > SHL := HorizontalJoin(RemoveColumn(I, 1), ZeroMatrix(K, w, 1));
    > SHR := HorizontalJoin(ZeroMatrix(K, w, 1), RemoveColumn(I, w));
    > M :=  (I + SHL^17) * (I + SHR^7) * (I + SHL^13);
    > Order(M) eq 2^64 - 1;
    true
    > P<x> := MinimalPolynomial(M);
    > IsPrimitive(P);
    true
    > IsInvertible(M);
    true 
    > Rank(M);
    64

And more obviously, splitting this into "seed" and "deterministic_seed",
as you suggested, wouldn't actually do much in the case when seed=0,
since 0<<N==0 and 0^0==0.

Jason
Kees Cook April 5, 2022, 3:01 a.m. UTC | #3
On Tue, Apr 05, 2022 at 12:47:14AM +0200, Jason A. Donenfeld wrote:
> On Mon, Apr 4, 2022 at 8:49 PM Kees Cook <keescook@chromium.org> wrote:
> > This mixes two changes: the pRNG change and the "use urandom if
> > non-deterministic" change. I think these should be split, so the pRNG
> > change can be explicitly justified.
> 
> Alright, I'll split those. Or, more probably, just drop the xorshift
> thing. There's not actually a strong reason for preferring xorshift. I
> did it because it produces more uniformity and is faster to compute and
> all that. But none of that stuff actually matters here. It was just a
> sort of "well I'm at it..." thing.

Well, it's nice to have and you already wrote it, so seems a waste to
just drop it. :)

> > >  static struct plugin_info latent_entropy_plugin_info = {
> > > -     .version        = "201606141920vanilla",
> > > +     .version        = "202203311920vanilla",
> >
> > This doesn't really need to be versioned. We can change this to just
> > "vanilla", IMO.
> 
> Okay. I suppose you want it to be in a different patch too, right? In
> which case I'll leave it out and maybe get to it later. (I suppose one
> probably needs to double check whether it's used for anything
> interesting like dwarf debug info or whatever, where maybe it's
> helpful?)

Hm, I don't think it shows up anywhere, but you can just drop the hunk
that touch it. I can remove them all with a separate patch later.

> > > +     if (deterministic_seed) {
> > > +             unsigned HOST_WIDE_INT w = deterministic_seed;
> > > +             w ^= w << 13;
> > > +             w ^= w >> 7;
> > > +             w ^= w << 17;
> > > +             deterministic_seed = w;
> > > +             return deterministic_seed;
> >
> > While seemingly impossible, perhaps don't reset "deterministic_seed",
> > and just continue to use "seed", so that it can never become "0" again.
> 
> Not sure I follow. It's an LFSR. The "L" is important. It'll never become
> zero. It's not "seemingly". We can prove it trivially in Magma:

Got it; yeah. I was reading too quickly. My brain misparsed and got
stuck on "left shift", but it's using rotation. Sorry for the noise.
Jason A. Donenfeld April 5, 2022, 12:38 p.m. UTC | #4
Hi Kees,

On 4/5/22, Kees Cook <keescook@chromium.org> wrote:
> On Tue, Apr 05, 2022 at 12:47:14AM +0200, Jason A. Donenfeld wrote:
>> On Mon, Apr 4, 2022 at 8:49 PM Kees Cook <keescook@chromium.org> wrote:
>> > This mixes two changes: the pRNG change and the "use urandom if
>> > non-deterministic" change. I think these should be split, so the pRNG
>> > change can be explicitly justified.
>>
>> Alright, I'll split those. Or, more probably, just drop the xorshift
>> thing. There's not actually a strong reason for preferring xorshift. I
>> did it because it produces more uniformity and is faster to compute and
>> all that. But none of that stuff actually matters here. It was just a
>> sort of "well I'm at it..." thing.
>
> Well, it's nice to have and you already wrote it, so seems a waste to
> just drop it. :)
>
>> > >  static struct plugin_info latent_entropy_plugin_info = {
>> > > -     .version        = "201606141920vanilla",
>> > > +     .version        = "202203311920vanilla",
>> >
>> > This doesn't really need to be versioned. We can change this to just
>> > "vanilla", IMO.
>>
>> Okay. I suppose you want it to be in a different patch too, right? In
>> which case I'll leave it out and maybe get to it later. (I suppose one
>> probably needs to double check whether it's used for anything
>> interesting like dwarf debug info or whatever, where maybe it's
>> helpful?)
>
> Hm, I don't think it shows up anywhere, but you can just drop the hunk
> that touch it. I can remove them all with a separate patch later.
>

Okay. That's what I did here
https://lore.kernel.org/lkml/20220404230709.124508-1-Jason@zx2c4.com/
so awaiting your merge. (I still find all aspects of v2 more
preferable for a variety of weak reasons in case you'd like to merge
that instead, but v3 is available now.)

Jason
Kees Cook April 5, 2022, 5:17 p.m. UTC | #5
On Tue, Apr 05, 2022 at 02:38:58PM +0200, Jason A. Donenfeld wrote:
> Hi Kees,
> 
> On 4/5/22, Kees Cook <keescook@chromium.org> wrote:
> > On Tue, Apr 05, 2022 at 12:47:14AM +0200, Jason A. Donenfeld wrote:
> >> On Mon, Apr 4, 2022 at 8:49 PM Kees Cook <keescook@chromium.org> wrote:
> >> > This mixes two changes: the pRNG change and the "use urandom if
> >> > non-deterministic" change. I think these should be split, so the pRNG
> >> > change can be explicitly justified.
> >>
> >> Alright, I'll split those. Or, more probably, just drop the xorshift
> >> thing. There's not actually a strong reason for preferring xorshift. I
> >> did it because it produces more uniformity and is faster to compute and
> >> all that. But none of that stuff actually matters here. It was just a
> >> sort of "well I'm at it..." thing.
> >
> > Well, it's nice to have and you already wrote it, so seems a waste to
> > just drop it. :)
> >
> >> > >  static struct plugin_info latent_entropy_plugin_info = {
> >> > > -     .version        = "201606141920vanilla",
> >> > > +     .version        = "202203311920vanilla",
> >> >
> >> > This doesn't really need to be versioned. We can change this to just
> >> > "vanilla", IMO.
> >>
> >> Okay. I suppose you want it to be in a different patch too, right? In
> >> which case I'll leave it out and maybe get to it later. (I suppose one
> >> probably needs to double check whether it's used for anything
> >> interesting like dwarf debug info or whatever, where maybe it's
> >> helpful?)
> >
> > Hm, I don't think it shows up anywhere, but you can just drop the hunk
> > that touch it. I can remove them all with a separate patch later.
> >
> 
> Okay. That's what I did here
> https://lore.kernel.org/lkml/20220404230709.124508-1-Jason@zx2c4.com/
> so awaiting your merge. (I still find all aspects of v2 more
> preferable for a variety of weak reasons in case you'd like to merge
> that instead, but v3 is available now.)

v3 uses a different check for the -f option, though? Isn't that
preferred over the v2 method?

Also, I did some quick benchmarking, and any difference in runtime is
completely lost in the noise, so that's good.
Jason A. Donenfeld April 5, 2022, 5:40 p.m. UTC | #6
Hi Kees,

On 4/5/22, Kees Cook <keescook@chromium.org> wrote:
> v3 uses a different check for the -f option, though? Isn't that
> preferred over the v2 method?

Based on the code comments, I assume this is gcc upstream's intended
method. It strikes me as worse, though, because that variable, when
it's not set to -1, is set to: `local_tick = (unsigned) tv.tv_sec *
1000 + tv.tv_usec / 1000;` which is on occasion unlucky and hits -1
too. But maybe that's a bug in gcc that should be fixed instead? I
don't know really. But anyway that's why I'm /also/ more into that
aspect of v2.

> Also, I did some quick benchmarking, and any difference in runtime is
> completely lost in the noise, so that's good.

Oh good to hear. So my 2k buffer is fine then.

Jason
diff mbox series

Patch

diff --git a/scripts/gcc-plugins/latent_entropy_plugin.c b/scripts/gcc-plugins/latent_entropy_plugin.c
index 589454bce930..042442013ae1 100644
--- a/scripts/gcc-plugins/latent_entropy_plugin.c
+++ b/scripts/gcc-plugins/latent_entropy_plugin.c
@@ -82,29 +82,37 @@  __visible int plugin_is_GPL_compatible;
 static GTY(()) tree latent_entropy_decl;
 
 static struct plugin_info latent_entropy_plugin_info = {
-	.version	= "201606141920vanilla",
+	.version	= "202203311920vanilla",
 	.help		= "disable\tturn off latent entropy instrumentation\n",
 };
 
-static unsigned HOST_WIDE_INT seed;
-/*
- * get_random_seed() (this is a GCC function) generates the seed.
- * This is a simple random generator without any cryptographic security because
- * the entropy doesn't come from here.
- */
+static unsigned HOST_WIDE_INT deterministic_seed;
+static unsigned HOST_WIDE_INT rnd_buf[256];
+static size_t rnd_idx = ARRAY_SIZE(rnd_buf);
+static int urandom_fd = -1;
+
 static unsigned HOST_WIDE_INT get_random_const(void)
 {
-	unsigned int i;
-	unsigned HOST_WIDE_INT ret = 0;
-
-	for (i = 0; i < 8 * sizeof(ret); i++) {
-		ret = (ret << 1) | (seed & 1);
-		seed >>= 1;
-		if (ret & 1)
-			seed ^= 0xD800000000000000ULL;
+	if (deterministic_seed) {
+		unsigned HOST_WIDE_INT w = deterministic_seed;
+		w ^= w << 13;
+		w ^= w >> 7;
+		w ^= w << 17;
+		deterministic_seed = w;
+		return deterministic_seed;
 	}
 
-	return ret;
+	if (urandom_fd < 0) {
+		urandom_fd = open("/dev/urandom", O_RDONLY);
+		if (urandom_fd < 0)
+			abort();
+	}
+	if (rnd_idx >= ARRAY_SIZE(rnd_buf)) {
+		if (read(urandom_fd, rnd_buf, sizeof(rnd_buf)) != sizeof(rnd_buf))
+			abort();
+		rnd_idx = 0;
+	}
+	return rnd_buf[rnd_idx++];
 }
 
 static tree tree_get_random_const(tree type)
@@ -537,8 +545,6 @@  static void latent_entropy_start_unit(void *gcc_data __unused,
 	tree type, id;
 	int quals;
 
-	seed = get_random_seed(false);
-
 	if (in_lto_p)
 		return;
 
@@ -573,6 +579,12 @@  __visible int plugin_init(struct plugin_name_args *plugin_info,
 	const struct plugin_argument * const argv = plugin_info->argv;
 	int i;
 
+	/*
+	 * Call get_random_seed() with noinit=true, so that this returns
+	 * 0 in the case where no seed has been passed via -frandom-seed.
+	 */
+	deterministic_seed = get_random_seed(true);
+
 	static const struct ggc_root_tab gt_ggc_r_gt_latent_entropy[] = {
 		{
 			.base = &latent_entropy_decl,