diff mbox

Re: [PATCH v4 06/13] iscsi: ensure RNG is seeded before use

Message ID CAHmME9pViOqqDPyjXKLfCWSTnQrcE4OLJMdK1yaTiUvrOV+ecQ@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Jason A. Donenfeld June 8, 2017, 12:09 p.m. UTC
On Thu, Jun 8, 2017 at 4:43 AM, Theodore Ts'o <tytso@mit.edu> wrote:
> What was the testing that was done for commit?  It looks safe, but I'm
> unfamiliar enough with how the iSCSI authentication works that I'd
> prefer getting an ack'ed by from the iSCSI maintainers or
> alternativel, information about how to kick off some kind of automated
> test suite ala xfstests for file systems.

Only very basic testing from my end.

I'm thus adding the iSCSI list to see if they'll have a look (patch reattached).

Jason

Comments

Lee Duncan June 16, 2017, 9:58 p.m. UTC | #1
On 06/08/2017 05:09 AM, Jason A. Donenfeld wrote:
> On Thu, Jun 8, 2017 at 4:43 AM, Theodore Ts'o <tytso@mit.edu> wrote:
>> What was the testing that was done for commit?  It looks safe, but I'm
>> unfamiliar enough with how the iSCSI authentication works that I'd
>> prefer getting an ack'ed by from the iSCSI maintainers or
>> alternativel, information about how to kick off some kind of automated
>> test suite ala xfstests for file systems.
> 
> Only very basic testing from my end.
> 
> I'm thus adding the iSCSI list to see if they'll have a look (patch reattached).
> 
> Jason
> 

It seems like what you are doing is basically "good", i.e. if there is
not enough random data, don't use it. But what happens in that case? The
authentication fails? How does the user know to wait and try again?
Jason A. Donenfeld June 17, 2017, 12:41 a.m. UTC | #2
Hi Lee,

On Fri, Jun 16, 2017 at 11:58 PM, Lee Duncan <lduncan@suse.com> wrote:
> It seems like what you are doing is basically "good", i.e. if there is
> not enough random data, don't use it. But what happens in that case? The
> authentication fails? How does the user know to wait and try again?

The process just remains in interruptible (kill-able) sleep until
there is enough entropy, so the process doesn't need to do anything.
If the waiting is interrupted by a signal, it returns -ESYSRESTART,
which follows the usual semantics of restartable syscalls.

Jason
Lee Duncan June 17, 2017, 3:45 a.m. UTC | #3
On 06/16/2017 05:41 PM, Jason A. Donenfeld wrote:
> Hi Lee,
> 
> On Fri, Jun 16, 2017 at 11:58 PM, Lee Duncan <lduncan@suse.com> wrote:
>> It seems like what you are doing is basically "good", i.e. if there is
>> not enough random data, don't use it. But what happens in that case? The
>> authentication fails? How does the user know to wait and try again?
> 
> The process just remains in interruptible (kill-able) sleep until
> there is enough entropy, so the process doesn't need to do anything.
> If the waiting is interrupted by a signal, it returns -ESYSRESTART,
> which follows the usual semantics of restartable syscalls.
> 
> Jason
> 

In your testing, how long might a process have to wait? Are we talking
seconds? Longer? What about timeouts?

Sorry, but your changing something that isn't exactly broken, so I just
want to be sure we're not introducing some regression, like clients
can't connect the first 5 minutes are a reboot.
Jeffrey Walton June 17, 2017, 2:23 p.m. UTC | #4
On Fri, Jun 16, 2017 at 11:45 PM, Lee Duncan <lduncan@suse.com> wrote:
> On 06/16/2017 05:41 PM, Jason A. Donenfeld wrote:
>> Hi Lee,
>>
>> On Fri, Jun 16, 2017 at 11:58 PM, Lee Duncan <lduncan@suse.com> wrote:
>>> It seems like what you are doing is basically "good", i.e. if there is
>>> not enough random data, don't use it. But what happens in that case? The
>>> authentication fails? How does the user know to wait and try again?
>>
>> The process just remains in interruptible (kill-able) sleep until
>> there is enough entropy, so the process doesn't need to do anything.
>> If the waiting is interrupted by a signal, it returns -ESYSRESTART,
>> which follows the usual semantics of restartable syscalls.
>>
> In your testing, how long might a process have to wait? Are we talking
> seconds? Longer? What about timeouts?
>
> Sorry, but your changing something that isn't exactly broken, so I just
> want to be sure we're not introducing some regression, like clients
> can't connect the first 5 minutes are a reboot.

CHAP (https://www.rfc-editor.org/rfc/rfc1994.txt) and iSCSI
(https://www.ietf.org/rfc/rfc3720.txt) require random values. If iSCSI
is operating without them, it seems like something is broken. From RFC
3720, Section 8.2.1, CHAP Considerations:

   When CHAP is performed over a non-encrypted channel, it is vulnerable
   to an off-line dictionary attack.  Implementations MUST support use
   of up to 128 bit random CHAP secrets, including the means to generate
   such secrets and to accept them from an external generation source.
   Implementations MUST NOT provide secret generation (or expansion)
   means other than random generation.

CHAP actually has a weaker requirement since it only requires _unique_
(and not _random_). From RFC 1994, Section 2.3, Design Requirements:

   Each challenge value SHOULD be unique, since repetition of a
   challenge value in conjunction with the same secret would permit an
   attacker to reply with a previously intercepted response.  Since it
   is expected that the same secret MAY be used to authenticate with
   servers in disparate geographic regions, the challenge SHOULD exhibit
   global and temporal uniqueness.

But its not clear to me how to ensure uniqueness when its based on
randomness from the generators.

Jeff
Paul Koning June 17, 2017, 6:50 p.m. UTC | #5
> On Jun 17, 2017, at 10:23 AM, Jeffrey Walton <noloader@gmail.com> wrote:
> 
> On Fri, Jun 16, 2017 at 11:45 PM, Lee Duncan <lduncan@suse.com> wrote:
>> On 06/16/2017 05:41 PM, Jason A. Donenfeld wrote:
>>> Hi Lee,
>>> 
>>> On Fri, Jun 16, 2017 at 11:58 PM, Lee Duncan <lduncan@suse.com> wrote:
>>>> It seems like what you are doing is basically "good", i.e. if there is
>>>> not enough random data, don't use it. But what happens in that case? The
>>>> authentication fails? How does the user know to wait and try again?
>>> 
>>> The process just remains in interruptible (kill-able) sleep until
>>> there is enough entropy, so the process doesn't need to do anything.
>>> If the waiting is interrupted by a signal, it returns -ESYSRESTART,
>>> which follows the usual semantics of restartable syscalls.
>>> 
>> In your testing, how long might a process have to wait? Are we talking
>> seconds? Longer? What about timeouts?
>> 
>> Sorry, but your changing something that isn't exactly broken, so I just
>> want to be sure we're not introducing some regression, like clients
>> can't connect the first 5 minutes are a reboot.
> 
> CHAP (https://www.rfc-editor.org/rfc/rfc1994.txt) and iSCSI
> (https://www.ietf.org/rfc/rfc3720.txt) require random values. If iSCSI
> is operating without them, it seems like something is broken. From RFC
> 3720, Section 8.2.1, CHAP Considerations:
> 
>   When CHAP is performed over a non-encrypted channel, it is vulnerable
>   to an off-line dictionary attack.  Implementations MUST support use
>   of up to 128 bit random CHAP secrets, including the means to generate
>   such secrets and to accept them from an external generation source.
>   Implementations MUST NOT provide secret generation (or expansion)
>   means other than random generation.

That only applies to the generation of the secret, which is configured into iscsi, not created by it.  A utility to generate the secret might be supplied, of course, just as one might have utilities to generate strong passwords, but it's not a component of the iSCSI protocol.

> CHAP actually has a weaker requirement since it only requires _unique_
> (and not _random_). From RFC 1994, Section 2.3, Design Requirements:
> 
>   Each challenge value SHOULD be unique, since repetition of a
>   challenge value in conjunction with the same secret would permit an
>   attacker to reply with a previously intercepted response.  Since it
>   is expected that the same secret MAY be used to authenticate with
>   servers in disparate geographic regions, the challenge SHOULD exhibit
>   global and temporal uniqueness.
> 
> But its not clear to me how to ensure uniqueness when its based on
> randomness from the generators.

A strong RNG of length n will produce numbers likely to be unique until you approach the birtday limit 2^(n/2).  So, say, a 128 bit challenge will be adequate.

	paul
Stephan Mueller June 18, 2017, 8:04 a.m. UTC | #6
Am Samstag, 17. Juni 2017, 05:45:57 CEST schrieb Lee Duncan:

Hi Lee,

> In your testing, how long might a process have to wait? Are we talking
> seconds? Longer? What about timeouts?
>

In current kernels (starting with 4.8) this timeout should clear within a few 
seconds after boot.

In older kernels (pre 4.8), my KVM takes up to 90 seconds to reach that 
seeding point. I have heard that on IBM System Z this trigger point requires 
minutes to be reached.

Ciao
Stephan
Nicholas A. Bellinger June 26, 2017, 1:23 a.m. UTC | #7
Hi Stephan, Lee & Jason,

(Adding target-devel CC')

Apologies for coming late to the discussion.  Comments below.

On Sun, 2017-06-18 at 10:04 +0200, Stephan Müller wrote:
> Am Samstag, 17. Juni 2017, 05:45:57 CEST schrieb Lee Duncan:
> 
> Hi Lee,
> 
> > In your testing, how long might a process have to wait? Are we talking
> > seconds? Longer? What about timeouts?
> >
> 
> In current kernels (starting with 4.8) this timeout should clear within a few 
> seconds after boot.
> 
> In older kernels (pre 4.8), my KVM takes up to 90 seconds to reach that 
> seeding point. I have heard that on IBM System Z this trigger point requires 
> minutes to be reached.
> 

I share the same concern as Lee wrt to introducing latency into the
existing iscsi-target login sequence.

Namely in the use-cases where a single node is supporting ~1K unique
iscsi-target IQNs, and fail-over + re-balancing of IQNs where 100s of
login attempts are expected to occur in parallel.

If environments exist that require non trivial amounts of time for RNG
seeding to be ready for iscsi-target usage, then enforcing this
requirement at iscsi login time can open up problems, especially when
iscsi host environments may be sensitive to login timeouts, I/O
timeouts, et al.

That said, I'd prefer to simply wait for RNG to be seeded at modprobe
iscsi_target_mod time, instead of trying to enforce randomness during
login.

This way, those of use who have distributed storage platforms can know
to avoid putting a node into a state to accept iscsi-target IQN export
migration, before modprobe iscsi_target_mod has successfully loaded and
RNG seeding has been confirmed.

WDYT..?
Stephan Mueller June 26, 2017, 5:38 p.m. UTC | #8
Am Montag, 26. Juni 2017, 03:23:09 CEST schrieb Nicholas A. Bellinger:

Hi Nicholas,

> Hi Stephan, Lee & Jason,
> 
> (Adding target-devel CC')
> 
> Apologies for coming late to the discussion.  Comments below.
> 
> On Sun, 2017-06-18 at 10:04 +0200, Stephan Müller wrote:
> > Am Samstag, 17. Juni 2017, 05:45:57 CEST schrieb Lee Duncan:
> > 
> > Hi Lee,
> > 
> > > In your testing, how long might a process have to wait? Are we talking
> > > seconds? Longer? What about timeouts?
> > 
> > In current kernels (starting with 4.8) this timeout should clear within a
> > few seconds after boot.
> > 
> > In older kernels (pre 4.8), my KVM takes up to 90 seconds to reach that
> > seeding point. I have heard that on IBM System Z this trigger point
> > requires minutes to be reached.
> 
> I share the same concern as Lee wrt to introducing latency into the
> existing iscsi-target login sequence.
> 
> Namely in the use-cases where a single node is supporting ~1K unique
> iscsi-target IQNs, and fail-over + re-balancing of IQNs where 100s of
> login attempts are expected to occur in parallel.
> 
> If environments exist that require non trivial amounts of time for RNG
> seeding to be ready for iscsi-target usage, then enforcing this
> requirement at iscsi login time can open up problems, especially when
> iscsi host environments may be sensitive to login timeouts, I/O
> timeouts, et al.
> 
> That said, I'd prefer to simply wait for RNG to be seeded at modprobe
> iscsi_target_mod time, instead of trying to enforce randomness during
> login.
> 
> This way, those of use who have distributed storage platforms can know
> to avoid putting a node into a state to accept iscsi-target IQN export
> migration, before modprobe iscsi_target_mod has successfully loaded and
> RNG seeding has been confirmed.
> 
> WDYT..?

We may have a chicken and egg problem when the wait is at the modprobe time. 
Assume the modprobe happens during initramfs time to get access to the root 
file system. In this case, you entire boot process will lock up for an 
indefinite amount of time. The reason is that in order to obtain events 
detected by the RNG, devices need to be initialized and working. Such devices 
commonly start working after the the root partition is mounted as it contains 
all drivers, all configuration information etc.

Note, during the development of my /dev/random implementation, I added the 
getrandom-like blocking behavior to /dev/urandom (which is the equivalent to 
Jason's patch except that it applies to user space). The boot process locked 
up since systemd wanted data from /dev/urandom while it processed the 
initramfs. As it did not get any, the boot process did not commence that could 
deliver new events to be picked up by the RNG.

As I do not have such an iscsi system, I cannot test Jason's patch. But maybe 
the mentioned chicken-and-egg problem I mentioned above is already visible 
with the current patch as it will lead to a blocking of the mounting of the 
root partition in case the root partition is on an iscsi target.

Ciao
Stephan
Nicholas A. Bellinger June 30, 2017, 6:02 a.m. UTC | #9
On Mon, 2017-06-26 at 19:38 +0200, Stephan Müller wrote:
> Am Montag, 26. Juni 2017, 03:23:09 CEST schrieb Nicholas A. Bellinger:
> 
> Hi Nicholas,
> 
> > Hi Stephan, Lee & Jason,
> > 
> > (Adding target-devel CC')
> > 
> > Apologies for coming late to the discussion.  Comments below.
> > 
> > On Sun, 2017-06-18 at 10:04 +0200, Stephan Müller wrote:
> > > Am Samstag, 17. Juni 2017, 05:45:57 CEST schrieb Lee Duncan:
> > > 
> > > Hi Lee,
> > > 
> > > > In your testing, how long might a process have to wait? Are we talking
> > > > seconds? Longer? What about timeouts?
> > > 
> > > In current kernels (starting with 4.8) this timeout should clear within a
> > > few seconds after boot.
> > > 
> > > In older kernels (pre 4.8), my KVM takes up to 90 seconds to reach that
> > > seeding point. I have heard that on IBM System Z this trigger point
> > > requires minutes to be reached.
> > 
> > I share the same concern as Lee wrt to introducing latency into the
> > existing iscsi-target login sequence.
> > 
> > Namely in the use-cases where a single node is supporting ~1K unique
> > iscsi-target IQNs, and fail-over + re-balancing of IQNs where 100s of
> > login attempts are expected to occur in parallel.
> > 
> > If environments exist that require non trivial amounts of time for RNG
> > seeding to be ready for iscsi-target usage, then enforcing this
> > requirement at iscsi login time can open up problems, especially when
> > iscsi host environments may be sensitive to login timeouts, I/O
> > timeouts, et al.
> > 
> > That said, I'd prefer to simply wait for RNG to be seeded at modprobe
> > iscsi_target_mod time, instead of trying to enforce randomness during
> > login.
> > 
> > This way, those of use who have distributed storage platforms can know
> > to avoid putting a node into a state to accept iscsi-target IQN export
> > migration, before modprobe iscsi_target_mod has successfully loaded and
> > RNG seeding has been confirmed.
> > 
> > WDYT..?
> 
> We may have a chicken and egg problem when the wait is at the modprobe time. 
> Assume the modprobe happens during initramfs time to get access to the root 
> file system. In this case, you entire boot process will lock up for an 
> indefinite amount of time. The reason is that in order to obtain events 
> detected by the RNG, devices need to be initialized and working. Such devices 
> commonly start working after the the root partition is mounted as it contains 
> all drivers, all configuration information etc.
> 
> Note, during the development of my /dev/random implementation, I added the 
> getrandom-like blocking behavior to /dev/urandom (which is the equivalent to 
> Jason's patch except that it applies to user space). The boot process locked 
> up since systemd wanted data from /dev/urandom while it processed the 
> initramfs. As it did not get any, the boot process did not commence that could 
> deliver new events to be picked up by the RNG.
> 
> As I do not have such an iscsi system, I cannot test Jason's patch. But maybe 
> the mentioned chicken-and-egg problem I mentioned above is already visible 
> with the current patch as it will lead to a blocking of the mounting of the 
> root partition in case the root partition is on an iscsi target.

AFAIK, there are no distro initramfs dependencies for iscsi_target_mod,
and every environment I've ever seen loads iscsi_target_mod after
switching to the real rootfs.

For an iscsi initiator that might not been the case, especially if the
rootfs is running atop a iscsi LUN.

But at least for iscsi-target mode, any blocking during modprobe waiting
for RNG seeding would happen outside of initramfs.
Ulrich Windl July 5, 2017, 7:03 a.m. UTC | #10
>>> Stephan Müller <smueller@chronox.de> schrieb am 26.06.2017 um 19:38 in
Nachricht <1678474.GnYBdSlWgs@tauon.chronox.de>:
> Am Montag, 26. Juni 2017, 03:23:09 CEST schrieb Nicholas A. Bellinger:
> 
> Hi Nicholas,
> 
>> Hi Stephan, Lee & Jason,
>> 
>> (Adding target-devel CC')
>> 
>> Apologies for coming late to the discussion.  Comments below.
>> 
>> On Sun, 2017-06-18 at 10:04 +0200, Stephan Müller wrote:
>> > Am Samstag, 17. Juni 2017, 05:45:57 CEST schrieb Lee Duncan:
>> > 
>> > Hi Lee,
>> > 
>> > > In your testing, how long might a process have to wait? Are we talking
>> > > seconds? Longer? What about timeouts?
>> > 
>> > In current kernels (starting with 4.8) this timeout should clear within
a
>> > few seconds after boot.
>> > 
>> > In older kernels (pre 4.8), my KVM takes up to 90 seconds to reach that
>> > seeding point. I have heard that on IBM System Z this trigger point
>> > requires minutes to be reached.
>> 
>> I share the same concern as Lee wrt to introducing latency into the
>> existing iscsi-target login sequence.
>> 
>> Namely in the use-cases where a single node is supporting ~1K unique
>> iscsi-target IQNs, and fail-over + re-balancing of IQNs where 100s of
>> login attempts are expected to occur in parallel.
>> 
>> If environments exist that require non trivial amounts of time for RNG
>> seeding to be ready for iscsi-target usage, then enforcing this
>> requirement at iscsi login time can open up problems, especially when
>> iscsi host environments may be sensitive to login timeouts, I/O
>> timeouts, et al.
>> 
>> That said, I'd prefer to simply wait for RNG to be seeded at modprobe
>> iscsi_target_mod time, instead of trying to enforce randomness during
>> login.
>> 
>> This way, those of use who have distributed storage platforms can know
>> to avoid putting a node into a state to accept iscsi-target IQN export
>> migration, before modprobe iscsi_target_mod has successfully loaded and
>> RNG seeding has been confirmed.
>> 
>> WDYT..?
> 
> We may have a chicken and egg problem when the wait is at the modprobe time.

> 
> Assume the modprobe happens during initramfs time to get access to the root

> file system. In this case, you entire boot process will lock up for an 
> indefinite amount of time. The reason is that in order to obtain events 
> detected by the RNG, devices need to be initialized and working. Such 
> devices 
> commonly start working after the the root partition is mounted as it 
> contains 
> all drivers, all configuration information etc.
> 
> Note, during the development of my /dev/random implementation, I added the 
> getrandom-like blocking behavior to /dev/urandom (which is the equivalent to

> 
> Jason's patch except that it applies to user space). The boot process locked


I thought reads from urandom never block by definition. An older manual page
(man urandom) also says: "A  read  from  the  /dev/urandom device will not
block waiting for more entropy."

Regards,
Ulrich

> 
> up since systemd wanted data from /dev/urandom while it processed the 
> initramfs. As it did not get any, the boot process did not commence that 
> could 
> deliver new events to be picked up by the RNG.
> 
> As I do not have such an iscsi system, I cannot test Jason's patch. But 
> maybe 
> the mentioned chicken-and-egg problem I mentioned above is already visible 
> with the current patch as it will lead to a blocking of the mounting of the

> root partition in case the root partition is on an iscsi target.
> 
> Ciao
> Stephan
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "open-iscsi" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to open-iscsi+unsubscribe@googlegroups.com.
> To post to this group, send email to open-iscsi@googlegroups.com.
> Visit this group at https://groups.google.com/group/open-iscsi.
> For more options, visit https://groups.google.com/d/optout.
Ulrich Windl July 5, 2017, 7:08 a.m. UTC | #11
>>> Jeffrey Walton <noloader@gmail.com> schrieb am 17.06.2017 um 16:23 in Nachricht
<CAH8yC8nHX2r9cfQ0gNeJAUrgSfAS8V16dVHv35BRnLn-YprZCg@mail.gmail.com>:

[...]
> But its not clear to me how to ensure uniqueness when its based on
> randomness from the generators.

Even with a perfect random generator non-unique values are possible (that's why it's random). It's unlikely, but it can happen. The question is whether the probability of non-unique values from /dev/urandom is any higher than that for values read from /dev/random. One _might_ be able to predict the values from /dev/urandom.

Regards,
Ulrich

> 
> Jeff
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "open-iscsi" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to open-iscsi+unsubscribe@googlegroups.com.
> To post to this group, send email to open-iscsi@googlegroups.com.
> Visit this group at https://groups.google.com/group/open-iscsi.
> For more options, visit https://groups.google.com/d/optout.
Theodore Ts'o July 5, 2017, 12:35 p.m. UTC | #12
On Wed, Jul 05, 2017 at 09:03:43AM +0200, Ulrich Windl wrote:
> > Note, during the development of my /dev/random implementation, I added the 
> > getrandom-like blocking behavior to /dev/urandom (which is the equivalent to
> > Jason's patch except that it applies to user space). The boot process locked
> 
> I thought reads from urandom never block by definition. An older manual page
> (man urandom) also says: "A  read  from  the  /dev/urandom device will not
> block waiting for more entropy."

As I said in my original message, I *tried* this as an experiment.
Because lots of security-obsessed people were disputing my
intelligence, my judgement, and in some cases, my paternity becuase I
wouldn't change /dev/urandom not to block.

So I did the experiment so I could give them hard data about why we
couldn't go down that path.

> > up since systemd wanted data from /dev/urandom while it processed the 
> > initramfs. As it did not get any, the boot process did not commence that 
> > could 
> > deliver new events to be picked up by the RNG.

And indeed, making this change brick'ed at least one version of Ubuntu
and one version of CeroWRT, as reported by the kernel's 0-day testing
system.  As a result, this patch (which was always a proof of concept,
not anything I thought had any chance of going upstream), was dropped.

Since in the kernel, We Do Not Break Backwards Compatibility, this is
why we have a new interface --- getrandom(2) --- instead of changing
an existing interface.  (Well, there were multiple good reasons for
getrandom, but this was definitely one of them.)

	    	       	   	    - Ted
Paul Koning July 5, 2017, 1:16 p.m. UTC | #13
> On Jul 5, 2017, at 3:08 AM, Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de> wrote:
> 
>>>> Jeffrey Walton <noloader@gmail.com> schrieb am 17.06.2017 um 16:23 in Nachricht
> <CAH8yC8nHX2r9cfQ0gNeJAUrgSfAS8V16dVHv35BRnLn-YprZCg@mail.gmail.com>:
> 
> [...]
>> But its not clear to me how to ensure uniqueness when its based on
>> randomness from the generators.
> 
> Even with a perfect random generator non-unique values are possible (that's why it's random). It's unlikely, but it can happen. The question is whether the probability of non-unique values from /dev/urandom is any higher than that for values read from /dev/random. One _might_ be able to predict the values from /dev/urandom.

In the implementations I know, /dev/random and /dev/urandom are the same driver, the only difference is that when you read from /dev/random there's a check for the current entropy level.

If you haven't fed enough entropy yet to the driver since startup, and you read /dev/urandom, you get a value that isn't sufficiently secure.  

If you have a properly constructed RNG, as soon as it's been fed enough entropy it is secure (at least for the next 2^64 bits or so).  The notion of "using up entropy" is not meaningful for a good generator.   See Bruce Schneier's "Yarrow" paper for the details.

	paul
Theodore Ts'o July 5, 2017, 5:34 p.m. UTC | #14
On Wed, Jul 05, 2017 at 09:16:09AM -0400, Paul Koning wrote:
> 
> In the implementations I know, /dev/random and /dev/urandom are the
> same driver, the only difference is that when you read from
> /dev/random there's a check for the current entropy level.

It's in the same driver but /dev/random and /dev/urandom are different
beasts.  Pre-4.9 we use the same SHA-1 based generator, but we use
different state pools, and we periodically "catastrophically reseed"
(ala Yarrow) from the random pool to the urandom pool.  Getrandom(2)
uses the urandom pool.

In 4.9 and later kernels, /dev/urandom and getrandom(2) use a ChaCha20
based generator which provides for more speed.  We still use the SHA-1
pool for the random pool because it allows for a much larger "state
pool" to store entropy.

> If you have a properly constructed RNG, as soon as it's been fed
> enough entropy it is secure (at least for the next 2^64 bits or so).
> The notion of "using up entropy" is not meaningful for a good
> generator.  See Bruce Schneier's "Yarrow" paper for the details.

A lot of this depends on whether or not you trust your crypto
primitives or not.  The /dev/random driver was the very first OS-based
random generator, and back then, export restrictions were still a
thing (which is why we only used MD5 and SHA-1 as crypto primitives),
and our cryptoanalytic understanding of what makes for a good crypto
hash or encryption algorithm was quite limited.  So trying to
accumulate a large amount of entropy pool of entropy is a good thing,
because even if there was a minor weakness in the crypto hash (and for
a while we were using MD5), relying on the adversary not being able to
guess all of the environmental noise harvested by the kernel would
cover for a lot of sins.

Remember, in the kernel we have access to a large amount of
environmental noise, so it makes a lot of sense to take advantage of
that as much as possible.  So by having a huge state pool for the
/dev/random entropy pool, we can harvest and store as much of that
environmental noise as possible.  This buys us a large amount of
safety margin, which is good thing because somewhere there might be
some Linux 2.0 or Linux 2.2 based router sitting in someone's home
where /dev/random is using MD5.  Those ancient kernels are probably
riddled with zero-day security holes, but the safety margin of using a
large entropy pool is such that even though there are many known
attacks against MD5, I'm actually pretty confident that the large
state pool mitigates the weakness of MD5 as used by /dev/random and
/dev/urandom.  At the very least, it will be much easier for the NSA
to use some other zero-day to attack the said router with the ancient
kernel, well before they try to reverse engineer its /dev/urandom
output.  :-)

						- Ted
diff mbox

Patch

From 1adecf785526a2a96104767807140b9e1a9e2a27 Mon Sep 17 00:00:00 2001
From: "Jason A. Donenfeld" <Jason@zx2c4.com>
Date: Mon, 5 Jun 2017 05:09:54 +0200
Subject: [PATCH] iscsi: ensure RNG is seeded before use

It's not safe to use weak random data here, especially for the challenge
response randomness. Since we're always in process context, it's safe to
simply wait until we have enough randomness to carry out the
authentication correctly.

While we're at it, we clean up a small memleak during an error
condition.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: Lee Duncan <lduncan@suse.com>
Cc: Chris Leech <cleech@redhat.com>
---
 drivers/target/iscsi/iscsi_target_auth.c  | 14 +++++++++++---
 drivers/target/iscsi/iscsi_target_login.c | 22 ++++++++++++++--------
 2 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target_auth.c b/drivers/target/iscsi/iscsi_target_auth.c
index 903b667f8e01..f9bc8ec6fb6b 100644
--- a/drivers/target/iscsi/iscsi_target_auth.c
+++ b/drivers/target/iscsi/iscsi_target_auth.c
@@ -47,18 +47,21 @@  static void chap_binaryhex_to_asciihex(char *dst, char *src, int src_len)
 	}
 }
 
-static void chap_gen_challenge(
+static int chap_gen_challenge(
 	struct iscsi_conn *conn,
 	int caller,
 	char *c_str,
 	unsigned int *c_len)
 {
+	int ret;
 	unsigned char challenge_asciihex[CHAP_CHALLENGE_LENGTH * 2 + 1];
 	struct iscsi_chap *chap = conn->auth_protocol;
 
 	memset(challenge_asciihex, 0, CHAP_CHALLENGE_LENGTH * 2 + 1);
 
-	get_random_bytes(chap->challenge, CHAP_CHALLENGE_LENGTH);
+	ret = get_random_bytes_wait(chap->challenge, CHAP_CHALLENGE_LENGTH);
+	if (unlikely(ret))
+		return ret;
 	chap_binaryhex_to_asciihex(challenge_asciihex, chap->challenge,
 				CHAP_CHALLENGE_LENGTH);
 	/*
@@ -69,6 +72,7 @@  static void chap_gen_challenge(
 
 	pr_debug("[%s] Sending CHAP_C=0x%s\n\n", (caller) ? "server" : "client",
 			challenge_asciihex);
+	return 0;
 }
 
 static int chap_check_algorithm(const char *a_str)
@@ -143,6 +147,7 @@  static struct iscsi_chap *chap_server_open(
 	case CHAP_DIGEST_UNKNOWN:
 	default:
 		pr_err("Unsupported CHAP_A value\n");
+		kfree(conn->auth_protocol);
 		return NULL;
 	}
 
@@ -156,7 +161,10 @@  static struct iscsi_chap *chap_server_open(
 	/*
 	 * Generate Challenge.
 	 */
-	chap_gen_challenge(conn, 1, aic_str, aic_len);
+	if (chap_gen_challenge(conn, 1, aic_str, aic_len) < 0) {
+		kfree(conn->auth_protocol);
+		return NULL;
+	}
 
 	return chap;
 }
diff --git a/drivers/target/iscsi/iscsi_target_login.c b/drivers/target/iscsi/iscsi_target_login.c
index 92b96b51d506..e9bdc8b86e7d 100644
--- a/drivers/target/iscsi/iscsi_target_login.c
+++ b/drivers/target/iscsi/iscsi_target_login.c
@@ -245,22 +245,26 @@  int iscsi_check_for_session_reinstatement(struct iscsi_conn *conn)
 	return 0;
 }
 
-static void iscsi_login_set_conn_values(
+static int iscsi_login_set_conn_values(
 	struct iscsi_session *sess,
 	struct iscsi_conn *conn,
 	__be16 cid)
 {
+	int ret;
 	conn->sess		= sess;
 	conn->cid		= be16_to_cpu(cid);
 	/*
 	 * Generate a random Status sequence number (statsn) for the new
 	 * iSCSI connection.
 	 */
-	get_random_bytes(&conn->stat_sn, sizeof(u32));
+	ret = get_random_bytes_wait(&conn->stat_sn, sizeof(u32));
+	if (unlikely(ret))
+		return ret;
 
 	mutex_lock(&auth_id_lock);
 	conn->auth_id		= iscsit_global->auth_id++;
 	mutex_unlock(&auth_id_lock);
+	return 0;
 }
 
 __printf(2, 3) int iscsi_change_param_sprintf(
@@ -306,7 +310,11 @@  static int iscsi_login_zero_tsih_s1(
 		return -ENOMEM;
 	}
 
-	iscsi_login_set_conn_values(sess, conn, pdu->cid);
+	ret = iscsi_login_set_conn_values(sess, conn, pdu->cid);
+	if (unlikely(ret)) {
+		kfree(sess);
+		return ret;
+	}
 	sess->init_task_tag	= pdu->itt;
 	memcpy(&sess->isid, pdu->isid, 6);
 	sess->exp_cmd_sn	= be32_to_cpu(pdu->cmdsn);
@@ -497,8 +505,7 @@  static int iscsi_login_non_zero_tsih_s1(
 {
 	struct iscsi_login_req *pdu = (struct iscsi_login_req *)buf;
 
-	iscsi_login_set_conn_values(NULL, conn, pdu->cid);
-	return 0;
+	return iscsi_login_set_conn_values(NULL, conn, pdu->cid);
 }
 
 /*
@@ -554,9 +561,8 @@  static int iscsi_login_non_zero_tsih_s2(
 		atomic_set(&sess->session_continuation, 1);
 	spin_unlock_bh(&sess->conn_lock);
 
-	iscsi_login_set_conn_values(sess, conn, pdu->cid);
-
-	if (iscsi_copy_param_list(&conn->param_list,
+	if (iscsi_login_set_conn_values(sess, conn, pdu->cid) < 0 ||
+	    iscsi_copy_param_list(&conn->param_list,
 			conn->tpg->param_list, 0) < 0) {
 		iscsit_tx_login_rsp(conn, ISCSI_STATUS_CLS_TARGET_ERR,
 				ISCSI_LOGIN_STATUS_NO_RESOURCES);
-- 
2.13.0