Message ID: 20210716110428.9727-1-hare@suse.de
Series: nvme: In-band authentication support
> Hi all,

Hey Hannes, nice progress. This is definitely a step in the right
direction.

> recent updates to the NVMe spec have added definitions for in-band
> authentication, and seeing that it provides some real benefit especially
> for NVMe-TCP here's an attempt to implement it.

Please call out TP 8006 specifically so people can look into it.

> Tricky bit here is that the specification orients itself on TLS 1.3,
> but supports only the FFDHE groups. Which of course the kernel doesn't
> support. I've been able to come up with a patch for this, but as this
> is my first attempt to fix anything in the crypto area I would invite
> people more familiar with these matters to have a look.

Glad to see this turned out to be very simple!

> Also note that this is just for in-band authentication. Secure concatenation
> (ie starting TLS with the negotiated parameters) is not implemented; one would
> need to update the kernel TLS implementation for this, which at this time is
> beyond scope.

TLS is an additional effort; as discussed, in-band auth alone has merits
and we should not tie it to NVMe/TCP TLS.

> As usual, comments and reviews are welcome.

Having another look into this now...
On Fri, 2021-07-16 at 13:04 +0200, Hannes Reinecke wrote:
> Hi all,
>
> recent updates to the NVMe spec have added definitions for in-band
> authentication, and seeing that it provides some real benefit especially
> for NVMe-TCP here's an attempt to implement it.
>
> Tricky bit here is that the specification orients itself on TLS 1.3,
> but supports only the FFDHE groups. Which of course the kernel doesn't
> support. I've been able to come up with a patch for this, but as this
> is my first attempt to fix anything in the crypto area I would invite
> people more familiar with these matters to have a look.
>
> Also note that this is just for in-band authentication. Secure concatenation
> (ie starting TLS with the negotiated parameters) is not implemented; one would
> need to update the kernel TLS implementation for this, which at this time is
> beyond scope.
>
> As usual, comments and reviews are welcome.

Hi Hannes,
could you please reference the specific standards that describe the
NVMe authentication protocols?

Thanks,
Simo.
On 7/19/21 12:02 PM, Simo Sorce wrote:
> On Fri, 2021-07-16 at 13:04 +0200, Hannes Reinecke wrote:
>> Hi all,
>>
>> recent updates to the NVMe spec have added definitions for in-band
>> authentication, and seeing that it provides some real benefit especially
>> for NVMe-TCP here's an attempt to implement it.
>>
>> Tricky bit here is that the specification orients itself on TLS 1.3,
>> but supports only the FFDHE groups. Which of course the kernel doesn't
>> support. I've been able to come up with a patch for this, but as this
>> is my first attempt to fix anything in the crypto area I would invite
>> people more familiar with these matters to have a look.
>>
>> Also note that this is just for in-band authentication. Secure concatenation
>> (ie starting TLS with the negotiated parameters) is not implemented; one would
>> need to update the kernel TLS implementation for this, which at this time is
>> beyond scope.
>>
>> As usual, comments and reviews are welcome.
>
> Hi Hannes,
> could you please reference the specific standards that describe the
> NVMe authentication protocols?
>
https://nvmexpress.org/wp-content/uploads/NVM-Express-Base-Specification-2_0-2021.06.02-Ratified-5.pdf

Section '8.13 NVMe-over-Fabrics In-band authentication'

Cheers,
Hannes
Hi,

Great to see those patches coming! After some review, they look to be
very well done. Some comments/suggestions below.

1. I strongly recommend implementing DH exponential reuse (g^x mod p /
g^y mod p, as well as g^xy mod p) as specified in section 8.13.5.7
"DH-HMAC-CHAP Security Requirements". When I was working on TP 8006 I
had a prototype that demonstrated that the DH math has quite significant
latency, something like (as far as I remember) 30 ms for the 4K group
and a few hundred ms for the 8K group. For a single connection that is
not a big deal, but imagine an AMD EPYC with 128 cores. Since all
connections are created sequentially, even at 30 ms per connection the
time to complete a full remote device connection would become
128*30 ms => almost 4 seconds. With the 8K group it might be more than
10 seconds. Users are unlikely to be happy with this, especially when
connecting multiple NVMe-oF devices is part of a server or VM boot
sequence.

If DH exponential reuse is implemented, the DH math is skipped for all
subsequent connections, so the authentication overhead becomes pretty
much negligible.

In my prototype I implemented DH exponential reuse as a simple
per-host/target cache that keeps the DH exponentials (including
g^xy mod p) for up to 10 seconds. Simple and sufficient. (A sketch of
such a cache follows this message.)

Another, perhaps even more significant, reason why DH exponential reuse
is important is that without it, x (or y on the host side) must be
randomly generated each time a new connection is established. For the
8K group that means 1 KB of random bytes drawn from the random pool for
each connection; with 128 connections that is 128 KB. Quite a lot of
pressure on the random pool, which DH exponential reuse mostly avoids.

Those are the 2 reasons why we added the DH exponential reuse sentence
to the spec. The original TP 8006 had a small informative piece
explaining the reasoning behind it, but for some reason it was removed
from the final version.

2. What is the status of this code from the perspective of stability in
the face of malicious host behavior? The implementation seems carefully
done, but, for instance, at first look I was not able to find code to
clean up if the host stops acting for too long in the middle of the
exchange. Another observation is that in nvmet_execute_auth_send(),
nvmet_check_transfer_len() does not check that the tl size is
reasonable, i.e., not 1 GB, for instance.

For sure, we don't want to allow remote hosts to hang or crash the
target, for instance through OOM conditions caused by a malicious host
asking the target to allocate too much memory, or by opening too many
connections that are being authenticated and then never replying in the
middle of the exchange.

Asking because I don't want to get too far ahead of the author in my
review ;)

In this regard, it would be great if you added to your test application
the ability to perform authentication with random parameters and to
randomly stop responding. Running such a test overnight would give us a
good degree of confidence that it will always work as expected.

Vlad

On 7/16/21 2:04 PM, Hannes Reinecke wrote:
> Hi all,
>
> recent updates to the NVMe spec have added definitions for in-band
> authentication, and seeing that it provides some real benefit especially
> for NVMe-TCP here's an attempt to implement it.
>
> Tricky bit here is that the specification orients itself on TLS 1.3,
> but supports only the FFDHE groups. Which of course the kernel doesn't
> support.
> I've been able to come up with a patch for this, but as this
> is my first attempt to fix anything in the crypto area I would invite
> people more familiar with these matters to have a look.
>
> Also note that this is just for in-band authentication. Secure concatenation
> (ie starting TLS with the negotiated parameters) is not implemented; one would
> need to update the kernel TLS implementation for this, which at this time is
> beyond scope.
>
> As usual, comments and reviews are welcome.
>
> Hannes Reinecke (11):
>   crypto: add crypto_has_shash()
>   crypto: add crypto_has_kpp()
>   crypto/ffdhe: Finite Field DH Ephemeral Parameters
>   lib/base64: RFC4648-compliant base64 encoding
>   nvme: add definitions for NVMe In-Band authentication
>   nvme: Implement In-Band authentication
>   nvme-auth: augmented challenge support
>   nvmet: Parse fabrics commands on all queues
>   nvmet: Implement basic In-Band Authentication
>   nvmet-auth: implement support for augmented challenge
>   nvme: add non-standard ECDH and curve25517 algorithms
>
>  crypto/Kconfig                         |    8 +
>  crypto/Makefile                        |    1 +
>  crypto/ffdhe_helper.c                  |  877 +++++++++++++++++
>  crypto/kpp.c                           |    6 +
>  crypto/shash.c                         |    6 +
>  drivers/nvme/host/Kconfig              |   11 +
>  drivers/nvme/host/Makefile             |    1 +
>  drivers/nvme/host/auth.c               | 1188 ++++++++++++++++++++++++
>  drivers/nvme/host/auth.h               |   23 +
>  drivers/nvme/host/core.c               |   77 +-
>  drivers/nvme/host/fabrics.c            |   65 +-
>  drivers/nvme/host/fabrics.h            |    8 +
>  drivers/nvme/host/nvme.h               |   15 +
>  drivers/nvme/host/trace.c              |   32 +
>  drivers/nvme/target/Kconfig            |   10 +
>  drivers/nvme/target/Makefile           |    1 +
>  drivers/nvme/target/admin-cmd.c        |    4 +
>  drivers/nvme/target/auth.c             |  608 ++++++++++++
>  drivers/nvme/target/configfs.c         |  102 +-
>  drivers/nvme/target/core.c             |   10 +
>  drivers/nvme/target/fabrics-cmd-auth.c |  472 ++++++++++
>  drivers/nvme/target/fabrics-cmd.c      |   30 +-
>  drivers/nvme/target/nvmet.h            |   71 ++
>  include/crypto/ffdhe.h                 |   24 +
>  include/crypto/hash.h                  |    2 +
>  include/crypto/kpp.h                   |    2 +
>  include/linux/base64.h                 |   16 +
>  include/linux/nvme.h                   |  187 +++-
>  lib/Makefile                           |    2 +-
>  lib/base64.c                           |  111 +++
>  30 files changed, 3961 insertions(+), 9 deletions(-)
>  create mode 100644 crypto/ffdhe_helper.c
>  create mode 100644 drivers/nvme/host/auth.c
>  create mode 100644 drivers/nvme/host/auth.h
>  create mode 100644 drivers/nvme/target/auth.c
>  create mode 100644 drivers/nvme/target/fabrics-cmd-auth.c
>  create mode 100644 include/crypto/ffdhe.h
>  create mode 100644 include/linux/base64.h
>  create mode 100644 lib/base64.c
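A minimal sketch of the per-host cache Vlad describes above; every
identifier here is illustrative, nothing below is taken from the
patchset, and the 10-second lifetime follows his prototype. The idea is
to cache the private exponent x and the public value g^x mod p so that
the many queues of one association skip both the RNG draw and the
modular exponentiation:

/*
 * Sketch only.  A caller would try nvme_dh_cache_get() before
 * generating a fresh exponent and publish the result afterwards with
 * nvme_dh_cache_set().  The mutex is set up with mutex_init() when the
 * controller is allocated.  Caching the shared secret g^xy mod p, as
 * section 8.13.5.7 also permits, would additionally have to be keyed
 * by the peer's public value.
 */
#include <linux/jiffies.h>
#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/string.h>

#define NVME_DH_REUSE_TIMEOUT	(10 * HZ)	/* 10 seconds */

struct nvme_dh_cache {
	struct mutex	lock;
	int		group_id;	/* FFDHE group the entry is valid for */
	u8		*priv;		/* x */
	u8		*pub;		/* g^x mod p */
	size_t		len;		/* prime size in bytes */
	unsigned long	expires;	/* jiffies value after which it is stale */
};

/* Copy out a fresh cached exponential; -ENOENT means "generate anew". */
static int nvme_dh_cache_get(struct nvme_dh_cache *c, int group_id,
			     u8 *priv, u8 *pub, size_t len)
{
	int ret = -ENOENT;

	mutex_lock(&c->lock);
	if (c->priv && c->group_id == group_id && c->len == len &&
	    time_before(jiffies, c->expires)) {
		memcpy(priv, c->priv, len);
		memcpy(pub, c->pub, len);
		ret = 0;
	}
	mutex_unlock(&c->lock);
	return ret;
}

/* Cache a freshly generated exponential; failure only means no reuse. */
static void nvme_dh_cache_set(struct nvme_dh_cache *c, int group_id,
			      const u8 *priv, const u8 *pub, size_t len)
{
	u8 *p = kmemdup(priv, len, GFP_KERNEL);
	u8 *g = kmemdup(pub, len, GFP_KERNEL);

	if (!p || !g) {
		kfree_sensitive(p);
		kfree(g);
		return;
	}
	mutex_lock(&c->lock);
	kfree_sensitive(c->priv);
	kfree(c->pub);
	c->priv = p;
	c->pub = g;
	c->len = len;
	c->group_id = group_id;
	c->expires = jiffies + NVME_DH_REUSE_TIMEOUT;
	mutex_unlock(&c->lock);
}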
On 7/20/21 10:26 PM, Vladislav Bolkhovitin wrote:
> Hi,
>
> Great to see those patches coming! After some review, they look to be
> very well done. Some comments/suggestions below.
>
> 1. I strongly recommend implementing DH exponential reuse (g^x mod p /
> g^y mod p, as well as g^xy mod p) as specified in section 8.13.5.7
> "DH-HMAC-CHAP Security Requirements". When I was working on TP 8006 I
> had a prototype that demonstrated that the DH math has quite significant
> latency, something like (as far as I remember) 30 ms for the 4K group
> and a few hundred ms for the 8K group. For a single connection that is
> not a big deal, but imagine an AMD EPYC with 128 cores. Since all
> connections are created sequentially, even at 30 ms per connection the
> time to complete a full remote device connection would become
> 128*30 ms => almost 4 seconds. With the 8K group it might be more than
> 10 seconds. Users are unlikely to be happy with this, especially when
> connecting multiple NVMe-oF devices is part of a server or VM boot
> sequence.
>
Oh, indeed, I can confirm that. FFDHE calculations are quite
time-consuming. But incidentally, ECDH and curve25519 are reasonably
fast, so maybe there _is_ value in having a TPAR asking for them to be
specified, too ...

> If DH exponential reuse is implemented, the DH math is skipped for all
> subsequent connections, so the authentication overhead becomes pretty
> much negligible.
>
> In my prototype I implemented DH exponential reuse as a simple
> per-host/target cache that keeps the DH exponentials (including
> g^xy mod p) for up to 10 seconds. Simple and sufficient.
>
Frankly, I hadn't looked at exponential reuse; this implementation
really is just a first step to get feedback from people on whether this
is a direction they want to go.

> Another, perhaps even more significant, reason why DH exponential reuse
> is important is that without it, x (or y on the host side) must be
> randomly generated each time a new connection is established. For the
> 8K group that means 1 KB of random bytes drawn from the random pool for
> each connection; with 128 connections that is 128 KB. Quite a lot of
> pressure on the random pool, which DH exponential reuse mostly avoids.
>
> Those are the 2 reasons why we added the DH exponential reuse sentence
> to the spec. The original TP 8006 had a small informative piece
> explaining the reasoning behind it, but for some reason it was removed
> from the final version.
>
Thanks for the hint. I'll be adding exponential reuse to the code.

> 2. What is the status of this code from the perspective of stability in
> the face of malicious host behavior? The implementation seems carefully
> done, but, for instance, at first look I was not able to find code to
> clean up if the host stops acting for too long in the middle of the
> exchange. Another observation is that in nvmet_execute_auth_send(),
> nvmet_check_transfer_len() does not check that the tl size is
> reasonable, i.e., not 1 GB, for instance.
>
That is true; exchange timeouts are missing. Will be adding them, of
course. And I hadn't thought of checking for tl size overflows; will be
adding that, too (a sketch of such a check follows this message).

> For sure, we don't want to allow remote hosts to hang or crash the
> target, for instance through OOM conditions caused by a malicious host
> asking the target to allocate too much memory, or by opening too many
> connections that are being authenticated and then never replying in the
> middle of the exchange.
>
This is something I'll need to look at, anyway.
What we do not want is a userspace application chipping in and sending
a 'negotiate' command without any subsequent steps, thereby invalidating
the existing authentication.

> Asking because I don't want to get too far ahead of the author in my
> review ;)
>
> In this regard, it would be great if you added to your test application
> the ability to perform authentication with random parameters and to
> randomly stop responding. Running such a test overnight would give us a
> good degree of confidence that it will always work as expected.
>
That indeed would be good; let me think about how something like that
can be implemented.

Cheers,
Hannes
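A sketch of the transfer-length bound discussed above. The constant and
the helper are hypothetical; only nvmet_execute_auth_send() and
nvmet_check_transfer_len() are names from the patchset. The negotiation
payloads are small, so anything beyond a few kilobytes can be rejected
before any buffer sized by the host-supplied length is allocated:

#include <linux/nvme.h>

/* Hypothetical bound; the real limit would follow from the largest
 * possible DH-HMAC-CHAP message (negotiate/challenge/reply/success).
 */
#define NVMET_AUTH_SEND_DATA_MAX	4096

/* Would be called at the top of nvmet_execute_auth_send(), before any
 * allocation based on the host-supplied transfer length.  Returns 0 if
 * tl is sane, or an NVMe status to fail the command with.
 */
static u16 nvmet_auth_check_tl(u32 tl)
{
	if (!tl || tl > NVMET_AUTH_SEND_DATA_MAX)
		return NVME_SC_INVALID_FIELD | NVME_SC_DNR;
	return 0;
}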
On 7/21/21 9:06 AM, Hannes Reinecke wrote:
> On 7/20/21 10:26 PM, Vladislav Bolkhovitin wrote:
>> Hi,
>>
>> Great to see those patches coming! After some review, they look to be
>> very well done. Some comments/suggestions below.
>>
>> 1. I strongly recommend implementing DH exponential reuse (g^x mod p /
>> g^y mod p, as well as g^xy mod p) as specified in section 8.13.5.7
>> "DH-HMAC-CHAP Security Requirements". When I was working on TP 8006 I
>> had a prototype that demonstrated that the DH math has quite significant
>> latency, something like (as far as I remember) 30 ms for the 4K group
>> and a few hundred ms for the 8K group. For a single connection that is
>> not a big deal, but imagine an AMD EPYC with 128 cores. Since all
>> connections are created sequentially, even at 30 ms per connection the
>> time to complete a full remote device connection would become
>> 128*30 ms => almost 4 seconds. With the 8K group it might be more than
>> 10 seconds. Users are unlikely to be happy with this, especially when
>> connecting multiple NVMe-oF devices is part of a server or VM boot
>> sequence.
>>
> Oh, indeed, I can confirm that. FFDHE calculations are quite
> time-consuming. But incidentally, ECDH and curve25519 are reasonably
> fast,

Yes, EC calculations are very fast; this is why EC cryptography is
gaining more and more popularity.

> so maybe there _is_ value in having a TPAR asking for them to be
> specified, too ...

There is too much politics and procedure involved here. Even in the
current scope it took more than 2 years to get the spec officially done
(I started proposing it in early 2018). Maybe in the future, if someone
comes to the committee with a corresponding proposal and value
justification. Although, frankly speaking, with DH exponential reuse I
personally don't see much value in ECDH for this application. Maybe
only for very small embedded devices with really limited computational
capabilities.

>> If DH exponential reuse is implemented, the DH math is skipped for all
>> subsequent connections, so the authentication overhead becomes pretty
>> much negligible.
>>
>> In my prototype I implemented DH exponential reuse as a simple
>> per-host/target cache that keeps the DH exponentials (including
>> g^xy mod p) for up to 10 seconds. Simple and sufficient.
>>
> Frankly, I hadn't looked at exponential reuse; this implementation
> really is just a first step to get feedback from people on whether this
> is a direction they want to go.

Sure, I understand.

>> Another, perhaps even more significant, reason why DH exponential reuse
>> is important is that without it, x (or y on the host side) must be
>> randomly generated each time a new connection is established. For the
>> 8K group that means 1 KB of random bytes drawn from the random pool for
>> each connection; with 128 connections that is 128 KB. Quite a lot of
>> pressure on the random pool, which DH exponential reuse mostly avoids.
>>
>> Those are the 2 reasons why we added the DH exponential reuse sentence
>> to the spec. The original TP 8006 had a small informative piece
>> explaining the reasoning behind it, but for some reason it was removed
>> from the final version.
>>
> Thanks for the hint. I'll be adding exponential reuse to the code.

Yes, please. Otherwise, people might start saying that Linux NVMe-oF
authentication is bad and slow.

Vlad
Another comment: it would be better to perform the CRC check of
dhchap_secret, and the generation of dhchap_key, right where the secret
is specified (e.g., in nvmet_auth_set_host_key() on the target side).

There is no need to do it every time for every connection and thereby
increase the authentication latency. As I wrote in the other comment,
128 or more connections might be established when connecting to a
single NVMe-oF device.

Vlad
On 7/23/21 10:02 PM, Vladislav Bolkhovitin wrote:
> Another comment: it would be better to perform the CRC check of
> dhchap_secret, and the generation of dhchap_key, right where the secret
> is specified (e.g., in nvmet_auth_set_host_key() on the target side).
>
> There is no need to do it every time for every connection and thereby
> increase the authentication latency. As I wrote in the other comment,
> 128 or more connections might be established when connecting to a
> single NVMe-oF device.
>
And this is something I did deliberately.

The primary issue here is that the user might want/need to change the
PSK for re-authentication. But for that he might need to check what the
original/current PSK is, so I think we need to keep the original PSK as
passed in from the user. And I found it a waste of space to store the
decoded secret in addition to the original one, seeing that it can be
derived from the original one.

But your argument about the many connections required for a single NVMe
association is certainly true, so it would make sense to keep the
decoded one around, too, just to speed up computation.

Will be updating the patchset.

Cheers,
Hannes
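A sketch of that "decode once, keep both" approach. The struct layout
and helper names are illustrative, not from the patchset; base64_decode()
is the helper added earlier in this series, and the check assumes the
spec's secret representation with a CRC-32 of the key stored
little-endian in the last four decoded bytes:

#include <asm/unaligned.h>
#include <linux/base64.h>
#include <linux/crc32.h>
#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/string.h>

struct nvmet_dhchap_secret {
	char	*secret;	/* verbatim "DHHC-1:..." string, for read-back */
	u8	*key;		/* decoded binary key, computed once */
	size_t	key_len;
};

/*
 * @b64/@b64len: the base64 payload of the secret string; parsing the
 * "DHHC-1:00:" prefix and the trailing ':' is omitted from this sketch.
 */
static int nvmet_dhchap_cache_key(struct nvmet_dhchap_secret *s,
				  const char *secret,
				  const char *b64, int b64len)
{
	char *copy = kstrdup(secret, GFP_KERNEL);
	u8 *buf = kzalloc(b64len, GFP_KERNEL);	/* decoded data is shorter */
	int len;

	if (!copy || !buf) {
		kfree(copy);
		kfree(buf);
		return -ENOMEM;
	}
	len = base64_decode(b64, b64len, buf);
	if (len < 4) {
		kfree(copy);
		kfree_sensitive(buf);
		return -EINVAL;
	}
	len -= 4;	/* last four bytes carry the CRC-32 of the key */
	if (get_unaligned_le32(buf + len) != ~crc32(~0, buf, len)) {
		kfree(copy);
		kfree_sensitive(buf);
		return -EKEYREJECTED;
	}
	kfree(s->secret);
	s->secret = copy;	/* user can still read the PSK back */
	kfree_sensitive(s->key);
	s->key = buf;		/* connections reuse this without decoding */
	s->key_len = len;
	return 0;
}

With something like this, re-authentication with a new PSK simply calls
the same function again, and the per-connection cost drops to reading
the cached key.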