
[RFC,0/2] dm-crypt support for per-sector NVMe metadata

Message ID f85e3824-5545-f541-c96d-4352585288a@redhat.com
Series dm-crypt support for per-sector NVMe metadata

Message

Mikulas Patocka May 15, 2024, 1:27 p.m. UTC
Hi

Some NVMe devices may be formatted with an extra 64 bytes of metadata per 
sector.

Here I'm submitting for review dm-crypt patches that make it possible to 
use per-sector metadata for authenticated encryption. With these patches, 
dm-crypt can run directly on top of an NVMe device, without using 
dm-integrity. These patches double write throughput, because there 
is no write to the dm-integrity journal.
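A simplified model of the throughput claim, as a sketch only: it assumes that in journaled mode dm-integrity writes every block twice (once to the journal, once to its final location), that the device is write-bandwidth bound, and it ignores tag and journal-metadata overhead. `device_bytes_written` is a hypothetical helper, not part of dm-crypt or dm-integrity.

```python
def device_bytes_written(user_bytes: int, journaled: bool) -> int:
    """Bytes the device must absorb for `user_bytes` of user writes.

    Simplified model (assumption): with a dm-integrity journal each block is
    written twice (journal + final location); with inline per-sector NVMe
    metadata, data and tag go out in a single write. Overheads are ignored.
    """
    copies = 2 if journaled else 1
    return user_bytes * copies


if __name__ == "__main__":
    one_gib = 1 << 30
    with_journal = device_bytes_written(one_gib, journaled=True)
    inline_meta = device_bytes_written(one_gib, journaled=False)
    # For a bandwidth-bound device, throughput scales inversely with bytes written.
    print(with_journal / inline_meta)  # 2.0
```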

An example of how to use it (so far, there is no support in the userspace 
cryptsetup tool):

# nvme format /dev/nvme1 -n 1 --lbaf=4
# dmsetup create cr --table '0 1048576 crypt 
capi:authenc(hmac(sha256),cbc(aes))-essiv:sha256 
01b11af6b55f76424fd53fb66667c301466b2eeaf0f39fd36d26e7fc4f52ade2de4228e996f5ae2fe817ce178e77079d28e4baaebffbcd3e16ae4f36ef217298 
0 /dev/nvme1n1 0 2 integrity:32:aead sector_size:4096'
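The table above is wrapped across several lines in this message, but a single crypt target's table is one line. A minimal sketch that assembles the same table, assuming the field order documented for the dm-crypt target (`<start> <size> crypt <cipher> <key> <iv_offset> <device> <offset> <#opt_params> <opt_params>`). `crypt_table` is a hypothetical helper, and the all-zero key is a dummy placeholder, not a real key.

```python
SECTOR = 512  # device-mapper table lengths are always in 512-byte units


def crypt_table(size_sectors: int, cipher: str, key_hex: str, device: str,
                tag_bytes: int, sector_size: int) -> str:
    """Build a one-line dm-crypt table using inline AEAD integrity metadata."""
    # dm-crypt rejects a target length that is not a multiple of sector_size.
    if size_sectors % (sector_size // SECTOR):
        raise ValueError("size_sectors must be aligned to sector_size")
    opts = [f"integrity:{tag_bytes}:aead", f"sector_size:{sector_size}"]
    fields = [
        "0", str(size_sectors), "crypt",  # start, length, target type
        cipher, key_hex,
        "0",                               # iv_offset
        device, "0",                       # backing device and offset on it
        str(len(opts)), *opts,             # optional parameter count and values
    ]
    return " ".join(fields)


if __name__ == "__main__":
    table = crypt_table(
        size_sectors=1048576,
        cipher="capi:authenc(hmac(sha256),cbc(aes))-essiv:sha256",
        key_hex="00" * 64,  # dummy 64-byte key, for illustration only
        device="/dev/nvme1n1",
        tag_bytes=32,
        sector_size=4096,
    )
    print(table)
    print(1048576 * SECTOR // 2**20, "MiB")  # 512 MiB
```

Note that the length 1048576 is counted in 512-byte sectors even though the target encrypts in 4096-byte sectors, so the example device is 512 MiB.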

Please review it - I'd like to know whether detecting the presence of 
per-sector metadata in crypt_integrity_ctr is correct or whether it should 
be done differently.

Mikulas

Comments

Eric Wheeler May 27, 2024, 10:12 p.m. UTC | #1
On Wed, 15 May 2024, Mikulas Patocka wrote:
> [...]

That's really an amazing feature, and I think your implementation is simple 
and elegant. It somehow reminds me of the 520/528-byte sectors that big 
commercial filers use, but in a way that Linux could use.

Questions:

- I see you are using 32 bytes of AEAD data (out of 64). Is AEAD always 
  32 bytes, or can it vary by crypto mechanism?

- What drive are you using? I am curious what your `nvme id-ns` output 
  looks like. Do you have 64 in the `ms` value?

        # nvme id-ns /dev/nvme0n1 | grep lbaf
        nlbaf   : 0
        nulbaf  : 0
        lbaf  0 : ms:0   lbads:9  rp:0 (in use)
                     ^         ^512b

--
Eric Wheeler



Milan Broz May 28, 2024, 7:25 a.m. UTC | #2
On 5/28/24 12:12 AM, Eric Wheeler wrote:
> On Wed, 15 May 2024, Mikulas Patocka wrote:
>> [...]
> 
> That's really an amazing feature, and I think your implementation is simple
> and elegant. It somehow reminds me of the 520/528-byte sectors that big
> commercial filers use, but in a way that Linux could use.
> 
> Questions:
> 
> - I see you are using 32 bytes of AEAD data (out of 64). Is AEAD always
>    32 bytes, or can it vary by crypto mechanism?

Hi Eric,

I'll try to answer this question, as this is where we have been heading with
dm-integrity+dm-crypt since the beginning: replacing it with HW and atomic
sector+metadata handling once suitable HW becomes available.

Currently, dm-integrity allocates exact space for any AEAD you want to construct
(cipher-xts/hctr2 + hmac) or for native AEAD (my favourite is AEGIS here).

So it depends on the configuration. The only difference from dm-integrity is that
the HW allocates a fixed 64 bytes, so the crypto can use up to that much space,
but it should be completely configurable in dm-crypt. IOW, the space actually
used can vary by crypto mechanism.

It is definitely enough for real AEAD now, compared to the legacy 512+8 DIF :)

Also, it opens the way to storing something more (sector context) in metadata,
but that's an idea for the future (usable in fs encryption as well, I guess).


> - What drive are you using? I am curious what your `nvme id-ns` output
>    looks like. Do you have 64 in the `ms` value?
> 
>          # nvme id-ns /dev/nvme0n1 | grep lbaf
>          nlbaf   : 0
>          nulbaf  : 0
>          lbaf  0 : ms:0   lbads:9  rp:0 (in use)
>                       ^         ^512b

This is still the major issue: I think only enterprisey NVMe drives
can do this.

Milan
Mikulas Patocka May 28, 2024, 11:16 a.m. UTC | #3
On Mon, 27 May 2024, Eric Wheeler wrote:

> On Wed, 15 May 2024, Mikulas Patocka wrote:
> > [...]
> 
> That's really an amazing feature, and I think your implementation is simple 
> and elegant. It somehow reminds me of the 520/528-byte sectors that big 
> commercial filers use, but in a way that Linux could use.
> 
> Questions:
> 
> - I see you are using 32 bytes of AEAD data (out of 64). Is AEAD always 
>   32 bytes, or can it vary by crypto mechanism?

It varies. E.g., if you use hmac(sha512), the full 64 bytes will be used.
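An untruncated HMAC tag is as long as the underlying hash digest, so the tag size for each hash can be read from Python's `hashlib`. A small sketch, assuming (as in the example table) an IV mode such as essiv that is derived from the sector number and therefore not stored in the metadata; a random-IV mode would also need room for the IV. `hmac_tag_bytes` and `NVME_METADATA_BYTES` are hypothetical names.

```python
import hashlib

NVME_METADATA_BYTES = 64  # per-sector metadata of the "ms:64 lbads:12" format


def hmac_tag_bytes(hash_name: str) -> int:
    """Length of an untruncated HMAC tag: the digest size of its hash."""
    return hashlib.new(hash_name).digest_size


for name in ("sha256", "sha384", "sha512"):
    tag = hmac_tag_bytes(name)
    print(f"hmac({name}): {tag} bytes, fits: {tag <= NVME_METADATA_BYTES}")
```

Under these assumptions, hmac(sha256) needs the 32 bytes used in the example table, while hmac(sha512) would presumably be configured with integrity:64:aead and fill the whole metadata area.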

> - What drive are you using?

Western Digital SN840

WUS4BA119DSP3X3

> I am curious what your `nvme id-ns` output 
>   looks like. Do you have 64 in the `ms` value?
> 
>         # nvme id-ns /dev/nvme0n1 | grep lbaf
>         nlbaf   : 0
>         nulbaf  : 0
>         lbaf  0 : ms:0   lbads:9  rp:0 (in use)
>                      ^         ^512b

Yes, I have this:
lbaf  0 : ms:0   lbads:9  rp:0
lbaf  1 : ms:8   lbads:9  rp:0
lbaf  2 : ms:0   lbads:12 rp:0
lbaf  3 : ms:8   lbads:12 rp:0
lbaf  4 : ms:64  lbads:12 rp:0 (in use)
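In these lines `lbads` is the base-2 logarithm of the data size (9 means 512 bytes, 12 means 4096 bytes) and `ms` is the metadata size in bytes, so the format in use is 4096 + 64. A sketch that picks such a format from `nvme id-ns` output, assuming lines shaped exactly like the ones above (other nvme-cli versions may print them differently). `parse_lbafs` and `pick_lbaf` are hypothetical helpers.

```python
import re
from typing import List, Optional, Tuple

# Matches lines such as "lbaf  4 : ms:64  lbads:12 rp:0 (in use)".
LBAF_RE = re.compile(r"lbaf\s+(\d+)\s*:\s*ms:(\d+)\s+lbads:(\d+)\s+rp:(\d+)")


def parse_lbafs(text: str) -> List[Tuple[int, int, int]]:
    """Return (index, metadata_bytes, data_bytes) for every LBA format line."""
    formats = []
    for m in LBAF_RE.finditer(text):
        index, ms, lbads, _rp = map(int, m.groups())
        formats.append((index, ms, 1 << lbads))  # lbads is log2(data size)
    return formats


def pick_lbaf(text: str, data_bytes: int = 4096, min_meta: int = 64) -> Optional[int]:
    """Index of the first format with the wanted data size and enough metadata."""
    for index, ms, size in parse_lbafs(text):
        if size == data_bytes and ms >= min_meta:
            return index
    return None


SAMPLE = """\
lbaf  0 : ms:0   lbads:9  rp:0
lbaf  1 : ms:8   lbads:9  rp:0
lbaf  2 : ms:0   lbads:12 rp:0
lbaf  3 : ms:8   lbads:12 rp:0
lbaf  4 : ms:64  lbads:12 rp:0 (in use)
"""

print(pick_lbaf(SAMPLE))  # 4
```

The returned index is the value passed to the `--lbaf` option of `nvme format`.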

Mikulas

Eric Wheeler May 28, 2024, 11:55 p.m. UTC | #4
On Tue, 28 May 2024, Milan Broz wrote:
> On 5/28/24 12:12 AM, Eric Wheeler wrote:
> > On Wed, 15 May 2024, Mikulas Patocka wrote:
> >> [...]
> > 
> > That's really an amazing feature, and I think your implementation is simple
> > and elegant. It somehow reminds me of the 520/528-byte sectors that big
> > commercial filers use, but in a way that Linux could use.
> > 
> > Questions:
> > 
> > - I see you are using 32 bytes of AEAD data (out of 64). Is AEAD always
> >    32 bytes, or can it vary by crypto mechanism?
> 
> Hi Eric,
> 
> I'll try to answer this question, as this is where we have been heading 
> with dm-integrity+dm-crypt since the beginning: replacing it with HW and 
> atomic sector+metadata handling once suitable HW becomes available.
> 
> Currently, dm-integrity allocates exact space for any AEAD you want to 
> construct (cipher-xts/hctr2 + hmac) or for native AEAD (my favourite is 
> AEGIS here).

Awesome.

> So it depends on the configuration. The only difference from dm-integrity 
> is that the HW allocates a fixed 64 bytes, so the crypto can use up to 
> that much space, but it should be completely configurable in dm-crypt. 
> IOW, the space actually used can vary by crypto mechanism.
> 
> It is definitely enough for real AEAD now, compared to the legacy 512+8 DIF :)
>
> Also, it opens the way to storing something more (sector context) in metadata,
> but that's an idea for the future (usable in fs encryption as well, I guess).

Good idea. You could modify the SCSI layer (similarly to what was done for 
NVMe metadata) so that the bio integrity payload data is packed at the end of 
each sector on drives that have DIF room. This would make it possible to use 
520/528-byte and 4160/4224-byte-sectored SAS drives in Linux for the first 
time ever.

Newer SAS drives such as the Seagate Exos X22 can support the following 
sector sizes (data + extra bytes):
	 512 +  0
	 512 +  8
	 512 + 16
	4096 +  0
	4096 + 64
	4096 +128
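A rough sketch of which of these formats could hold a dm-crypt tag, assuming the SCSI-layer change proposed above and treating the extra bytes as fully available for the tag: a 32-byte tag such as HMAC-SHA256 fits only the 4096+64 and 4096+128 formats, i.e. the 4160- and 4224-byte physical sectors. `SAS_FORMATS` and `usable_formats` are hypothetical names.

```python
from typing import List, Tuple

# (data bytes, extra bytes) per sector, as listed above for newer SAS drives
SAS_FORMATS: List[Tuple[int, int]] = [
    (512, 0), (512, 8), (512, 16), (4096, 0), (4096, 64), (4096, 128),
]


def usable_formats(formats: List[Tuple[int, int]], tag_bytes: int) -> List[Tuple[int, int]]:
    """Formats whose per-sector extra area can hold a tag of tag_bytes."""
    return [(data, extra) for data, extra in formats if extra >= tag_bytes]


for data, extra in usable_formats(SAS_FORMATS, 32):
    print(f"{data}+{extra} -> {data + extra}-byte physical sector")
```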

> > - What drive are you using? I am curious what your `nvme id-ns` output
> >    looks like. Do you have 64 in the `ms` value?
> > 
> >          # nvme id-ns /dev/nvme0n1 | grep lbaf
> >          nlbaf   : 0
> >          nulbaf  : 0
> >          lbaf  0 : ms:0   lbads:9  rp:0 (in use)
> >                       ^         ^512b
> 
> This is still the major issue: I think only enterprisey NVMe drives
> can do this.

For now that is fine; we only use enterprise drives anyway, and it will be 
great to use the integrity support the drives provide natively. Coupled with 
MDRAID, this solves hidden bitrot quite nicely at the block layer ... and 
it may trickle down into desktop drives eventually.

-Eric
