mbox series

[RFC,0/4] fscrypt: add support for per-extent encryption

Message ID cover.1694738282.git.josef@toxicpanda.com (mailing list archive)
Headers show
Series fscrypt: add support for per-extent encryption | expand

Message

Josef Bacik Sept. 15, 2023, 12:47 a.m. UTC
Hello,

This is meant as a replacement for the last set of patches Sweet Tea sent [1].
This is an attempt to find a different path forward.  Strip down everything to
the basics.  Essentially all we appear to need is a nonce, and then we can use
the inode context to derive per-extent keys.

I'm sending this as an RFC to see if this is a better direction to try and make
some headway on this project.  The btrfs side doesn't change too much, the code
just needs to be adjusted to use the new helpers for the extent contexts.  I
have this work mostly complete, but I'm afraid I won't have it ready for another
day or two and I want to get feedback on this ASAP before I burn too much time
on it.

Additionally there is a callback I've put in the inline block crypto stuff that
we need in order to handle the checksumming.  I made my best guess here as to
what would be the easiest and simplest way to acheive what we need, but I'm open
to suggestions here.

The other note is I've disabled all of the policy variations other than default
v2 policies if you enable extent encryption.  This is for simplicity sake.  We
could probably make most of it work, but reflink is basically impossible for v1
with direct key, and is problematic for the lblk related options.  It appears
this is fine, as those other modes are for specific use cases and the vast
majority of normal users are encouraged to use normal v2 policies anyway.

This stripped down version gives us most of what we want, we can reflink between
different inodes that have the same policy.  We lose the ability to mix
differently encrypted extents in the same inode, but this is an acceptable
limitation for now.

This has only been compile tested, and as I've said I haven't wired it
completely up into btrfs yet.  But this is based on a rough wire up and appears
to give us everything we need.  The btrfs portion of Sweet Teas patches are
basically untouched except where we use these helpers to deal with the extent
contexts.  Thanks,

Josef

[1] https://lore.kernel.org/linux-fscrypt/cover.1693630890.git.sweettea-kernel@dorminy.me/

Josef Bacik (4):
  fscrypt: rename fscrypt_info => fscrypt_inode_info
  fscrypt: add per-extent encryption support
  fscrypt: disable all but standard v2 policies for extent encryption
  blk-crypto: add a process bio callback

 block/blk-crypto-fallback.c |  18 ++++
 block/blk-crypto-profile.c  |   2 +
 block/blk-crypto.c          |   6 +-
 fs/crypto/crypto.c          |  23 +++--
 fs/crypto/fname.c           |   6 +-
 fs/crypto/fscrypt_private.h |  78 ++++++++++++----
 fs/crypto/hooks.c           |   2 +-
 fs/crypto/inline_crypt.c    |  50 +++++++++--
 fs/crypto/keyring.c         |   4 +-
 fs/crypto/keysetup.c        | 174 ++++++++++++++++++++++++++++++++----
 fs/crypto/keysetup_v1.c     |  14 +--
 fs/crypto/policy.c          |  45 ++++++++--
 include/linux/blk-crypto.h  |   9 +-
 include/linux/fs.h          |   4 +-
 include/linux/fscrypt.h     |  41 ++++++++-
 15 files changed, 400 insertions(+), 76 deletions(-)

Comments

Eric Biggers Sept. 15, 2023, 6:48 a.m. UTC | #1
On Thu, Sep 14, 2023 at 08:47:41PM -0400, Josef Bacik wrote:
> Hello,
> 
> This is meant as a replacement for the last set of patches Sweet Tea sent [1].
> This is an attempt to find a different path forward.  Strip down everything to
> the basics.  Essentially all we appear to need is a nonce, and then we can use
> the inode context to derive per-extent keys.
> 
> I'm sending this as an RFC to see if this is a better direction to try and make
> some headway on this project.  The btrfs side doesn't change too much, the code
> just needs to be adjusted to use the new helpers for the extent contexts.  I
> have this work mostly complete, but I'm afraid I won't have it ready for another
> day or two and I want to get feedback on this ASAP before I burn too much time
> on it.
> 
> Additionally there is a callback I've put in the inline block crypto stuff that
> we need in order to handle the checksumming.  I made my best guess here as to
> what would be the easiest and simplest way to acheive what we need, but I'm open
> to suggestions here.
> 
> The other note is I've disabled all of the policy variations other than default
> v2 policies if you enable extent encryption.  This is for simplicity sake.  We
> could probably make most of it work, but reflink is basically impossible for v1
> with direct key, and is problematic for the lblk related options.  It appears
> this is fine, as those other modes are for specific use cases and the vast
> majority of normal users are encouraged to use normal v2 policies anyway.
> 
> This stripped down version gives us most of what we want, we can reflink between
> different inodes that have the same policy.  We lose the ability to mix
> differently encrypted extents in the same inode, but this is an acceptable
> limitation for now.
> 
> This has only been compile tested, and as I've said I haven't wired it
> completely up into btrfs yet.  But this is based on a rough wire up and appears
> to give us everything we need.  The btrfs portion of Sweet Teas patches are
> basically untouched except where we use these helpers to deal with the extent
> contexts.  Thanks,
> 
> Josef
> 
> [1] https://lore.kernel.org/linux-fscrypt/cover.1693630890.git.sweettea-kernel@dorminy.me/
> 
> Josef Bacik (4):
>   fscrypt: rename fscrypt_info => fscrypt_inode_info
>   fscrypt: add per-extent encryption support
>   fscrypt: disable all but standard v2 policies for extent encryption
>   blk-crypto: add a process bio callback
> 
>  block/blk-crypto-fallback.c |  18 ++++
>  block/blk-crypto-profile.c  |   2 +
>  block/blk-crypto.c          |   6 +-
>  fs/crypto/crypto.c          |  23 +++--
>  fs/crypto/fname.c           |   6 +-
>  fs/crypto/fscrypt_private.h |  78 ++++++++++++----
>  fs/crypto/hooks.c           |   2 +-
>  fs/crypto/inline_crypt.c    |  50 +++++++++--
>  fs/crypto/keyring.c         |   4 +-
>  fs/crypto/keysetup.c        | 174 ++++++++++++++++++++++++++++++++----
>  fs/crypto/keysetup_v1.c     |  14 +--
>  fs/crypto/policy.c          |  45 ++++++++--
>  include/linux/blk-crypto.h  |   9 +-
>  include/linux/fs.h          |   4 +-
>  include/linux/fscrypt.h     |  41 ++++++++-
>  15 files changed, 400 insertions(+), 76 deletions(-)

Thanks Josef!  At a high level this looks good to me.  It's much simpler.  I
guess my main question is "what is missing" (besides the obvious things like
updating the documentation and polishing code comments).  I see you got rid of a
lot of the complexity in Sweet Tea's patchset, which is great as I think a lot
of it was unnecessary as I've mentioned.  But maybe something got overlooked?
I'm mainly wondering about the patches like "fscrypt: allow asynchronous info
freeing" that were a bit puzzling but have now gone away.

Not supporting v1 encryption policies is the right call, I think.  xfstests will
need to be updated to not assume that v1 is always supported, but that's
something I've been thinking about doing anyway.

The patch that adds support for checksumming the on-disk data is new.  I see why
it's needed.  I suppose that's just been overlooked until now?  It's definitely
correct that you need to checksum the ciphertext, not the plaintext.  Otherwise
the checksums leak information about the plaintext.

Did you consider the idea I mentioned at the end of
https://lore.kernel.org/r/20230907055233.GB37146@sol.localdomain where we store
a full fscrypt_context per extent, but for now validate that it matches the
inode's context (minus the nonce) and only support that case?

I guess the reasons to do that would be (1) futureproofing, (2) error checking
to catch any bugs where an extent might be accessed inconsistently, and (3)
making extents "standalone" so that they can be decrypted by anything that
iterates through the extents only (e.g. btrfs scrub as mentioned by Sweet Tea;
though, how will scrub even have access to the encryption keys?).  I don't have
a great sense of how strong these reasons actually are, so any thoughts would be
appreciated.  If just the nonce is really all that's needed, then that's fine
too.  The point is, the part I was concerned about wasn't really whether the key
identifier and encryption settings get stored per extent, but rather whether we
actually support the case where these differ from the inode's.

(And by "the inode" I really mean "the inode that owns the extent cache entry
the extent is being accessed through".  It's the case that when an extent is
shared by multiple inodes, it gets a cache entry for each one, right?)

- Eric
Josef Bacik Sept. 15, 2023, 1:28 p.m. UTC | #2
On Thu, Sep 14, 2023 at 11:48:16PM -0700, Eric Biggers wrote:
> On Thu, Sep 14, 2023 at 08:47:41PM -0400, Josef Bacik wrote:
> > Hello,
> > 
> > This is meant as a replacement for the last set of patches Sweet Tea sent [1].
> > This is an attempt to find a different path forward.  Strip down everything to
> > the basics.  Essentially all we appear to need is a nonce, and then we can use
> > the inode context to derive per-extent keys.
> > 
> > I'm sending this as an RFC to see if this is a better direction to try and make
> > some headway on this project.  The btrfs side doesn't change too much, the code
> > just needs to be adjusted to use the new helpers for the extent contexts.  I
> > have this work mostly complete, but I'm afraid I won't have it ready for another
> > day or two and I want to get feedback on this ASAP before I burn too much time
> > on it.
> > 
> > Additionally there is a callback I've put in the inline block crypto stuff that
> > we need in order to handle the checksumming.  I made my best guess here as to
> > what would be the easiest and simplest way to acheive what we need, but I'm open
> > to suggestions here.
> > 
> > The other note is I've disabled all of the policy variations other than default
> > v2 policies if you enable extent encryption.  This is for simplicity sake.  We
> > could probably make most of it work, but reflink is basically impossible for v1
> > with direct key, and is problematic for the lblk related options.  It appears
> > this is fine, as those other modes are for specific use cases and the vast
> > majority of normal users are encouraged to use normal v2 policies anyway.
> > 
> > This stripped down version gives us most of what we want, we can reflink between
> > different inodes that have the same policy.  We lose the ability to mix
> > differently encrypted extents in the same inode, but this is an acceptable
> > limitation for now.
> > 
> > This has only been compile tested, and as I've said I haven't wired it
> > completely up into btrfs yet.  But this is based on a rough wire up and appears
> > to give us everything we need.  The btrfs portion of Sweet Teas patches are
> > basically untouched except where we use these helpers to deal with the extent
> > contexts.  Thanks,
> > 
> > Josef
> > 
> > [1] https://lore.kernel.org/linux-fscrypt/cover.1693630890.git.sweettea-kernel@dorminy.me/
> > 
> > Josef Bacik (4):
> >   fscrypt: rename fscrypt_info => fscrypt_inode_info
> >   fscrypt: add per-extent encryption support
> >   fscrypt: disable all but standard v2 policies for extent encryption
> >   blk-crypto: add a process bio callback
> > 
> >  block/blk-crypto-fallback.c |  18 ++++
> >  block/blk-crypto-profile.c  |   2 +
> >  block/blk-crypto.c          |   6 +-
> >  fs/crypto/crypto.c          |  23 +++--
> >  fs/crypto/fname.c           |   6 +-
> >  fs/crypto/fscrypt_private.h |  78 ++++++++++++----
> >  fs/crypto/hooks.c           |   2 +-
> >  fs/crypto/inline_crypt.c    |  50 +++++++++--
> >  fs/crypto/keyring.c         |   4 +-
> >  fs/crypto/keysetup.c        | 174 ++++++++++++++++++++++++++++++++----
> >  fs/crypto/keysetup_v1.c     |  14 +--
> >  fs/crypto/policy.c          |  45 ++++++++--
> >  include/linux/blk-crypto.h  |   9 +-
> >  include/linux/fs.h          |   4 +-
> >  include/linux/fscrypt.h     |  41 ++++++++-
> >  15 files changed, 400 insertions(+), 76 deletions(-)
> 
> Thanks Josef!  At a high level this looks good to me.  It's much simpler.  I
> guess my main question is "what is missing" (besides the obvious things like
> updating the documentation and polishing code comments).  I see you got rid of a
> lot of the complexity in Sweet Tea's patchset, which is great as I think a lot
> of it was unnecessary as I've mentioned.  But maybe something got overlooked?
> I'm mainly wondering about the patches like "fscrypt: allow asynchronous info
> freeing" that were a bit puzzling but have now gone away.
> 

I'm going to fix this in a different way internally in btrfs.  There's only once
place where we can't drop the lock, so I plan to just collate free'd EM's and
free them in bulk after we drop the lock.  This *may* not fix the problem, I
have to wait for lockdep to tell me I'm stupid and I missed some other
dependency, but if that's the case I'll just async free our EM's, that way I can
synchronize dropping the objects at inode drop time to avoid unpleasant timing
issues.

> Not supporting v1 encryption policies is the right call, I think.  xfstests will
> need to be updated to not assume that v1 is always supported, but that's
> something I've been thinking about doing anyway.
> 
> The patch that adds support for checksumming the on-disk data is new.  I see why
> it's needed.  I suppose that's just been overlooked until now?  It's definitely
> correct that you need to checksum the ciphertext, not the plaintext.  Otherwise
> the checksums leak information about the plaintext.
> 

I think Sweet Tea was leaving this as a followup exercise, but I'd rather have
everything working out of the box, and since my patchset was a lot simpler I
figured I'd give you more opportunities to yell at me for something.

> Did you consider the idea I mentioned at the end of
> https://lore.kernel.org/r/20230907055233.GB37146@sol.localdomain where we store
> a full fscrypt_context per extent, but for now validate that it matches the
> inode's context (minus the nonce) and only support that case?
> 

I missed this bit of feedback, I like it a lot actually because if we do decide
later to tackle key changing it would be nice if existing file systems had
everything they needed.  I think I'll still keep the slimmed down extent context
and just throw the master key in there and the encrytion type, and then do as
you say and validate it matches the inode.

> I guess the reasons to do that would be (1) futureproofing, (2) error checking
> to catch any bugs where an extent might be accessed inconsistently, and (3)
> making extents "standalone" so that they can be decrypted by anything that
> iterates through the extents only (e.g. btrfs scrub as mentioned by Sweet Tea;
> though, how will scrub even have access to the encryption keys?).  I don't have
> a great sense of how strong these reasons actually are, so any thoughts would be
> appreciated.  If just the nonce is really all that's needed, then that's fine
> too.  The point is, the part I was concerned about wasn't really whether the key
> identifier and encryption settings get stored per extent, but rather whether we
> actually support the case where these differ from the inode's.

Scrub just needs to read the raw bytes and validate the checksums.  This is my
other motivation for doing the checksum thing now, I want to make sure things
like scrub work out of the box since that's one of the "tricky" parts when it
comes to having data it can't actually derypt.

The key identifier is mostly around being able to simply support different keys
in the same inode, primarly for re-keying.  There is some talk of people wanting
to have differently encrypted subvolumes and reflink between them, and having
this information per-extent would make that a lot easier.  For now I've told
those people to kick rocks, but it'll be nice to be setup to add this support
later on.

> 
> (And by "the inode" I really mean "the inode that owns the extent cache entry
> the extent is being accessed through".  It's the case that when an extent is
> shared by multiple inodes, it gets a cache entry for each one, right?)
> 

Yup the cache entry gets loaded from the extent itself, so we'll have duplicate
cache entries since they're tied to the inode.  It's the same for our extent
maps as well, since you could at any point overwrite part of it and need to
update the in-memory mapping.

Thanks for the quick feedback, I'm going to finish making the appropriate
changes to the btrfs patches, make sure everything works.  I'll clean up these
patches and make the changes you suggested, and I'll update the documentation.
It'll probably be early next week when I send again, I'll do the work for
xfstests so everything works in this v2 only world.

Josef