mbox series

[v3,00/16] fscrypt: add extent encryption

Message ID cover.1691505882.git.sweettea-kernel@dorminy.me (mailing list archive)
Headers show
Series fscrypt: add extent encryption | expand

Message

Sweet Tea Dorminy Aug. 8, 2023, 5:08 p.m. UTC
This changeset adds extent-based data encryption to fscrypt.
Some filesystems need to encrypt data based on extents, rather than on
inodes, due to features incompatible with inode-based encryption. For
instance, btrfs can have multiple inodes referencing a single block of
data, and moves logical data blocks to different physical locations on
disk in the background. 

As per discussion last year in [1] and later in [2], we would like to
allow the use of fscrypt with btrfs, with authenticated encryption. This
is the first step of that work, adding extent-based encryption to
fscrypt; authenticated encryption is the next step. Extent-based
encryption should be usable by other filesystems which wish to support
snapshotting or background data rearrangement also, but btrfs is the
first user. 

This changeset requires extent encryption to use inlinecrypt, as
discussed previously. 

This applies atop [3], which itself is based on kdave/misc-next. It
passes encryption fstests with suitable changes to btrfs-progs.

Changelog:
v3:
 - Added four additional changes:
   - soft-deleting keys that extent infos might later need to use, so
     the behavior of an open file after key removal matches inode-based
     fscrypt.
   - a set of changes to allow asynchronous info freeing for extents,
     necessary due to locking constraints in btrfs.

v2: 
 - https://lore.kernel.org/linux-fscrypt/cover.1688927487.git.sweettea-kernel@dorminy.me/T/#t


[1] https://docs.google.com/document/d/1janjxewlewtVPqctkWOjSa7OhCgB8Gdx7iDaCDQQNZA/edit?usp=sharing
[2] https://lore.kernel.org/linux-fscrypt/80496cfe-161d-fb0d-8230-93818b966b1b@dorminy.me/T/#t
[3] https://lore.kernel.org/linux-fscrypt/cover.1691505830.git.sweettea-kernel@dorminy.me/

Sweet Tea Dorminy (16):
  fscrypt: factor helper for locking master key
  fscrypt: factor getting info for a specific block
  fscrypt: adjust effective lblks based on extents
  fscrypt: add a super_block pointer to fscrypt_info
  fscrypt: setup leaf inodes for extent encryption
  fscrypt: allow infos to be owned by extents
  fscrypt: use an optional ino equivalent for per-extent infos
  fscrypt: move function call warning of busy inodes
  fscrypt: revamp key removal for extent encryption
  fscrypt: add creation/usage/freeing of per-extent infos
  fscrypt: allow load/save of extent contexts
  fscrypt: save session key credentials for extent infos
  fscrypt: allow multiple extents to reference one info
  fscrypt: cache list of inlinecrypt devices
  fscrypt: allow asynchronous info freeing
  fscrypt: update documentation for per-extent keys

 Documentation/filesystems/fscrypt.rst |  43 +++-
 fs/crypto/crypto.c                    |   6 +-
 fs/crypto/fscrypt_private.h           | 158 +++++++++++-
 fs/crypto/inline_crypt.c              |  49 ++--
 fs/crypto/keyring.c                   |  78 +++---
 fs/crypto/keysetup.c                  | 336 ++++++++++++++++++++++----
 fs/crypto/keysetup_v1.c               |  10 +-
 fs/crypto/policy.c                    |  20 ++
 include/linux/fscrypt.h               |  67 +++++
 9 files changed, 654 insertions(+), 113 deletions(-)


base-commit: 54d2161835d828a9663f548f61d1d9c3d3482122
prerequisite-patch-id: 2f1424d04bb5a76abf0ecf2c9cd8426d300078ae
prerequisite-patch-id: ab342a72cf967dadfb8bec1320c5906fd3c6800f
prerequisite-patch-id: ced2a9dab36539f55c14cd74a28950087c475ff2
prerequisite-patch-id: d4f1a64c994c2fa0d2d4cab83f9ddff52f0622e9
prerequisite-patch-id: 1af0fc98277159b31c26bc5751663efc0d322d75
prerequisite-patch-id: 3b21b62208587486cf9b31618f7c3bc875362f1a
prerequisite-patch-id: c43d693f5b7c498a876d9ffcfc49c11a8ca93d80
prerequisite-patch-id: f120bde1cf47fbef1d9f8fd09cdcccc1408c3ff4
prerequisite-patch-id: c6a1f087d4a67b928b9c6af04e00310bfa74ace1

Comments

Josef Bacik Aug. 9, 2023, 7:52 p.m. UTC | #1
On Tue, Aug 08, 2023 at 01:08:17PM -0400, Sweet Tea Dorminy wrote:
> This changeset adds extent-based data encryption to fscrypt.
> Some filesystems need to encrypt data based on extents, rather than on
> inodes, due to features incompatible with inode-based encryption. For
> instance, btrfs can have multiple inodes referencing a single block of
> data, and moves logical data blocks to different physical locations on
> disk in the background. 
> 
> As per discussion last year in [1] and later in [2], we would like to
> allow the use of fscrypt with btrfs, with authenticated encryption. This
> is the first step of that work, adding extent-based encryption to
> fscrypt; authenticated encryption is the next step. Extent-based
> encryption should be usable by other filesystems which wish to support
> snapshotting or background data rearrangement also, but btrfs is the
> first user. 
> 
> This changeset requires extent encryption to use inlinecrypt, as
> discussed previously. 
> 
> This applies atop [3], which itself is based on kdave/misc-next. It
> passes encryption fstests with suitable changes to btrfs-progs.
> 
> Changelog:
> v3:
>  - Added four additional changes:
>    - soft-deleting keys that extent infos might later need to use, so
>      the behavior of an open file after key removal matches inode-based
>      fscrypt.
>    - a set of changes to allow asynchronous info freeing for extents,
>      necessary due to locking constraints in btrfs.
> 
> v2: 
>  - https://lore.kernel.org/linux-fscrypt/cover.1688927487.git.sweettea-kernel@dorminy.me/T/#t
> 
> 
> [1] https://docs.google.com/document/d/1janjxewlewtVPqctkWOjSa7OhCgB8Gdx7iDaCDQQNZA/edit?usp=sharing
> [2] https://lore.kernel.org/linux-fscrypt/80496cfe-161d-fb0d-8230-93818b966b1b@dorminy.me/T/#t
> [3] https://lore.kernel.org/linux-fscrypt/cover.1691505830.git.sweettea-kernel@dorminy.me/
> 

Other than the patches I had comments about this series looks good to me.  You
can add

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

to the series once you've cleaned up my other comments.

Eric I did my best to try and review these in the context of all the code, I've
gone and looked at how it ties into the btrfs stuff and I've looked at the final
state of the code, it looks good to me, but clearly I don't have the experience
in this code that Sweet Tea or you have.  Thanks,

Josef
Eric Biggers Aug. 10, 2023, 4:55 a.m. UTC | #2
On Tue, Aug 08, 2023 at 01:08:17PM -0400, Sweet Tea Dorminy wrote:
> This applies atop [3], which itself is based on kdave/misc-next. It
> passes encryption fstests with suitable changes to btrfs-progs.

Where can I find kdave/misc-next?  The only mention of "kdave" in MAINTAINERS is
the following under BTRFS FILE SYSTEM:

T:      git git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git

But that repo doesn't contain a misc-next branch.

- Eric
Neal Gompa Aug. 10, 2023, 9:52 a.m. UTC | #3
On Thu, Aug 10, 2023 at 1:24 AM Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Tue, Aug 08, 2023 at 01:08:17PM -0400, Sweet Tea Dorminy wrote:
> > This applies atop [3], which itself is based on kdave/misc-next. It
> > passes encryption fstests with suitable changes to btrfs-progs.
>
> Where can I find kdave/misc-next?  The only mention of "kdave" in MAINTAINERS is
> the following under BTRFS FILE SYSTEM:
>
> T:      git git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git
>
> But that repo doesn't contain a misc-next branch.
>

David Sterba's development trees are in this repository:
https://github.com/kdave/btrfs-devel.git
Sweet Tea Dorminy Aug. 10, 2023, 2:11 p.m. UTC | #4
On 8/10/23 00:55, Eric Biggers wrote:
> On Tue, Aug 08, 2023 at 01:08:17PM -0400, Sweet Tea Dorminy wrote:
>> This applies atop [3], which itself is based on kdave/misc-next. It
>> passes encryption fstests with suitable changes to btrfs-progs.
> 
> Where can I find kdave/misc-next?  The only mention of "kdave" in MAINTAINERS is
> the following under BTRFS FILE SYSTEM:
> 
> T:      git git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git
> 
> But that repo doesn't contain a misc-next branch.
> 
> - Eric

I should've mentioned that more explicitly since the btrfs dev tree 
isn't listed in MAINTAINERS. 
https://github.com/kdave/btrfs-devel/tree/misc-next

Apologies
Eric Biggers Aug. 10, 2023, 5:17 p.m. UTC | #5
On Thu, Aug 10, 2023 at 10:11:56AM -0400, Sweet Tea Dorminy wrote:
> 
> 
> On 8/10/23 00:55, Eric Biggers wrote:
> > On Tue, Aug 08, 2023 at 01:08:17PM -0400, Sweet Tea Dorminy wrote:
> > > This applies atop [3], which itself is based on kdave/misc-next. It
> > > passes encryption fstests with suitable changes to btrfs-progs.
> > 
> > Where can I find kdave/misc-next?  The only mention of "kdave" in MAINTAINERS is
> > the following under BTRFS FILE SYSTEM:
> > 
> > T:      git git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git
> > 
> > But that repo doesn't contain a misc-next branch.
> > 
> > - Eric
> 
> I should've mentioned that more explicitly since the btrfs dev tree isn't
> listed in MAINTAINERS. https://github.com/kdave/btrfs-devel/tree/misc-next
> 

Does this mean that MAINTAINERS needs to be updated?

- Eric
Eric Biggers Aug. 12, 2023, 10:15 p.m. UTC | #6
Hi Sweet Tea,

On Tue, Aug 08, 2023 at 01:08:17PM -0400, Sweet Tea Dorminy wrote:
> This changeset adds extent-based data encryption to fscrypt.
> Some filesystems need to encrypt data based on extents, rather than on
> inodes, due to features incompatible with inode-based encryption. For
> instance, btrfs can have multiple inodes referencing a single block of
> data, and moves logical data blocks to different physical locations on
> disk in the background. 
> 
> As per discussion last year in [1] and later in [2], we would like to
> allow the use of fscrypt with btrfs, with authenticated encryption. This
> is the first step of that work, adding extent-based encryption to
> fscrypt; authenticated encryption is the next step. Extent-based
> encryption should be usable by other filesystems which wish to support
> snapshotting or background data rearrangement also, but btrfs is the
> first user. 
> 
> This changeset requires extent encryption to use inlinecrypt, as
> discussed previously. 
> 
> This applies atop [3], which itself is based on kdave/misc-next. It
> passes encryption fstests with suitable changes to btrfs-progs.
> 
> Changelog:
> v3:
>  - Added four additional changes:
>    - soft-deleting keys that extent infos might later need to use, so
>      the behavior of an open file after key removal matches inode-based
>      fscrypt.
>    - a set of changes to allow asynchronous info freeing for extents,
>      necessary due to locking constraints in btrfs.
> 
> v2: 
>  - https://lore.kernel.org/linux-fscrypt/cover.1688927487.git.sweettea-kernel@dorminy.me/T/#t
> 
> 
> [1] https://docs.google.com/document/d/1janjxewlewtVPqctkWOjSa7OhCgB8Gdx7iDaCDQQNZA/edit?usp=sharing
> [2] https://lore.kernel.org/linux-fscrypt/80496cfe-161d-fb0d-8230-93818b966b1b@dorminy.me/T/#t
> [3] https://lore.kernel.org/linux-fscrypt/cover.1691505830.git.sweettea-kernel@dorminy.me/
> 
> Sweet Tea Dorminy (16):
>   fscrypt: factor helper for locking master key
>   fscrypt: factor getting info for a specific block
>   fscrypt: adjust effective lblks based on extents
>   fscrypt: add a super_block pointer to fscrypt_info
>   fscrypt: setup leaf inodes for extent encryption
>   fscrypt: allow infos to be owned by extents
>   fscrypt: use an optional ino equivalent for per-extent infos
>   fscrypt: move function call warning of busy inodes
>   fscrypt: revamp key removal for extent encryption
>   fscrypt: add creation/usage/freeing of per-extent infos
>   fscrypt: allow load/save of extent contexts
>   fscrypt: save session key credentials for extent infos
>   fscrypt: allow multiple extents to reference one info
>   fscrypt: cache list of inlinecrypt devices
>   fscrypt: allow asynchronous info freeing
>   fscrypt: update documentation for per-extent keys
> 
>  Documentation/filesystems/fscrypt.rst |  43 +++-
>  fs/crypto/crypto.c                    |   6 +-
>  fs/crypto/fscrypt_private.h           | 158 +++++++++++-
>  fs/crypto/inline_crypt.c              |  49 ++--
>  fs/crypto/keyring.c                   |  78 +++---
>  fs/crypto/keysetup.c                  | 336 ++++++++++++++++++++++----
>  fs/crypto/keysetup_v1.c               |  10 +-
>  fs/crypto/policy.c                    |  20 ++
>  include/linux/fscrypt.h               |  67 +++++
>  9 files changed, 654 insertions(+), 113 deletions(-)

I'm taking a look at this, but I don't think it's really reviewable in its
current state.  The main problem is how the new functionality is reusing struct
fscrypt_info.  Before an fscrypt_info was the information (key, policy, inode
back-pointer, master key link, etc.) associated with an inode.  But now it can
be any of the following:

- Information for an inode
- Cached policy (?) for a "leaf inode"
- Information for an extent

Everything just seems kind of mixed together.  It's not clear which fields of
fscrypt_info apply to which scenario, and which scenario(s) is being handled
when code works with fscrypt_info.  There seem to be bugs caused by these
scenarios being mixed up.  Comments are also inconsistent; a huge number of them
still refer only to "inode" or "file" when that is no longer correct.  So it's
quite hard to understand the code now.

I think there really needs to be some work on designing and documenting the data
structures for what you are trying to accomplish.  That really ought to have
been the first step before getting deep into coding the actual functionality.

Have you considered creating a new struct like fscrypt_extent_info, instead of
reusing fscrypt_info?  I think that would make things much clearer.  I suppose
there is code that needs to operate on the shared fields of both, but that could
be done by putting the shared fields in a sub-struct like 'struct
fscrypt_common_info common;', and passing around a pointer to that where needed.

I'd also like to reiterate the concern I raised last month
(https://lore.kernel.org/linux-fscrypt/20230703045417.GA3057@sol.localdomain/):
a lot of this complexity seems to have been contributed to by the "heavyweight
extents" design choice.  I was hoping to see a detailed design for the "change a
directory's tree encryption policy" feature you are planning to use this for, in
order to get some confidence that it will actually be implemented.  Otherwise, I
fear we'll end up building in a lot of complexity for something that never gets
implemented.

- Eric
Josef Bacik Aug. 15, 2023, 3:12 p.m. UTC | #7
On Sat, Aug 12, 2023 at 03:15:14PM -0700, Eric Biggers wrote:
> Hi Sweet Tea,
> 
> On Tue, Aug 08, 2023 at 01:08:17PM -0400, Sweet Tea Dorminy wrote:
> > This changeset adds extent-based data encryption to fscrypt.
> > Some filesystems need to encrypt data based on extents, rather than on
> > inodes, due to features incompatible with inode-based encryption. For
> > instance, btrfs can have multiple inodes referencing a single block of
> > data, and moves logical data blocks to different physical locations on
> > disk in the background. 
> > 
> > As per discussion last year in [1] and later in [2], we would like to
> > allow the use of fscrypt with btrfs, with authenticated encryption. This
> > is the first step of that work, adding extent-based encryption to
> > fscrypt; authenticated encryption is the next step. Extent-based
> > encryption should be usable by other filesystems which wish to support
> > snapshotting or background data rearrangement also, but btrfs is the
> > first user. 
> > 
> > This changeset requires extent encryption to use inlinecrypt, as
> > discussed previously. 
> > 
> > This applies atop [3], which itself is based on kdave/misc-next. It
> > passes encryption fstests with suitable changes to btrfs-progs.
> > 
> > Changelog:
> > v3:
> >  - Added four additional changes:
> >    - soft-deleting keys that extent infos might later need to use, so
> >      the behavior of an open file after key removal matches inode-based
> >      fscrypt.
> >    - a set of changes to allow asynchronous info freeing for extents,
> >      necessary due to locking constraints in btrfs.
> > 
> > v2: 
> >  - https://lore.kernel.org/linux-fscrypt/cover.1688927487.git.sweettea-kernel@dorminy.me/T/#t
> > 
> > 
> > [1] https://docs.google.com/document/d/1janjxewlewtVPqctkWOjSa7OhCgB8Gdx7iDaCDQQNZA/edit?usp=sharing
> > [2] https://lore.kernel.org/linux-fscrypt/80496cfe-161d-fb0d-8230-93818b966b1b@dorminy.me/T/#t
> > [3] https://lore.kernel.org/linux-fscrypt/cover.1691505830.git.sweettea-kernel@dorminy.me/
> > 
> > Sweet Tea Dorminy (16):
> >   fscrypt: factor helper for locking master key
> >   fscrypt: factor getting info for a specific block
> >   fscrypt: adjust effective lblks based on extents
> >   fscrypt: add a super_block pointer to fscrypt_info
> >   fscrypt: setup leaf inodes for extent encryption
> >   fscrypt: allow infos to be owned by extents
> >   fscrypt: use an optional ino equivalent for per-extent infos
> >   fscrypt: move function call warning of busy inodes
> >   fscrypt: revamp key removal for extent encryption
> >   fscrypt: add creation/usage/freeing of per-extent infos
> >   fscrypt: allow load/save of extent contexts
> >   fscrypt: save session key credentials for extent infos
> >   fscrypt: allow multiple extents to reference one info
> >   fscrypt: cache list of inlinecrypt devices
> >   fscrypt: allow asynchronous info freeing
> >   fscrypt: update documentation for per-extent keys
> > 
> >  Documentation/filesystems/fscrypt.rst |  43 +++-
> >  fs/crypto/crypto.c                    |   6 +-
> >  fs/crypto/fscrypt_private.h           | 158 +++++++++++-
> >  fs/crypto/inline_crypt.c              |  49 ++--
> >  fs/crypto/keyring.c                   |  78 +++---
> >  fs/crypto/keysetup.c                  | 336 ++++++++++++++++++++++----
> >  fs/crypto/keysetup_v1.c               |  10 +-
> >  fs/crypto/policy.c                    |  20 ++
> >  include/linux/fscrypt.h               |  67 +++++
> >  9 files changed, 654 insertions(+), 113 deletions(-)
> 
> I'm taking a look at this, but I don't think it's really reviewable in its
> current state.  The main problem is how the new functionality is reusing struct
> fscrypt_info.  Before an fscrypt_info was the information (key, policy, inode
> back-pointer, master key link, etc.) associated with an inode.  But now it can
> be any of the following:
> 
> - Information for an inode
> - Cached policy (?) for a "leaf inode"
> - Information for an extent
> 
> Everything just seems kind of mixed together.  It's not clear which fields of
> fscrypt_info apply to which scenario, and which scenario(s) is being handled
> when code works with fscrypt_info.  There seem to be bugs caused by these
> scenarios being mixed up.  Comments are also inconsistent; a huge number of them
> still refer only to "inode" or "file" when that is no longer correct.  So it's
> quite hard to understand the code now.
> 
> I think there really needs to be some work on designing and documenting the data
> structures for what you are trying to accomplish.  That really ought to have
> been the first step before getting deep into coding the actual functionality.
> 
> Have you considered creating a new struct like fscrypt_extent_info, instead of
> reusing fscrypt_info?  I think that would make things much clearer.  I suppose
> there is code that needs to operate on the shared fields of both, but that could
> be done by putting the shared fields in a sub-struct like 'struct
> fscrypt_common_info common;', and passing around a pointer to that where needed.
> 
> I'd also like to reiterate the concern I raised last month
> (https://lore.kernel.org/linux-fscrypt/20230703045417.GA3057@sol.localdomain/):
> a lot of this complexity seems to have been contributed to by the "heavyweight
> extents" design choice.  I was hoping to see a detailed design for the "change a
> directory's tree encryption policy" feature you are planning to use this for, in
> order to get some confidence that it will actually be implemented.  Otherwise, I
> fear we'll end up building in a lot of complexity for something that never gets
> implemented.

This is partly my fault, Sweet Tea has been working on this for a while, and it
seems the lack of progress is partly to do with him wanting everything to be
perfect before sending things out, so I've been encouraging him to increase his
iteration frequency so we could get to a mergeable state sooner.

To that end, I told him to leave off the "change the encryption key"
functionality for now and send that along once we had agreement on this part.
My thinking was that's the hairier/more controversial part, and I want to see
progress made on the core fscrypt functionality so we don't get bogged down on
other discussions.

As for the data structures part I was lead to believe this was important for our
usecase.  But it appears you have another method in mind.

In the interest of getting this thing unstuck I'd like to get it clear in my
head what you're wanting to see, and explain what we're wanting to do, and then
I can be more useful in guiding Sweet Tea through this.

What we want (as I currently understand it):

- We have an un-encrypted subvolumed that contains the base image.  Think a
  chroot of centos9 or whatever.
- Start a container, we snapshot this base image, set an encryption key for this
  container, all new writes into this snapshot will now be encrypted with this
  per-container key.
- This container could potentially create a container within this encrypted
  container to run.  Think a short lived job orchestrator.  I run service X that
  is going to run N tasks, each task in it's own container.  Under my encrypted
  container I'm going to be creating new subvolumes and setting a different key
  for those subvolumes.  Then once those jobs are done I'm going to delete that
  subvolume and carry on.

Weird btrfs things that make life difficult:

- Reflink.  Obviously we're not going to be reflinking from an encrypted
  subvolume into a non-encrypted subvolume, everything will have to match, but
  this is still between different inodes, which means we need some per-extent
  context.
- Snapshots.  Technically we would be fine here since the inodes metadata would
  be the same here, so maybe not so difficult.
- Relocation.  This is our "move data around underneath everybody" thing that we
  do all the time.  I'm not actually sure if this poses a problem, I know Sweet
  Tea said it did, but my eyes sort of glaze over whenever we're talking about
  encryption so I don't remember the details.

What I think you want:

- A clearer deliniation in the fscrypt code of what we do with per-extent
  encryption vs everything else.  This is easy for me to grok.
- A lighter weight per-extent encryption scheme.  This is the part that I'm
  having trouble with.  I've been reviewing the code from a "is this broken"
  standpoint, because I don't have the expertise in this area to make a sound
  judgement.  The btrfs parts are easy, and that code is fine.  I trust Sweet
  Tea's judgement to do make decisions that fit our use case, but this seems to
  be the crux of the problem.

This series is the barebones "get fscrypt encrypting stuff on btrfs" approach,
with followups for the next, hairier bits.

But we're over a year into this process and still stuck, so I'm sitting down to
understand this code and the current situation in order to provide better
guidance for Sweet Tea.  Please correct me where I've gone wrong, I've been
going back and reading the emails trying to catch up, so I'm sure I've missed
things.  Thanks,

Josef
Eric Biggers Aug. 16, 2023, 3:37 a.m. UTC | #8
On Tue, Aug 15, 2023 at 11:12:06AM -0400, Josef Bacik wrote:
> 
> This is partly my fault, Sweet Tea has been working on this for a while, and it
> seems the lack of progress is partly to do with him wanting everything to be
> perfect before sending things out, so I've been encouraging him to increase his
> iteration frequency so we could get to a mergeable state sooner.
> 
> To that end, I told him to leave off the "change the encryption key"
> functionality for now and send that along once we had agreement on this part.
> My thinking was that's the hairier/more controversial part, and I want to see
> progress made on the core fscrypt functionality so we don't get bogged down on
> other discussions.

I've recommended, and continue to recommend, leaving out that feature from the
patchset for now too.  The question is how does the plan to implement this
feature impact the initial patchset, i.e. the basic support for btrfs
encryption.  Should we choose a "heavyweight extents" design (a full
fscrypt_context per extent: nonce, encryption modes, and key) now in preparation
for that feature, or should we choose a "lightweight extents" design (just a
nonce per extent, with modes and key coming from one of the extent's inodes
which would all be enforced to have the same values of those).  The lightweight
extents design wouldn't be compatible with the "change the encryption key"
feature, but I expect it would be simpler.  Maybe there are issues that I
haven't thought of, but that is what I expect.

So before going with a design that is potentially more complex, and won't be
able to be changed later as it will define the on-disk format, I'm hoping to see
a little more confirmation of "yes, this is really needed".  That could mean
getting the design, or even better the implementation, ready for the "change the
encryption key" feature.  Or, maybe you need "heavyweight extents" anyway
because you want/need btrfs extents to be completely self-describing and don't
want to have to pull any information from an inode -- maybe for technical
reasons, maybe for philosophical reasons.  I don't know.  It's up to you guys to
provide the rationale for the design you've chosen.

BTW, one of the reasons for my concern is that the original plan for ext4
encryption, which became fscrypt, was extremely ambitious and resulted in a
complex design.  But only the first phase actually ended up getting implemented,
and as a result the design was simplified greatly just before ext4 encryption
was upstreamed.  I worry about something similar happening, but missing the
"simplify the design" part and ending up with unnecessary complexity.

> As for the data structures part I was lead to believe this was important for our
> usecase.  But it appears you have another method in mind.
> 
> In the interest of getting this thing unstuck I'd like to get it clear in my
> head what you're wanting to see, and explain what we're wanting to do, and then
> I can be more useful in guiding Sweet Tea through this.
> 
> What we want (as I currently understand it):
> 
> - We have an un-encrypted subvolumed that contains the base image.  Think a
>   chroot of centos9 or whatever.
> - Start a container, we snapshot this base image, set an encryption key for this
>   container, all new writes into this snapshot will now be encrypted with this
>   per-container key.
> - This container could potentially create a container within this encrypted
>   container to run.  Think a short lived job orchestrator.  I run service X that
>   is going to run N tasks, each task in it's own container.  Under my encrypted
>   container I'm going to be creating new subvolumes and setting a different key
>   for those subvolumes.  Then once those jobs are done I'm going to delete that
>   subvolume and carry on.
> 
> Weird btrfs things that make life difficult:
> 
> - Reflink.  Obviously we're not going to be reflinking from an encrypted
>   subvolume into a non-encrypted subvolume, everything will have to match, but
>   this is still between different inodes, which means we need some per-extent
>   context.
> - Snapshots.  Technically we would be fine here since the inodes metadata would
>   be the same here, so maybe not so difficult.
> - Relocation.  This is our "move data around underneath everybody" thing that we
>   do all the time.  I'm not actually sure if this poses a problem, I know Sweet
>   Tea said it did, but my eyes sort of glaze over whenever we're talking about
>   encryption so I don't remember the details.

I think a big challenge which is being glossed over is how do you actually set a
new encryption policy for an entire directory tree, which can contain thousands
(or even millions!) of files.  Do you actually go through and update something
on every file, and if so who does it: userspace or kernel?  Or will there be a
layer of indirection that allows the operation to be done in a fixed amount of
work?  Maybe each inode has an encryption policy ID which points into a btree of
encryption policies, and you update that btree to change what the ID refers to?
But that doesn't work if you're changing the encryption policy of only a subset
of files that use a given encryption policy, or if the files are unencrypted.

What happens for directories?  Can their policy change?  I think it can't; all
filenames will have to be encrypted with the original policy.  Is that okay?

What about regular files?  I assume their policy would get replaced, not
appended to, since with extent-based encryption the only purpose of a regular
file's policy is to have it be inherited by new extents of that file?

> 
> What I think you want:
> 
> - A clearer deliniation in the fscrypt code of what we do with per-extent
>   encryption vs everything else.  This is easy for me to grok.
> - A lighter weight per-extent encryption scheme.  This is the part that I'm
>   having trouble with.  I've been reviewing the code from a "is this broken"
>   standpoint, because I don't have the expertise in this area to make a sound
>   judgement.  The btrfs parts are easy, and that code is fine.  I trust Sweet
>   Tea's judgement to do make decisions that fit our use case, but this seems to
>   be the crux of the problem.
> 
> This series is the barebones "get fscrypt encrypting stuff on btrfs" approach,
> with followups for the next, hairier bits.
> 
> But we're over a year into this process and still stuck, so I'm sitting down to
> understand this code and the current situation in order to provide better
> guidance for Sweet Tea.  Please correct me where I've gone wrong, I've been
> going back and reading the emails trying to catch up, so I'm sure I've missed
> things.  Thanks,

The number one thing I'm asking for is something that's maintainable and as easy
to understand as possible.  I don't think we've quite reached that yet.

- Eric
Josef Bacik Aug. 16, 2023, 2:42 p.m. UTC | #9
On Tue, Aug 15, 2023 at 08:37:20PM -0700, Eric Biggers wrote:
> On Tue, Aug 15, 2023 at 11:12:06AM -0400, Josef Bacik wrote:
> > 
> > This is partly my fault, Sweet Tea has been working on this for a while, and it
> > seems the lack of progress is partly to do with him wanting everything to be
> > perfect before sending things out, so I've been encouraging him to increase his
> > iteration frequency so we could get to a mergeable state sooner.
> > 
> > To that end, I told him to leave off the "change the encryption key"
> > functionality for now and send that along once we had agreement on this part.
> > My thinking was that's the hairier/more controversial part, and I want to see
> > progress made on the core fscrypt functionality so we don't get bogged down on
> > other discussions.
> 
> I've recommended, and continue to recommend, leaving out that feature from the
> patchset for now too.  The question is how does the plan to implement this
> feature impact the initial patchset, i.e. the basic support for btrfs
> encryption.  Should we choose a "heavyweight extents" design (a full
> fscrypt_context per extent: nonce, encryption modes, and key) now in preparation
> for that feature, or should we choose a "lightweight extents" design (just a
> nonce per extent, with modes and key coming from one of the extent's inodes
> which would all be enforced to have the same values of those).  The lightweight
> extents design wouldn't be compatible with the "change the encryption key"
> feature, but I expect it would be simpler.  Maybe there are issues that I
> haven't thought of, but that is what I expect.
> 
> So before going with a design that is potentially more complex, and won't be
> able to be changed later as it will define the on-disk format, I'm hoping to see
> a little more confirmation of "yes, this is really needed".  That could mean
> getting the design, or even better the implementation, ready for the "change the
> encryption key" feature.  Or, maybe you need "heavyweight extents" anyway
> because you want/need btrfs extents to be completely self-describing and don't
> want to have to pull any information from an inode -- maybe for technical
> reasons, maybe for philosophical reasons.  I don't know.  It's up to you guys to
> provide the rationale for the design you've chosen.

I've spent the last day getting a grasp on everything that we want and I'm in a
better place to answer these questions.

There seemed to be some confusion around having nested subvolumes with different
keys, and then reflinking between them.

So 

/container
/container/subcontainer

reflink /container/packagefile /container/subcontainer/packagefile

in this case the inode context wouldn't help, the inodes are different, thus the
heavy weight design is necessary.

However this isn't actually a thing that is a topline need.  I've told our
customers that in this case both /container and /container/subcontainer would
need to share encryption keys, and so this eliminates the need for the heavier
weight solution.  I like your design better, I think it fits most of our goals.

The actual real requirement we have is the unencrypted to encrypted, which is

/base/image
btrfs sub snap /base/image /container
turn on encryptiong /container

new writes in /container get encrypted.  We also want to be able to do

reflink /path/to/unencrypted/package/store/package /container/package

and have this work.  This still works with your design.  The only thing we would
reject with our design is

reflink /encrypted/key1/file /encrypted/key2/file

because the key id's wouldn't match.  This is an acceptable limitation for us to
acheive a more simplistic design.

> 
> BTW, one of the reasons for my concern is that the original plan for ext4
> encryption, which became fscrypt, was extremely ambitious and resulted in a
> complex design.  But only the first phase actually ended up getting implemented,
> and as a result the design was simplified greatly just before ext4 encryption
> was upstreamed.  I worry about something similar happening, but missing the
> "simplify the design" part and ending up with unnecessary complexity.
>
> > As for the data structures part I was lead to believe this was important for our
> > usecase.  But it appears you have another method in mind.
> > 
> > In the interest of getting this thing unstuck I'd like to get it clear in my
> > head what you're wanting to see, and explain what we're wanting to do, and then
> > I can be more useful in guiding Sweet Tea through this.
> > 
> > What we want (as I currently understand it):
> > 
> > - We have an un-encrypted subvolumed that contains the base image.  Think a
> >   chroot of centos9 or whatever.
> > - Start a container, we snapshot this base image, set an encryption key for this
> >   container, all new writes into this snapshot will now be encrypted with this
> >   per-container key.
> > - This container could potentially create a container within this encrypted
> >   container to run.  Think a short lived job orchestrator.  I run service X that
> >   is going to run N tasks, each task in it's own container.  Under my encrypted
> >   container I'm going to be creating new subvolumes and setting a different key
> >   for those subvolumes.  Then once those jobs are done I'm going to delete that
> >   subvolume and carry on.
> > 
> > Weird btrfs things that make life difficult:
> > 
> > - Reflink.  Obviously we're not going to be reflinking from an encrypted
> >   subvolume into a non-encrypted subvolume, everything will have to match, but
> >   this is still between different inodes, which means we need some per-extent
> >   context.
> > - Snapshots.  Technically we would be fine here since the inodes metadata would
> >   be the same here, so maybe not so difficult.
> > - Relocation.  This is our "move data around underneath everybody" thing that we
> >   do all the time.  I'm not actually sure if this poses a problem, I know Sweet
> >   Tea said it did, but my eyes sort of glaze over whenever we're talking about
> >   encryption so I don't remember the details.
> 
> I think a big challenge which is being glossed over is how do you actually set a
> new encryption policy for an entire directory tree, which can contain thousands
> (or even millions!) of files.  Do you actually go through and update something
> on every file, and if so who does it: userspace or kernel?  Or will there be a
> layer of indirection that allows the operation to be done in a fixed amount of
> work?  Maybe each inode has an encryption policy ID which points into a btree of
> encryption policies, and you update that btree to change what the ID refers to?
> But that doesn't work if you're changing the encryption policy of only a subset
> of files that use a given encryption policy, or if the files are unencrypted.

Since the scope is purely unencrypted snapshot -> encrypted this is much
simpler, only new things get the encryption policy set.

Say we snapshot, set the encryption key, and then write to the middle of a file.
That inode gets its encryption policy set, and that file extent is marked as
encrypted.  When we go to read back that whole file later we see the unencrypted
extents and read those like normal, then get to the encrypted extent and decrypt
that like normal.

> 
> > 
> > What I think you want:
> > 
> > - A clearer deliniation in the fscrypt code of what we do with per-extent
> >   encryption vs everything else.  This is easy for me to grok.
> > - A lighter weight per-extent encryption scheme.  This is the part that I'm
> >   having trouble with.  I've been reviewing the code from a "is this broken"
> >   standpoint, because I don't have the expertise in this area to make a sound
> >   judgement.  The btrfs parts are easy, and that code is fine.  I trust Sweet
> >   Tea's judgement to do make decisions that fit our use case, but this seems to
> >   be the crux of the problem.
> > 
> > This series is the barebones "get fscrypt encrypting stuff on btrfs" approach,
> > with followups for the next, hairier bits.
> > 
> > But we're over a year into this process and still stuck, so I'm sitting down to
> > understand this code and the current situation in order to provide better
> > guidance for Sweet Tea.  Please correct me where I've gone wrong, I've been
> > going back and reading the emails trying to catch up, so I'm sure I've missed
> > things.  Thanks,
> 
> The number one thing I'm asking for is something that's maintainable and as easy
> to understand as possible.  I don't think we've quite reached that yet.
> 

That's fair.  Like I said I've purposefully stayed out of this part because it's
quite a bit to page in a whole other project into my head and figure out what
would be the best course of action.  I think that narrowing our usecase to only
wanting to support unencrypted->encrypted for snapshots will make this simpler,
and integrating your more simplified light weight design for extent encryption
will result in something more to your liking.  Thanks,

Josef