Message ID | cover.1691505882.git.sweettea-kernel@dorminy.me (mailing list archive) |
---|---|
Headers | show |
Series | fscrypt: add extent encryption | expand |
On Tue, Aug 08, 2023 at 01:08:17PM -0400, Sweet Tea Dorminy wrote: > This changeset adds extent-based data encryption to fscrypt. > Some filesystems need to encrypt data based on extents, rather than on > inodes, due to features incompatible with inode-based encryption. For > instance, btrfs can have multiple inodes referencing a single block of > data, and moves logical data blocks to different physical locations on > disk in the background. > > As per discussion last year in [1] and later in [2], we would like to > allow the use of fscrypt with btrfs, with authenticated encryption. This > is the first step of that work, adding extent-based encryption to > fscrypt; authenticated encryption is the next step. Extent-based > encryption should be usable by other filesystems which wish to support > snapshotting or background data rearrangement also, but btrfs is the > first user. > > This changeset requires extent encryption to use inlinecrypt, as > discussed previously. > > This applies atop [3], which itself is based on kdave/misc-next. It > passes encryption fstests with suitable changes to btrfs-progs. > > Changelog: > v3: > - Added four additional changes: > - soft-deleting keys that extent infos might later need to use, so > the behavior of an open file after key removal matches inode-based > fscrypt. > - a set of changes to allow asynchronous info freeing for extents, > necessary due to locking constraints in btrfs. > > v2: > - https://lore.kernel.org/linux-fscrypt/cover.1688927487.git.sweettea-kernel@dorminy.me/T/#t > > > [1] https://docs.google.com/document/d/1janjxewlewtVPqctkWOjSa7OhCgB8Gdx7iDaCDQQNZA/edit?usp=sharing > [2] https://lore.kernel.org/linux-fscrypt/80496cfe-161d-fb0d-8230-93818b966b1b@dorminy.me/T/#t > [3] https://lore.kernel.org/linux-fscrypt/cover.1691505830.git.sweettea-kernel@dorminy.me/ > Other than the patches I had comments about this series looks good to me. You can add Reviewed-by: Josef Bacik <josef@toxicpanda.com> to the series once you've cleaned up my other comments. Eric I did my best to try and review these in the context of all the code, I've gone and looked at how it ties into the btrfs stuff and I've looked at the final state of the code, it looks good to me, but clearly I don't have the experience in this code that Sweet Tea or you have. Thanks, Josef
On Tue, Aug 08, 2023 at 01:08:17PM -0400, Sweet Tea Dorminy wrote: > This applies atop [3], which itself is based on kdave/misc-next. It > passes encryption fstests with suitable changes to btrfs-progs. Where can I find kdave/misc-next? The only mention of "kdave" in MAINTAINERS is the following under BTRFS FILE SYSTEM: T: git git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git But that repo doesn't contain a misc-next branch. - Eric
On Thu, Aug 10, 2023 at 1:24 AM Eric Biggers <ebiggers@kernel.org> wrote: > > On Tue, Aug 08, 2023 at 01:08:17PM -0400, Sweet Tea Dorminy wrote: > > This applies atop [3], which itself is based on kdave/misc-next. It > > passes encryption fstests with suitable changes to btrfs-progs. > > Where can I find kdave/misc-next? The only mention of "kdave" in MAINTAINERS is > the following under BTRFS FILE SYSTEM: > > T: git git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git > > But that repo doesn't contain a misc-next branch. > David Sterba's development trees are in this repository: https://github.com/kdave/btrfs-devel.git
On 8/10/23 00:55, Eric Biggers wrote: > On Tue, Aug 08, 2023 at 01:08:17PM -0400, Sweet Tea Dorminy wrote: >> This applies atop [3], which itself is based on kdave/misc-next. It >> passes encryption fstests with suitable changes to btrfs-progs. > > Where can I find kdave/misc-next? The only mention of "kdave" in MAINTAINERS is > the following under BTRFS FILE SYSTEM: > > T: git git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git > > But that repo doesn't contain a misc-next branch. > > - Eric I should've mentioned that more explicitly since the btrfs dev tree isn't listed in MAINTAINERS. https://github.com/kdave/btrfs-devel/tree/misc-next Apologies
On Thu, Aug 10, 2023 at 10:11:56AM -0400, Sweet Tea Dorminy wrote: > > > On 8/10/23 00:55, Eric Biggers wrote: > > On Tue, Aug 08, 2023 at 01:08:17PM -0400, Sweet Tea Dorminy wrote: > > > This applies atop [3], which itself is based on kdave/misc-next. It > > > passes encryption fstests with suitable changes to btrfs-progs. > > > > Where can I find kdave/misc-next? The only mention of "kdave" in MAINTAINERS is > > the following under BTRFS FILE SYSTEM: > > > > T: git git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git > > > > But that repo doesn't contain a misc-next branch. > > > > - Eric > > I should've mentioned that more explicitly since the btrfs dev tree isn't > listed in MAINTAINERS. https://github.com/kdave/btrfs-devel/tree/misc-next > Does this mean that MAINTAINERS needs to be updated? - Eric
Hi Sweet Tea, On Tue, Aug 08, 2023 at 01:08:17PM -0400, Sweet Tea Dorminy wrote: > This changeset adds extent-based data encryption to fscrypt. > Some filesystems need to encrypt data based on extents, rather than on > inodes, due to features incompatible with inode-based encryption. For > instance, btrfs can have multiple inodes referencing a single block of > data, and moves logical data blocks to different physical locations on > disk in the background. > > As per discussion last year in [1] and later in [2], we would like to > allow the use of fscrypt with btrfs, with authenticated encryption. This > is the first step of that work, adding extent-based encryption to > fscrypt; authenticated encryption is the next step. Extent-based > encryption should be usable by other filesystems which wish to support > snapshotting or background data rearrangement also, but btrfs is the > first user. > > This changeset requires extent encryption to use inlinecrypt, as > discussed previously. > > This applies atop [3], which itself is based on kdave/misc-next. It > passes encryption fstests with suitable changes to btrfs-progs. > > Changelog: > v3: > - Added four additional changes: > - soft-deleting keys that extent infos might later need to use, so > the behavior of an open file after key removal matches inode-based > fscrypt. > - a set of changes to allow asynchronous info freeing for extents, > necessary due to locking constraints in btrfs. > > v2: > - https://lore.kernel.org/linux-fscrypt/cover.1688927487.git.sweettea-kernel@dorminy.me/T/#t > > > [1] https://docs.google.com/document/d/1janjxewlewtVPqctkWOjSa7OhCgB8Gdx7iDaCDQQNZA/edit?usp=sharing > [2] https://lore.kernel.org/linux-fscrypt/80496cfe-161d-fb0d-8230-93818b966b1b@dorminy.me/T/#t > [3] https://lore.kernel.org/linux-fscrypt/cover.1691505830.git.sweettea-kernel@dorminy.me/ > > Sweet Tea Dorminy (16): > fscrypt: factor helper for locking master key > fscrypt: factor getting info for a specific block > fscrypt: adjust effective lblks based on extents > fscrypt: add a super_block pointer to fscrypt_info > fscrypt: setup leaf inodes for extent encryption > fscrypt: allow infos to be owned by extents > fscrypt: use an optional ino equivalent for per-extent infos > fscrypt: move function call warning of busy inodes > fscrypt: revamp key removal for extent encryption > fscrypt: add creation/usage/freeing of per-extent infos > fscrypt: allow load/save of extent contexts > fscrypt: save session key credentials for extent infos > fscrypt: allow multiple extents to reference one info > fscrypt: cache list of inlinecrypt devices > fscrypt: allow asynchronous info freeing > fscrypt: update documentation for per-extent keys > > Documentation/filesystems/fscrypt.rst | 43 +++- > fs/crypto/crypto.c | 6 +- > fs/crypto/fscrypt_private.h | 158 +++++++++++- > fs/crypto/inline_crypt.c | 49 ++-- > fs/crypto/keyring.c | 78 +++--- > fs/crypto/keysetup.c | 336 ++++++++++++++++++++++---- > fs/crypto/keysetup_v1.c | 10 +- > fs/crypto/policy.c | 20 ++ > include/linux/fscrypt.h | 67 +++++ > 9 files changed, 654 insertions(+), 113 deletions(-) I'm taking a look at this, but I don't think it's really reviewable in its current state. The main problem is how the new functionality is reusing struct fscrypt_info. Before an fscrypt_info was the information (key, policy, inode back-pointer, master key link, etc.) associated with an inode. But now it can be any of the following: - Information for an inode - Cached policy (?) for a "leaf inode" - Information for an extent Everything just seems kind of mixed together. It's not clear which fields of fscrypt_info apply to which scenario, and which scenario(s) is being handled when code works with fscrypt_info. There seem to be bugs caused by these scenarios being mixed up. Comments are also inconsistent; a huge number of them still refer only to "inode" or "file" when that is no longer correct. So it's quite hard to understand the code now. I think there really needs to be some work on designing and documenting the data structures for what you are trying to accomplish. That really ought to have been the first step before getting deep into coding the actual functionality. Have you considered creating a new struct like fscrypt_extent_info, instead of reusing fscrypt_info? I think that would make things much clearer. I suppose there is code that needs to operate on the shared fields of both, but that could be done by putting the shared fields in a sub-struct like 'struct fscrypt_common_info common;', and passing around a pointer to that where needed. I'd also like to reiterate the concern I raised last month (https://lore.kernel.org/linux-fscrypt/20230703045417.GA3057@sol.localdomain/): a lot of this complexity seems to have been contributed to by the "heavyweight extents" design choice. I was hoping to see a detailed design for the "change a directory's tree encryption policy" feature you are planning to use this for, in order to get some confidence that it will actually be implemented. Otherwise, I fear we'll end up building in a lot of complexity for something that never gets implemented. - Eric
On Sat, Aug 12, 2023 at 03:15:14PM -0700, Eric Biggers wrote: > Hi Sweet Tea, > > On Tue, Aug 08, 2023 at 01:08:17PM -0400, Sweet Tea Dorminy wrote: > > This changeset adds extent-based data encryption to fscrypt. > > Some filesystems need to encrypt data based on extents, rather than on > > inodes, due to features incompatible with inode-based encryption. For > > instance, btrfs can have multiple inodes referencing a single block of > > data, and moves logical data blocks to different physical locations on > > disk in the background. > > > > As per discussion last year in [1] and later in [2], we would like to > > allow the use of fscrypt with btrfs, with authenticated encryption. This > > is the first step of that work, adding extent-based encryption to > > fscrypt; authenticated encryption is the next step. Extent-based > > encryption should be usable by other filesystems which wish to support > > snapshotting or background data rearrangement also, but btrfs is the > > first user. > > > > This changeset requires extent encryption to use inlinecrypt, as > > discussed previously. > > > > This applies atop [3], which itself is based on kdave/misc-next. It > > passes encryption fstests with suitable changes to btrfs-progs. > > > > Changelog: > > v3: > > - Added four additional changes: > > - soft-deleting keys that extent infos might later need to use, so > > the behavior of an open file after key removal matches inode-based > > fscrypt. > > - a set of changes to allow asynchronous info freeing for extents, > > necessary due to locking constraints in btrfs. > > > > v2: > > - https://lore.kernel.org/linux-fscrypt/cover.1688927487.git.sweettea-kernel@dorminy.me/T/#t > > > > > > [1] https://docs.google.com/document/d/1janjxewlewtVPqctkWOjSa7OhCgB8Gdx7iDaCDQQNZA/edit?usp=sharing > > [2] https://lore.kernel.org/linux-fscrypt/80496cfe-161d-fb0d-8230-93818b966b1b@dorminy.me/T/#t > > [3] https://lore.kernel.org/linux-fscrypt/cover.1691505830.git.sweettea-kernel@dorminy.me/ > > > > Sweet Tea Dorminy (16): > > fscrypt: factor helper for locking master key > > fscrypt: factor getting info for a specific block > > fscrypt: adjust effective lblks based on extents > > fscrypt: add a super_block pointer to fscrypt_info > > fscrypt: setup leaf inodes for extent encryption > > fscrypt: allow infos to be owned by extents > > fscrypt: use an optional ino equivalent for per-extent infos > > fscrypt: move function call warning of busy inodes > > fscrypt: revamp key removal for extent encryption > > fscrypt: add creation/usage/freeing of per-extent infos > > fscrypt: allow load/save of extent contexts > > fscrypt: save session key credentials for extent infos > > fscrypt: allow multiple extents to reference one info > > fscrypt: cache list of inlinecrypt devices > > fscrypt: allow asynchronous info freeing > > fscrypt: update documentation for per-extent keys > > > > Documentation/filesystems/fscrypt.rst | 43 +++- > > fs/crypto/crypto.c | 6 +- > > fs/crypto/fscrypt_private.h | 158 +++++++++++- > > fs/crypto/inline_crypt.c | 49 ++-- > > fs/crypto/keyring.c | 78 +++--- > > fs/crypto/keysetup.c | 336 ++++++++++++++++++++++---- > > fs/crypto/keysetup_v1.c | 10 +- > > fs/crypto/policy.c | 20 ++ > > include/linux/fscrypt.h | 67 +++++ > > 9 files changed, 654 insertions(+), 113 deletions(-) > > I'm taking a look at this, but I don't think it's really reviewable in its > current state. The main problem is how the new functionality is reusing struct > fscrypt_info. Before an fscrypt_info was the information (key, policy, inode > back-pointer, master key link, etc.) associated with an inode. But now it can > be any of the following: > > - Information for an inode > - Cached policy (?) for a "leaf inode" > - Information for an extent > > Everything just seems kind of mixed together. It's not clear which fields of > fscrypt_info apply to which scenario, and which scenario(s) is being handled > when code works with fscrypt_info. There seem to be bugs caused by these > scenarios being mixed up. Comments are also inconsistent; a huge number of them > still refer only to "inode" or "file" when that is no longer correct. So it's > quite hard to understand the code now. > > I think there really needs to be some work on designing and documenting the data > structures for what you are trying to accomplish. That really ought to have > been the first step before getting deep into coding the actual functionality. > > Have you considered creating a new struct like fscrypt_extent_info, instead of > reusing fscrypt_info? I think that would make things much clearer. I suppose > there is code that needs to operate on the shared fields of both, but that could > be done by putting the shared fields in a sub-struct like 'struct > fscrypt_common_info common;', and passing around a pointer to that where needed. > > I'd also like to reiterate the concern I raised last month > (https://lore.kernel.org/linux-fscrypt/20230703045417.GA3057@sol.localdomain/): > a lot of this complexity seems to have been contributed to by the "heavyweight > extents" design choice. I was hoping to see a detailed design for the "change a > directory's tree encryption policy" feature you are planning to use this for, in > order to get some confidence that it will actually be implemented. Otherwise, I > fear we'll end up building in a lot of complexity for something that never gets > implemented. This is partly my fault, Sweet Tea has been working on this for a while, and it seems the lack of progress is partly to do with him wanting everything to be perfect before sending things out, so I've been encouraging him to increase his iteration frequency so we could get to a mergeable state sooner. To that end, I told him to leave off the "change the encryption key" functionality for now and send that along once we had agreement on this part. My thinking was that's the hairier/more controversial part, and I want to see progress made on the core fscrypt functionality so we don't get bogged down on other discussions. As for the data structures part I was lead to believe this was important for our usecase. But it appears you have another method in mind. In the interest of getting this thing unstuck I'd like to get it clear in my head what you're wanting to see, and explain what we're wanting to do, and then I can be more useful in guiding Sweet Tea through this. What we want (as I currently understand it): - We have an un-encrypted subvolumed that contains the base image. Think a chroot of centos9 or whatever. - Start a container, we snapshot this base image, set an encryption key for this container, all new writes into this snapshot will now be encrypted with this per-container key. - This container could potentially create a container within this encrypted container to run. Think a short lived job orchestrator. I run service X that is going to run N tasks, each task in it's own container. Under my encrypted container I'm going to be creating new subvolumes and setting a different key for those subvolumes. Then once those jobs are done I'm going to delete that subvolume and carry on. Weird btrfs things that make life difficult: - Reflink. Obviously we're not going to be reflinking from an encrypted subvolume into a non-encrypted subvolume, everything will have to match, but this is still between different inodes, which means we need some per-extent context. - Snapshots. Technically we would be fine here since the inodes metadata would be the same here, so maybe not so difficult. - Relocation. This is our "move data around underneath everybody" thing that we do all the time. I'm not actually sure if this poses a problem, I know Sweet Tea said it did, but my eyes sort of glaze over whenever we're talking about encryption so I don't remember the details. What I think you want: - A clearer deliniation in the fscrypt code of what we do with per-extent encryption vs everything else. This is easy for me to grok. - A lighter weight per-extent encryption scheme. This is the part that I'm having trouble with. I've been reviewing the code from a "is this broken" standpoint, because I don't have the expertise in this area to make a sound judgement. The btrfs parts are easy, and that code is fine. I trust Sweet Tea's judgement to do make decisions that fit our use case, but this seems to be the crux of the problem. This series is the barebones "get fscrypt encrypting stuff on btrfs" approach, with followups for the next, hairier bits. But we're over a year into this process and still stuck, so I'm sitting down to understand this code and the current situation in order to provide better guidance for Sweet Tea. Please correct me where I've gone wrong, I've been going back and reading the emails trying to catch up, so I'm sure I've missed things. Thanks, Josef
On Tue, Aug 15, 2023 at 11:12:06AM -0400, Josef Bacik wrote: > > This is partly my fault, Sweet Tea has been working on this for a while, and it > seems the lack of progress is partly to do with him wanting everything to be > perfect before sending things out, so I've been encouraging him to increase his > iteration frequency so we could get to a mergeable state sooner. > > To that end, I told him to leave off the "change the encryption key" > functionality for now and send that along once we had agreement on this part. > My thinking was that's the hairier/more controversial part, and I want to see > progress made on the core fscrypt functionality so we don't get bogged down on > other discussions. I've recommended, and continue to recommend, leaving out that feature from the patchset for now too. The question is how does the plan to implement this feature impact the initial patchset, i.e. the basic support for btrfs encryption. Should we choose a "heavyweight extents" design (a full fscrypt_context per extent: nonce, encryption modes, and key) now in preparation for that feature, or should we choose a "lightweight extents" design (just a nonce per extent, with modes and key coming from one of the extent's inodes which would all be enforced to have the same values of those). The lightweight extents design wouldn't be compatible with the "change the encryption key" feature, but I expect it would be simpler. Maybe there are issues that I haven't thought of, but that is what I expect. So before going with a design that is potentially more complex, and won't be able to be changed later as it will define the on-disk format, I'm hoping to see a little more confirmation of "yes, this is really needed". That could mean getting the design, or even better the implementation, ready for the "change the encryption key" feature. Or, maybe you need "heavyweight extents" anyway because you want/need btrfs extents to be completely self-describing and don't want to have to pull any information from an inode -- maybe for technical reasons, maybe for philosophical reasons. I don't know. It's up to you guys to provide the rationale for the design you've chosen. BTW, one of the reasons for my concern is that the original plan for ext4 encryption, which became fscrypt, was extremely ambitious and resulted in a complex design. But only the first phase actually ended up getting implemented, and as a result the design was simplified greatly just before ext4 encryption was upstreamed. I worry about something similar happening, but missing the "simplify the design" part and ending up with unnecessary complexity. > As for the data structures part I was lead to believe this was important for our > usecase. But it appears you have another method in mind. > > In the interest of getting this thing unstuck I'd like to get it clear in my > head what you're wanting to see, and explain what we're wanting to do, and then > I can be more useful in guiding Sweet Tea through this. > > What we want (as I currently understand it): > > - We have an un-encrypted subvolumed that contains the base image. Think a > chroot of centos9 or whatever. > - Start a container, we snapshot this base image, set an encryption key for this > container, all new writes into this snapshot will now be encrypted with this > per-container key. > - This container could potentially create a container within this encrypted > container to run. Think a short lived job orchestrator. I run service X that > is going to run N tasks, each task in it's own container. Under my encrypted > container I'm going to be creating new subvolumes and setting a different key > for those subvolumes. Then once those jobs are done I'm going to delete that > subvolume and carry on. > > Weird btrfs things that make life difficult: > > - Reflink. Obviously we're not going to be reflinking from an encrypted > subvolume into a non-encrypted subvolume, everything will have to match, but > this is still between different inodes, which means we need some per-extent > context. > - Snapshots. Technically we would be fine here since the inodes metadata would > be the same here, so maybe not so difficult. > - Relocation. This is our "move data around underneath everybody" thing that we > do all the time. I'm not actually sure if this poses a problem, I know Sweet > Tea said it did, but my eyes sort of glaze over whenever we're talking about > encryption so I don't remember the details. I think a big challenge which is being glossed over is how do you actually set a new encryption policy for an entire directory tree, which can contain thousands (or even millions!) of files. Do you actually go through and update something on every file, and if so who does it: userspace or kernel? Or will there be a layer of indirection that allows the operation to be done in a fixed amount of work? Maybe each inode has an encryption policy ID which points into a btree of encryption policies, and you update that btree to change what the ID refers to? But that doesn't work if you're changing the encryption policy of only a subset of files that use a given encryption policy, or if the files are unencrypted. What happens for directories? Can their policy change? I think it can't; all filenames will have to be encrypted with the original policy. Is that okay? What about regular files? I assume their policy would get replaced, not appended to, since with extent-based encryption the only purpose of a regular file's policy is to have it be inherited by new extents of that file? > > What I think you want: > > - A clearer deliniation in the fscrypt code of what we do with per-extent > encryption vs everything else. This is easy for me to grok. > - A lighter weight per-extent encryption scheme. This is the part that I'm > having trouble with. I've been reviewing the code from a "is this broken" > standpoint, because I don't have the expertise in this area to make a sound > judgement. The btrfs parts are easy, and that code is fine. I trust Sweet > Tea's judgement to do make decisions that fit our use case, but this seems to > be the crux of the problem. > > This series is the barebones "get fscrypt encrypting stuff on btrfs" approach, > with followups for the next, hairier bits. > > But we're over a year into this process and still stuck, so I'm sitting down to > understand this code and the current situation in order to provide better > guidance for Sweet Tea. Please correct me where I've gone wrong, I've been > going back and reading the emails trying to catch up, so I'm sure I've missed > things. Thanks, The number one thing I'm asking for is something that's maintainable and as easy to understand as possible. I don't think we've quite reached that yet. - Eric
On Tue, Aug 15, 2023 at 08:37:20PM -0700, Eric Biggers wrote: > On Tue, Aug 15, 2023 at 11:12:06AM -0400, Josef Bacik wrote: > > > > This is partly my fault, Sweet Tea has been working on this for a while, and it > > seems the lack of progress is partly to do with him wanting everything to be > > perfect before sending things out, so I've been encouraging him to increase his > > iteration frequency so we could get to a mergeable state sooner. > > > > To that end, I told him to leave off the "change the encryption key" > > functionality for now and send that along once we had agreement on this part. > > My thinking was that's the hairier/more controversial part, and I want to see > > progress made on the core fscrypt functionality so we don't get bogged down on > > other discussions. > > I've recommended, and continue to recommend, leaving out that feature from the > patchset for now too. The question is how does the plan to implement this > feature impact the initial patchset, i.e. the basic support for btrfs > encryption. Should we choose a "heavyweight extents" design (a full > fscrypt_context per extent: nonce, encryption modes, and key) now in preparation > for that feature, or should we choose a "lightweight extents" design (just a > nonce per extent, with modes and key coming from one of the extent's inodes > which would all be enforced to have the same values of those). The lightweight > extents design wouldn't be compatible with the "change the encryption key" > feature, but I expect it would be simpler. Maybe there are issues that I > haven't thought of, but that is what I expect. > > So before going with a design that is potentially more complex, and won't be > able to be changed later as it will define the on-disk format, I'm hoping to see > a little more confirmation of "yes, this is really needed". That could mean > getting the design, or even better the implementation, ready for the "change the > encryption key" feature. Or, maybe you need "heavyweight extents" anyway > because you want/need btrfs extents to be completely self-describing and don't > want to have to pull any information from an inode -- maybe for technical > reasons, maybe for philosophical reasons. I don't know. It's up to you guys to > provide the rationale for the design you've chosen. I've spent the last day getting a grasp on everything that we want and I'm in a better place to answer these questions. There seemed to be some confusion around having nested subvolumes with different keys, and then reflinking between them. So /container /container/subcontainer reflink /container/packagefile /container/subcontainer/packagefile in this case the inode context wouldn't help, the inodes are different, thus the heavy weight design is necessary. However this isn't actually a thing that is a topline need. I've told our customers that in this case both /container and /container/subcontainer would need to share encryption keys, and so this eliminates the need for the heavier weight solution. I like your design better, I think it fits most of our goals. The actual real requirement we have is the unencrypted to encrypted, which is /base/image btrfs sub snap /base/image /container turn on encryptiong /container new writes in /container get encrypted. We also want to be able to do reflink /path/to/unencrypted/package/store/package /container/package and have this work. This still works with your design. The only thing we would reject with our design is reflink /encrypted/key1/file /encrypted/key2/file because the key id's wouldn't match. This is an acceptable limitation for us to acheive a more simplistic design. > > BTW, one of the reasons for my concern is that the original plan for ext4 > encryption, which became fscrypt, was extremely ambitious and resulted in a > complex design. But only the first phase actually ended up getting implemented, > and as a result the design was simplified greatly just before ext4 encryption > was upstreamed. I worry about something similar happening, but missing the > "simplify the design" part and ending up with unnecessary complexity. > > > As for the data structures part I was lead to believe this was important for our > > usecase. But it appears you have another method in mind. > > > > In the interest of getting this thing unstuck I'd like to get it clear in my > > head what you're wanting to see, and explain what we're wanting to do, and then > > I can be more useful in guiding Sweet Tea through this. > > > > What we want (as I currently understand it): > > > > - We have an un-encrypted subvolumed that contains the base image. Think a > > chroot of centos9 or whatever. > > - Start a container, we snapshot this base image, set an encryption key for this > > container, all new writes into this snapshot will now be encrypted with this > > per-container key. > > - This container could potentially create a container within this encrypted > > container to run. Think a short lived job orchestrator. I run service X that > > is going to run N tasks, each task in it's own container. Under my encrypted > > container I'm going to be creating new subvolumes and setting a different key > > for those subvolumes. Then once those jobs are done I'm going to delete that > > subvolume and carry on. > > > > Weird btrfs things that make life difficult: > > > > - Reflink. Obviously we're not going to be reflinking from an encrypted > > subvolume into a non-encrypted subvolume, everything will have to match, but > > this is still between different inodes, which means we need some per-extent > > context. > > - Snapshots. Technically we would be fine here since the inodes metadata would > > be the same here, so maybe not so difficult. > > - Relocation. This is our "move data around underneath everybody" thing that we > > do all the time. I'm not actually sure if this poses a problem, I know Sweet > > Tea said it did, but my eyes sort of glaze over whenever we're talking about > > encryption so I don't remember the details. > > I think a big challenge which is being glossed over is how do you actually set a > new encryption policy for an entire directory tree, which can contain thousands > (or even millions!) of files. Do you actually go through and update something > on every file, and if so who does it: userspace or kernel? Or will there be a > layer of indirection that allows the operation to be done in a fixed amount of > work? Maybe each inode has an encryption policy ID which points into a btree of > encryption policies, and you update that btree to change what the ID refers to? > But that doesn't work if you're changing the encryption policy of only a subset > of files that use a given encryption policy, or if the files are unencrypted. Since the scope is purely unencrypted snapshot -> encrypted this is much simpler, only new things get the encryption policy set. Say we snapshot, set the encryption key, and then write to the middle of a file. That inode gets its encryption policy set, and that file extent is marked as encrypted. When we go to read back that whole file later we see the unencrypted extents and read those like normal, then get to the encrypted extent and decrypt that like normal. > > > > > What I think you want: > > > > - A clearer deliniation in the fscrypt code of what we do with per-extent > > encryption vs everything else. This is easy for me to grok. > > - A lighter weight per-extent encryption scheme. This is the part that I'm > > having trouble with. I've been reviewing the code from a "is this broken" > > standpoint, because I don't have the expertise in this area to make a sound > > judgement. The btrfs parts are easy, and that code is fine. I trust Sweet > > Tea's judgement to do make decisions that fit our use case, but this seems to > > be the crux of the problem. > > > > This series is the barebones "get fscrypt encrypting stuff on btrfs" approach, > > with followups for the next, hairier bits. > > > > But we're over a year into this process and still stuck, so I'm sitting down to > > understand this code and the current situation in order to provide better > > guidance for Sweet Tea. Please correct me where I've gone wrong, I've been > > going back and reading the emails trying to catch up, so I'm sure I've missed > > things. Thanks, > > The number one thing I'm asking for is something that's maintainable and as easy > to understand as possible. I don't think we've quite reached that yet. > That's fair. Like I said I've purposefully stayed out of this part because it's quite a bit to page in a whole other project into my head and figure out what would be the best course of action. I think that narrowing our usecase to only wanting to support unencrypted->encrypted for snapshots will make this simpler, and integrating your more simplified light weight design for extent encryption will result in something more to your liking. Thanks, Josef