[v3,0/9] Support for Casefolding and Encryption

Message ID 20200117214246.235591-1-drosen@google.com

Message

Daniel Rosenberg Jan. 17, 2020, 9:42 p.m. UTC
These patches are all on top of fscrypt's development branch.

Ext4 and F2FS currently both support casefolding and encryption, but not at
the same time. These patches aim to rectify that.

Since directory names are stored case preserved, we cannot just take the hash
of the ciphertext. Instead we use the siphash of the casefolded name. With this
we no longer have a direct path from an encrypted name to the hash without the
key. To deal with this, fscrypt now always includes the hash in the name it
presents when the key is not present. There is a pre-existing bug where you can
change parts of the hash and still match the name so long as the disruption to
the hash does not happen to affect lookup on that filesystem. I'm not sure how
to fix that without making ext4 lookups slower in the more common case.
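As a rough illustration of the hashing idea above, here is a minimal sketch in which two spellings of a name that differ only by case produce the same directory hash, because the keyed hash is taken over the casefolded name rather than over the ciphertext. Everything in it is an assumption made for illustration: keyed_hash() is a seeded FNV-1a stand-in and not SipHash, ascii_casefold() handles ASCII only, and casefolded_dirhash() is a hypothetical name that does not appear in the patches.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in keyed hash (FNV-1a seeded with the key). This is NOT SipHash;
 * it only plays the role of "a keyed hash" in this sketch.
 */
static uint64_t keyed_hash(uint64_t key, const unsigned char *data, size_t len)
{
	uint64_t h = 14695981039346656037ULL ^ key;

	for (size_t i = 0; i < len; i++) {
		h ^= data[i];
		h *= 1099511628211ULL;
	}
	return h;
}

/* ASCII-only casefold; real casefolding would be Unicode-aware. */
static size_t ascii_casefold(const char *name, unsigned char *out, size_t out_len)
{
	size_t i;

	for (i = 0; i < out_len && name[i] != '\0'; i++)
		out[i] = (unsigned char)tolower((unsigned char)name[i]);
	return i;
}

/* Hypothetical helper: directory hash = keyed hash of the casefolded name. */
uint64_t casefolded_dirhash(uint64_t key, const char *name)
{
	unsigned char folded[255];
	size_t len;

	assert(name != NULL);
	len = ascii_casefold(name, folded, sizeof(folded));
	return keyed_hash(key, folded, len);
}
```

Because the hash depends on the key, it cannot be recomputed from the encrypted name alone, which is why the name presented without the key has to carry the hash with it.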

I moved the identical dcache operations for ext4 and f2fs into the VFS, as any
filesystem that uses casefolding will need the same code. This will also allow
further optimizations to that path, although my current changes don't take
advantage of that yet.
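The sharing described above can be pictured with a small sketch: one table of case-insensitive hash and compare callbacks, referenced by two otherwise unrelated filesystems. The struct and function names here (name_ops, ci_hash, ci_compare, example_ext4, example_f2fs) are hypothetical and are not the kernel's dentry operations API; the folding is ASCII-only for brevity.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical ops table, loosely modelled on per-filesystem name callbacks. */
struct name_ops {
	unsigned long (*hash)(const char *name, size_t len);
	int (*compare)(const char *a, size_t alen, const char *b, size_t blen);
};

/* Hash of the case-folded bytes, so names differing only by case collide. */
static unsigned long ci_hash(const char *name, size_t len)
{
	unsigned long h = 5381;

	for (size_t i = 0; i < len; i++)
		h = h * 33 + (unsigned long)tolower((unsigned char)name[i]);
	return h;
}

/* Returns 0 when the names match ignoring ASCII case, nonzero otherwise. */
static int ci_compare(const char *a, size_t alen, const char *b, size_t blen)
{
	if (alen != blen)
		return 1;
	for (size_t i = 0; i < alen; i++) {
		if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
			return 1;
	}
	return 0;
}

/* One shared table instead of an identical copy in each filesystem. */
const struct name_ops generic_ci_name_ops = {
	.hash = ci_hash,
	.compare = ci_compare,
};

struct example_fs {
	const char *fs_name;
	const struct name_ops *ops;
};

const struct example_fs example_ext4 = { "ext4", &generic_ci_name_ops };
const struct example_fs example_f2fs = { "f2fs", &generic_ci_name_ops };
```

With a single implementation, any later optimization to the hash or compare path would apply to every filesystem that points at the shared table.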

For Ext4, this also means that we need to store the hash on disk. We only do so
for directories that are both encrypted and casefolded, to avoid on-disk format
changes for all other directories.
Previously encryption and casefolding could not live on the same filesystem,
and we're relaxing that requirement. F2fs is a bit more straightforward since
it already stores hashes on disk.
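To make the conditional on-disk layout concrete, the following sketch computes a directory entry's record length and adds room for a stored hash only when the directory is both encrypted and casefolded. The constants and the function name dirent_rec_len() are assumptions for illustration: an 8-byte entry header, an 8-byte stored hash, and 4-byte alignment. They are not taken from the patches themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DIRENT_HEADER_LEN 8u	/* assumed: inode, rec_len, name_len, file_type */
#define DIRENT_HASH_LEN   8u	/* assumed: two 32-bit hash words after the name */

/* The hash is only stored when both features apply to the directory. */
static bool hash_in_dirent(bool encrypted, bool casefolded)
{
	return encrypted && casefolded;
}

/* Hypothetical helper: bytes needed for one entry, rounded up to 4. */
size_t dirent_rec_len(size_t name_len, bool encrypted, bool casefolded)
{
	size_t len = DIRENT_HEADER_LEN + name_len;

	if (hash_in_dirent(encrypted, casefolded))
		len += DIRENT_HASH_LEN;
	return (len + 3) & ~(size_t)3;
}
```

Under these assumptions, entries in directories that use only one of the two features keep exactly the size they had before.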

I've updated the related tools with just enough to enable the feature. I still
need to adjust the fsck tools for ext4 and f2fs, although without access to the
keys, neither fsck will be able to verify the hashes of casefolded and encrypted
names.

v3 changes:
fscrypt patch only creates hash key if it will be needed.
Rebased on top of fscrypt branch, reconstified match functions in ext4/f2fs

v2 changes:
fscrypt moved to separate thread to rebase on fscrypt dev branch
addressed feedback, plus some minor fixes


Daniel Rosenberg (9):
  fscrypt: Add siphash and hash key for policy v2
  fscrypt: Don't allow v1 policies with casefolding
  fscrypt: Change format of no-key token
  fscrypt: Only create hash key when needed
  vfs: Fold casefolding into vfs
  f2fs: Handle casefolding with Encryption
  ext4: Use struct super_block's casefold data
  ext4: Handle casefolding with encryption
  ext4: Optimize match for casefolded encrypted dirs

 Documentation/filesystems/ext4/directory.rst |  27 ++
 fs/crypto/Kconfig                            |   1 +
 fs/crypto/fname.c                            | 232 ++++++++++---
 fs/crypto/fscrypt_private.h                  |   9 +
 fs/crypto/keysetup.c                         |  35 +-
 fs/crypto/policy.c                           |  53 +++
 fs/dcache.c                                  |  28 ++
 fs/ext4/dir.c                                |  75 +----
 fs/ext4/ext4.h                               |  85 +++--
 fs/ext4/hash.c                               |  26 +-
 fs/ext4/ialloc.c                             |   5 +-
 fs/ext4/inline.c                             |  41 +--
 fs/ext4/namei.c                              | 324 ++++++++++++-------
 fs/ext4/super.c                              |  21 +-
 fs/f2fs/dir.c                                | 112 +++----
 fs/f2fs/f2fs.h                               |  12 +-
 fs/f2fs/hash.c                               |  25 +-
 fs/f2fs/inline.c                             |   9 +-
 fs/f2fs/super.c                              |  17 +-
 fs/f2fs/sysfs.c                              |   8 +-
 fs/inode.c                                   |   3 +-
 fs/namei.c                                   |  41 ++-
 include/linux/fs.h                           |  10 +
 include/linux/fscrypt.h                      |  95 ++----
 include/linux/unicode.h                      |  14 +
 25 files changed, 835 insertions(+), 473 deletions(-)

Comments

Eric Biggers Jan. 20, 2020, 4:52 a.m. UTC | #1

Thanks for the new version of this patchset, Daniel!

I'd like to apply the first four patches (the fs/crypto/ part, to prepare for
the new dirhash method) for 5.6, to get ready for the actual
encrypted+casefolded support in filesystems later.

But we don't have much time left before the merge window, and after looking at
the patches more closely I'm still not very happy with them.  E.g., some
comments I made haven't
been addressed, it's missing updates to the documentation, and some of the code
comments and commit messages are still confusing.  For one, there's still some
ambiguity between the dirhash and the SHA-256 hash, and it's not really
explained why the patch introduces the SHA-256 stuff, which actually has nothing
to do with encrypted+casefold (other than it was a good opportunity to do it as
the nokey name format had to be changed for encrypted+casefold anyway).

I also found a bug where the return value of base64_decode() isn't being checked
properly.  We should also keep fscrypt_match_name() simpler by setting disk_name
for short names, like we were before.  There are also some places that count the
padding in struct fscrypt_nokey_name and some that don't, which is confusing.
We also no longer need to call fscrypt_get_policy() during setflags, as we call
fscrypt_require_key() now anyway.  And there's now some ambiguity about what's
meant by a "per-file key", since now there will be 2 types of per-file keys.
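The point about counting padding can be shown with a small sketch. For a struct whose last member does not end on the struct's alignment boundary, sizeof() includes trailing padding while "offset of the end of the last member" does not, so code that mixes the two disagrees about the length. The struct below is a hypothetical layout with illustrative field sizes; it is not claimed to match struct fscrypt_nokey_name in this series, and OFFSETOFEND() is a local macro defined only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; the field sizes are illustrative only. */
struct nokey_name_example {
	uint32_t dirhash[2];	/* 8 bytes, forces 4-byte struct alignment */
	uint8_t  bytes[149];	/* leading part of the ciphertext */
	uint8_t  digest[32];	/* digest of the remaining ciphertext */
};

/* Offset just past the given member, i.e. the length without padding. */
#define OFFSETOFEND(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Bytes that actually carry data: 8 + 149 + 32. */
size_t nokey_name_used_len(void)
{
	return OFFSETOFEND(struct nokey_name_example, digest);
}

/* Bytes occupied in memory, including any trailing padding. */
size_t nokey_name_padded_len(void)
{
	return sizeof(struct nokey_name_example);
}
```

Picking one of the two lengths and using it consistently for encoding, decoding and buffer sizing avoids the confusion described above.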

So I hope you don't mind, but to move things along I've had a go at cleaning up
the fscrypt patches, and I've sent out an updated version of them.  Can you
please take a look when you have a chance?
https://lkml.kernel.org/linux-fscrypt/20200120044401.325453-1-ebiggers@kernel.org/T/#u

Thanks!

- Eric
Eric Biggers Jan. 20, 2020, 10:10 p.m. UTC | #2
On Sun, Jan 19, 2020 at 08:52:16PM -0800, Eric Biggers wrote:
> So I hope you don't mind, but to move things along I've had a go at cleaning up
> the fscrypt patches, and I've sent out an updated version of them.  Can you
> please take a look when you have a chance?:
> https://lkml.kernel.org/linux-fscrypt/20200120044401.325453-1-ebiggers@kernel.org/T/#u
> 

The new fscrypt no-key name format also breaks UBIFS encryption.  So we'll need
a couple UBIFS fixes too.  I'll send out a new series that includes them.

- Eric