diff mbox series

[1/4] tune2fs: prevent changing UUID of fs with stable_inodes feature

Message ID 20200401203239.163679-2-ebiggers@kernel.org (mailing list archive)
State Accepted
Series e2fsprogs: fix and document the stable_inodes feature

Commit Message

Eric Biggers April 1, 2020, 8:32 p.m. UTC
From: Eric Biggers <ebiggers@google.com>

The stable_inodes feature is intended to indicate that it's safe to use
IV_INO_LBLK_64 encryption policies, where the encryption depends on the
inode numbers and thus filesystem shrinking is not allowed.  However
since inode numbers are not unique across filesystems, the encryption
also depends on the filesystem UUID, and I missed that there is a
supported way to change the filesystem UUID (tune2fs -U).

So, make 'tune2fs -U' report an error if stable_inodes is set.

We could add a separate stable_uuid feature flag, but it seems unlikely
it would be useful enough on its own to warrant another flag.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 misc/tune2fs.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Andreas Dilger April 2, 2020, 2:19 a.m. UTC | #1
On Apr 1, 2020, at 2:32 PM, Eric Biggers <ebiggers@kernel.org> wrote:
> 
> From: Eric Biggers <ebiggers@google.com>
> 
> The stable_inodes feature is intended to indicate that it's safe to use
> IV_INO_LBLK_64 encryption policies, where the encryption depends on the
> inode numbers and thus filesystem shrinking is not allowed.  However
> since inode numbers are not unique across filesystems, the encryption
> also depends on the filesystem UUID, and I missed that there is a
> supported way to change the filesystem UUID (tune2fs -U).
> 
> So, make 'tune2fs -U' report an error if stable_inodes is set.
> 
> We could add a separate stable_uuid feature flag, but it seems unlikely
> it would be useful enough on its own to warrant another flag.

What about having tune2fs walk the inode table checking for any inodes that
have this flag, and only refusing to clear the flag if it finds any?  That
takes some time on very large filesystems, but since inode table reading is
linear it is reasonable on most filesystems.

Cheers, Andreas

> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
> misc/tune2fs.c | 7 +++++++
> 1 file changed, 7 insertions(+)
> 
> diff --git a/misc/tune2fs.c b/misc/tune2fs.c
> index 314cc0d0..ca06c98b 100644
> --- a/misc/tune2fs.c
> +++ b/misc/tune2fs.c
> @@ -3236,6 +3236,13 @@ _("Warning: The journal is dirty. You may wish to replay the journal like:\n\n"
> 		char buf[SUPERBLOCK_SIZE] __attribute__ ((aligned(8)));
> 		__u8 old_uuid[UUID_SIZE];
> 
> +		if (ext2fs_has_feature_stable_inodes(fs->super)) {
> +			fputs(_("Cannot change the UUID of this filesystem "
> +				"because it has the stable_inodes feature "
> +				"flag.\n"), stderr);
> +			exit(1);
> +		}
> +
> 		if (!ext2fs_has_feature_csum_seed(fs->super) &&
> 		    (ext2fs_has_feature_metadata_csum(fs->super) ||
> 		     ext2fs_has_feature_ea_inode(fs->super))) {
> --
> 2.26.0.rc2.310.g2932bb562d-goog
> 


Cheers, Andreas
Eric Biggers April 7, 2020, 5:32 a.m. UTC | #2
On Wed, Apr 01, 2020 at 08:19:38PM -0600, Andreas Dilger wrote:
> On Apr 1, 2020, at 2:32 PM, Eric Biggers <ebiggers@kernel.org> wrote:
> > 
> > From: Eric Biggers <ebiggers@google.com>
> > 
> > The stable_inodes feature is intended to indicate that it's safe to use
> > IV_INO_LBLK_64 encryption policies, where the encryption depends on the
> > inode numbers and thus filesystem shrinking is not allowed.  However
> > since inode numbers are not unique across filesystems, the encryption
> > also depends on the filesystem UUID, and I missed that there is a
> > supported way to change the filesystem UUID (tune2fs -U).
> > 
> > So, make 'tune2fs -U' report an error if stable_inodes is set.
> > 
> > We could add a separate stable_uuid feature flag, but it seems unlikely
> > it would be useful enough on its own to warrant another flag.
> 
> What about having tune2fs walk the inode table checking for any inodes that
> have this flag, and only refusing to clear the flag if it finds any?  That
> takes some time on very large filesystems, but since inode table reading is
> linear it is reasonable on most filesystems.
> 

I assume you meant to make this comment on patch 2,
"tune2fs: prevent stable_inodes feature from being cleared"?

It's a good suggestion, but it also applies equally to the encrypt, verity,
extents, and ea_inode features.  Currently tune2fs can't clear any of these,
since any inode might be using them.

Note that it would actually be slightly harder to implement your suggestion for
stable_inodes than those four existing features, since clearing stable_inodes
would require reading xattrs rather than just the inode flags.

So if I have time, I can certainly look into allowing tune2fs to clear the
encrypt, verity, extents, stable_inodes, and ea_inode features, by doing an
inode table scan to verify that it's safe.  IMO it doesn't make sense to hold up
this patch on it, though.  This patch just makes stable_inodes work like other
ext4 features.

- Eric
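
As an illustration of the inode table scan discussed above, here is a minimal, self-contained C sketch.  It does not use libext2fs: struct demo_inode, the DEMO_*_FL constants, and both functions are hypothetical names made up for this example, and the flag values are illustrative only.  It covers only features that can be detected from per-inode flags; as noted above, stable_inodes would additionally require reading xattrs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative per-inode flag bits (assumed values for this sketch only). */
#define DEMO_ENCRYPT_FL  0x00000800u
#define DEMO_EXTENTS_FL  0x00080000u
#define DEMO_VERITY_FL   0x00100000u
#define DEMO_EA_INODE_FL 0x00200000u

/* Hypothetical, heavily simplified stand-in for an on-disk inode. */
struct demo_inode {
	uint16_t links_count;	/* 0 means the inode is unused */
	uint32_t flags;
};

/*
 * Walk the table once, front to back, and report whether any in-use
 * inode has one of the bits in flag_mask set.  If first_user is not
 * NULL, it receives the 1-based number of the first such inode.
 */
static bool demo_flag_in_use(const struct demo_inode *table, size_t count,
			     uint32_t flag_mask, size_t *first_user)
{
	assert(table != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		if (table[i].links_count == 0)
			continue;	/* unused inodes cannot pin a feature */
		if (table[i].flags & flag_mask) {
			if (first_user)
				*first_user = i + 1;
			return true;
		}
	}
	return false;
}

/* A feature may be cleared only if no in-use inode depends on it. */
static bool demo_can_clear_feature(const struct demo_inode *table,
				   size_t count, uint32_t flag_mask)
{
	return !demo_flag_in_use(table, count, flag_mask, NULL);
}
```

A caller would pass the flag bit that corresponds to the feature being cleared and refuse the operation when demo_can_clear_feature() returns false.  The scan is a single linear pass, which matches the cost argument made earlier in the thread.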
Andreas Dilger April 7, 2020, 4:18 p.m. UTC | #3
> On Apr 6, 2020, at 11:32 PM, Eric Biggers <ebiggers@kernel.org> wrote:
> 
> On Wed, Apr 01, 2020 at 08:19:38PM -0600, Andreas Dilger wrote:
>> On Apr 1, 2020, at 2:32 PM, Eric Biggers <ebiggers@kernel.org> wrote:
>>> 
>>> From: Eric Biggers <ebiggers@google.com>
>>> 
>>> The stable_inodes feature is intended to indicate that it's safe to use
>>> IV_INO_LBLK_64 encryption policies, where the encryption depends on the
>>> inode numbers and thus filesystem shrinking is not allowed.  However
>>> since inode numbers are not unique across filesystems, the encryption
>>> also depends on the filesystem UUID, and I missed that there is a
>>> supported way to change the filesystem UUID (tune2fs -U).
>>> 
>>> So, make 'tune2fs -U' report an error if stable_inodes is set.
>>> 
>>> We could add a separate stable_uuid feature flag, but it seems unlikely
>>> it would be useful enough on its own to warrant another flag.
>> 
>> What about having tune2fs walk the inode table checking for any inodes that
>> have this flag, and only refusing to clear the flag if it finds any?  That
>> takes some time on very large filesystems, but since inode table reading is
>> linear it is reasonable on most filesystems.
> 
> I assume you meant to make this comment on patch 2,
> "tune2fs: prevent stable_inodes feature from being cleared"?
> 
> It's a good suggestion, but it also applies equally to the encrypt, verity,
> extents, and ea_inode features.  Currently tune2fs can't clear any of these,
> since any inode might be using them.
> 
> Note that it would actually be slightly harder to implement your suggestion for
> stable_inodes than those four existing features, since clearing stable_inodes
> would require reading xattrs rather than just the inode flags.
> 
> So if I have time, I can certainly look into allowing tune2fs to clear the
> encrypt, verity, extents, stable_inodes, and ea_inode features, by doing an
> inode table scan to verify that it's safe.  IMO it doesn't make sense to hold up
> this patch on it, though.  This patch just makes stable_inodes work like other
> ext4 features.

Sure, I'm OK with this patch, since it avoids accidental breakage.

One question though - the metadata checksums use s_checksum_seed to generate
checksums, rather than directly using the UUID itself, so that it *is* possible
to change the filesystem UUID after metadata_csum is in use, without the need
to rewrite all of the checksums in the filesystem.  Could the same be done for
stable_inodes?

Cheers, Andreas
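
The stored-seed idea in the question above can be sketched in a few lines of self-contained C.  This is a simplified model, not the ext4 implementation: struct demo_super and the demo_* functions are hypothetical, and a toy FNV-1a hash stands in for the real checksum function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_UUID_SIZE 16

/* Hypothetical, simplified superblock; only the fields the sketch needs. */
struct demo_super {
	uint8_t  uuid[DEMO_UUID_SIZE];
	uint32_t checksum_seed;	/* valid only when has_csum_seed is set */
	bool     has_csum_seed;
};

/* Toy 32-bit FNV-1a hash, standing in for the real checksum function. */
static uint32_t demo_hash(const uint8_t *data, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= data[i];
		h *= 16777619u;
	}
	return h;
}

/* Seed used for checksums: the stored copy if present, else UUID-derived. */
static uint32_t demo_effective_seed(const struct demo_super *sb)
{
	assert(sb != NULL);
	if (sb->has_csum_seed)
		return sb->checksum_seed;
	return demo_hash(sb->uuid, DEMO_UUID_SIZE);
}

/* Freeze the current UUID-derived seed so the UUID can change later. */
static void demo_store_seed(struct demo_super *sb)
{
	assert(sb != NULL);
	if (!sb->has_csum_seed) {
		sb->checksum_seed = demo_hash(sb->uuid, DEMO_UUID_SIZE);
		sb->has_csum_seed = true;
	}
}

/* Overwrite the UUID; the stored seed, if any, is left untouched. */
static void demo_set_uuid(struct demo_super *sb,
			  const uint8_t new_uuid[DEMO_UUID_SIZE])
{
	assert(sb != NULL && new_uuid != NULL);
	memcpy(sb->uuid, new_uuid, DEMO_UUID_SIZE);
}
```

Once demo_store_seed() has run, demo_set_uuid() no longer affects demo_effective_seed(), so existing checksums stay valid.  Without the stored seed, changing the UUID changes the seed and every checksum would have to be rewritten.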
Eric Biggers April 8, 2020, 3:11 a.m. UTC | #4
On Tue, Apr 07, 2020 at 10:18:55AM -0600, Andreas Dilger wrote:
> 
> > On Apr 6, 2020, at 11:32 PM, Eric Biggers <ebiggers@kernel.org> wrote:
> > 
> > On Wed, Apr 01, 2020 at 08:19:38PM -0600, Andreas Dilger wrote:
> >> On Apr 1, 2020, at 2:32 PM, Eric Biggers <ebiggers@kernel.org> wrote:
> >>> 
> >>> From: Eric Biggers <ebiggers@google.com>
> >>> 
> >>> The stable_inodes feature is intended to indicate that it's safe to use
> >>> IV_INO_LBLK_64 encryption policies, where the encryption depends on the
> >>> inode numbers and thus filesystem shrinking is not allowed.  However
> >>> since inode numbers are not unique across filesystems, the encryption
> >>> also depends on the filesystem UUID, and I missed that there is a
> >>> supported way to change the filesystem UUID (tune2fs -U).
> >>> 
> >>> So, make 'tune2fs -U' report an error if stable_inodes is set.
> >>> 
> >>> We could add a separate stable_uuid feature flag, but it seems unlikely
> >>> it would be useful enough on its own to warrant another flag.
> >> 
> >> What about having tune2fs walk the inode table checking for any inodes that
> >> have this flag, and only refusing to clear the flag if it finds any?  That
> >> takes some time on very large filesystems, but since inode table reading is
> >> linear it is reasonable on most filesystems.
> > 
> > I assume you meant to make this comment on patch 2,
> > "tune2fs: prevent stable_inodes feature from being cleared"?
> > 
> > It's a good suggestion, but it also applies equally to the encrypt, verity,
> > extents, and ea_inode features.  Currently tune2fs can't clear any of these,
> > since any inode might be using them.
> > 
> > Note that it would actually be slightly harder to implement your suggestion for
> > stable_inodes than those four existing features, since clearing stable_inodes
> > would require reading xattrs rather than just the inode flags.
> > 
> > So if I have time, I can certainly look into allowing tune2fs to clear the
> > encrypt, verity, extents, stable_inodes, and ea_inode features, by doing an
> > inode table scan to verify that it's safe.  IMO it doesn't make sense to hold up
> > this patch on it, though.  This patch just makes stable_inodes work like other
> > ext4 features.
> 
> Sure, I'm OK with this patch, since it avoids accidental breakage.
> 
> One question though - for the data checksums it uses s_checksum_seed to generate
> checksums, rather than directly using the UUID itself, so that it *is* possible
> to change the filesystem UUID after metadata_csum is in use, without the need
> to rewrite all of the checksums in the filesystem.  Could the same be done for
> stable_inode?
> 

We could have used s_encrypt_pw_salt, but from a cryptographic perspective I
feel a bit safer using the UUID.  ext4 metadata checksums are non-cryptographic
and for integrity-only, so it's not disastrous if multiple filesystems share the
same s_checksum_seed.  So EXT4_FEATURE_INCOMPAT_CSUM_SEED makes sense as a
usability improvement for people doing things with filesystem cloning.

The new inode-number based encryption is a bit different since it may (depending
on how userspace chooses keys) depend on the per-filesystem ID for cryptographic
purposes.  So it can be much more important that these IDs are really unique.

On this basis, the UUID seems like a better choice since people doing things
with filesystem cloning are more likely to remember to set up the UUIDs as
unique, vs. some "second UUID" that's more hidden and would be forgotten about.

Using s_encrypt_pw_salt would also have been a bit more complex, as we'd have
had to add fscrypt_operations to retrieve it rather than just using s_uuid --
remembering to generate it if unset (mke2fs doesn't set it).  We'd also have
wanted to rename it to something else like s_encrypt_uuid to avoid confusion as
it would no longer be just a password salt.

Anyway, we couldn't really change this now even if we wanted to, since
IV_INO_LBLK_64 encryption policies were already released in v5.5.

- Eric
Andreas Dilger April 10, 2020, 11:53 a.m. UTC | #5
On Apr 7, 2020, at 9:11 PM, Eric Biggers <ebiggers@kernel.org> wrote:
> 
> On Tue, Apr 07, 2020 at 10:18:55AM -0600, Andreas Dilger wrote:
>> 
>> One question though - for the data checksums it uses s_checksum_seed
>> to generate checksums, rather than directly using the UUID itself,
>> so that it *is* possible to change the filesystem UUID after
>> metadata_csum is in use, without the need to rewrite all of the
>> checksums in the filesystem.  Could the same be done for stable_inode?
> 
> We could have used s_encrypt_pw_salt, but from a cryptographic perspective I
> feel a bit safer using the UUID.  ext4 metadata checksums are non-cryptographic
> and for integrity-only, so it's not disastrous if multiple filesystems share the
> same s_checksum_seed.  So EXT4_FEATURE_INCOMPAT_CSUM_SEED makes sense as a
> usability improvement for people doing things with filesystem cloning.
> 
> The new inode-number based encryption is a bit different since it may (depending
> on how userspace chooses keys) depend on the per-filesystem ID for cryptographic
> purposes.  So it can be much more important that these IDs are really unique.
> 
> On this basis, the UUID seems like a better choice since people doing things
> with filesystem cloning are more likely to remember to set up the UUIDs as
> unique, vs. some "second UUID" that's more hidden and would be forgotten about.

Actually, I think the opposite is true here.  To avoid usability problems,
users *have* to change the UUID of a cloned/snapshot filesystem to avoid
problems with mount-by-UUID (e.g. either filesystem may be mounted randomly
on each boot, depending on the device enumeration order).  However, if they
try to change the UUID, that would immediately break all of the encrypted
files in the filesystem, so that means with the stable_inode feature either:
- a snapshot/clone of a filesystem may subtly break your system, or
- you can't keep a snapshot/clone of such a filesystem on the same node

> Using s_encrypt_pw_salt would also have been a bit more complex, as we'd have
> had to add fscrypt_operations to retrieve it rather than just using s_uuid --
> remembering to generate it if unset (mke2fs doesn't set it).  We'd also have
> wanted to rename it to something else like s_encrypt_uuid to avoid confusion as
> it would no longer be just a password salt.
> 
> Anyway, we couldn't really change this now even if we wanted to, since
> IV_INO_LBLK_64 encryption policies were already released in v5.5.

I'm not sure I buy these arguments...  We changed handling of metadata_csum
after the fact, by checking at mount if s_checksum_seed is initialized,
otherwise hashing s_uuid and storing if it is zero.  Storing s_checksum_seed
proactively in the kernel and e2fsck allows users to change s_uuid if they
have a new enough kernel without noticing that the checksums were originally
based on s_uuid rather than the hash of it in s_checksum_seed.

I'm not sure of the details of whether s_encrypt_pw_salt is used in the
IV_INO_LBLK_64 case or not (since it uses inode/block number as the salt?),
but I see that the code is already initializing s_encrypt_pw_salt in the
kernel if unset, so that is not hard to do.  It could just make a copy from
s_uuid rather than generating a new UUID for s_encrypt_pw_salt, or for new
filesystems it can generate a unique s_encrypt_pw_salt and only use that?

Storing a feature flag to indicate whether s_uuid or s_encrypt_pw_salt is
used for the IV_INO_LBLK_64 case seems pretty straightforward?  Maybe any
filesystems that are using IV_INO_LBLK_64 with s_uuid can't change the UUID,
but a few bits and lines of code could allow any new filesystem to do so?
If you consider that 5.5 has been out for a few months, there aren't going
to be a lot of users of that approach, vs. the next 10 years or more.

In the end, you are the guy who has to deal with issues here, so I leave it
to you.  I just think it is a problem waiting to happen, and preventing the
users from shooting themselves in the foot with tune2fs doesn't mean that
they won't have significant problems later that could easily be solved now.

Cheers, Andreas
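
The proposal above (a feature flag that selects a separate encryption ID, initialized as a copy of s_uuid) could look roughly like the following self-contained C sketch.  Nothing here exists in ext4 or e2fsprogs: struct demo_super, its encrypt_id and has_encrypt_id fields, and the demo_* functions are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_UUID_SIZE 16

/* Hypothetical, simplified superblock for this sketch. */
struct demo_super {
	uint8_t uuid[DEMO_UUID_SIZE];
	uint8_t encrypt_id[DEMO_UUID_SIZE];	/* stands in for s_encrypt_pw_salt */
	bool    stable_inodes;			/* stable_inodes feature flag */
	bool    has_encrypt_id;			/* hypothetical new feature flag */
};

/* ID that inode-number based encryption would depend on. */
static const uint8_t *demo_crypto_fs_id(const struct demo_super *sb)
{
	assert(sb != NULL);
	return sb->has_encrypt_id ? sb->encrypt_id : sb->uuid;
}

/*
 * Switch an existing filesystem to the separate ID.  Copying the current
 * UUID keeps already-encrypted files readable.
 */
static void demo_enable_encrypt_id(struct demo_super *sb)
{
	assert(sb != NULL);
	if (!sb->has_encrypt_id) {
		memcpy(sb->encrypt_id, sb->uuid, DEMO_UUID_SIZE);
		sb->has_encrypt_id = true;
	}
}

/*
 * Change the UUID.  Refused (returns false) when encryption still depends
 * on the UUID itself, i.e. stable_inodes without the separate ID.
 */
static bool demo_change_uuid(struct demo_super *sb,
			     const uint8_t new_uuid[DEMO_UUID_SIZE])
{
	assert(sb != NULL && new_uuid != NULL);
	if (sb->stable_inodes && !sb->has_encrypt_id)
		return false;
	memcpy(sb->uuid, new_uuid, DEMO_UUID_SIZE);
	return true;
}
```

In this model, filesystems without the new flag behave as in the patch (UUID changes are refused), while filesystems with it can change the UUID freely because demo_crypto_fs_id() keeps returning the original value.  The sketch does not address the uniqueness concern: a clone would still carry the same encrypt_id unless it were regenerated.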
Theodore Ts'o April 10, 2020, 3:06 p.m. UTC | #6
On Fri, Apr 10, 2020 at 05:53:54AM -0600, Andreas Dilger wrote:
> 
> Actually, I think the opposite is true here.  To avoid usability problems,
> users *have* to change the UUID of a cloned/snapshot filesystem to avoid
> problems with mount-by-UUID (e.g. either filesystem may be mounted randomly
> on each boot, depending on the device enumeration order).  However, if they
> try to change the UUID, that would immediately break all of the encrypted
> files in the filesystem, so that means with the stable_inode feature either:
> - a snapshot/clone of a filesystem may subtly break your system, or
> - you can't keep a snapshot/clone of such a filesystem on the same node

I don't think there is any reason why we would use IV_INO_LBLK_64 mode
on anything other than tablet/mobile devices using the latest UFS or
eMMC standards which support in-line crypto engines (ICE).  I'm not
aware of any cloud VMs, private or public, which support ICE.  And
even if they did, hopefully they would use something more sane than
the UFS/eMMC spec, which only supports 64 bits of IV per I/O request,
and only supports a small number of keys that can be loaded into the
hardware.  (This is what you get when you are optimizing Bill of
Materials costs down to a tenth of a cent; a million devices here,
retail store profit margins there, and before you know it you're
talking real money...)

Furthermore, on a modern x86_64, you can do AES encryption at less
than one CPU clock cycle per byte, and in cloud VMs, battery life is
not a concern, so there really isn't any reason to use or implement
ICE, except maybe as a testing vehicle for fscrypt (e.g., someone
wanting to implement UFS 2.1 in qemu to make it easier to test the
Linux kernel's ICE support).

- Ted
Eric Biggers April 10, 2020, 4:30 p.m. UTC | #7
On Fri, Apr 10, 2020 at 05:53:54AM -0600, Andreas Dilger wrote:
> On Apr 7, 2020, at 9:11 PM, Eric Biggers <ebiggers@kernel.org> wrote:
> > 
> > On Tue, Apr 07, 2020 at 10:18:55AM -0600, Andreas Dilger wrote:
> >> 
> >> One question though - for the data checksums it uses s_checksum_seed
> >> to generate checksums, rather than directly using the UUID itself,
> >> so that it *is* possible to change the filesystem UUID after
> >> metadata_csum is in use, without the need to rewrite all of the
> >> checksums in the filesystem.  Could the same be done for stable_inode?
> > 
> > We could have used s_encrypt_pw_salt, but from a cryptographic perspective I
> > feel a bit safer using the UUID.  ext4 metadata checksums are non-cryptographic
> > and for integrity-only, so it's not disastrous if multiple filesystems share the
> > same s_checksum_seed.  So EXT4_FEATURE_INCOMPAT_CSUM_SEED makes sense as a
> > usability improvement for people doing things with filesystem cloning.
> > 
> > The new inode-number based encryption is a bit different since it may (depending
> > on how userspace chooses keys) depend on the per-filesystem ID for cryptographic
> > purposes.  So it can be much more important that these IDs are really unique.
> > 
> > On this basis, the UUID seems like a better choice since people doing things
> > with filesystem cloning are more likely to remember to set up the UUIDs as
> > unique, vs. some "second UUID" that's more hidden and would be forgotten about.
> 
> Actually, I think the opposite is true here.  To avoid usability problems,
> users *have* to change the UUID of a cloned/snapshot filesystem to avoid
> problems with mount-by-UUID (e.g. either filesystem may be mounted randomly
> on each boot, depending on the device enumeration order).  However, if they
> try to change the UUID, that would immediately break all of the encrypted
> files in the filesystem, so that means with the stable_inode feature either:
> - a snapshot/clone of a filesystem may subtly break your system, or
> - you can't keep a snapshot/clone of such a filesystem on the same node

My concern is about security, not usability.  

If the filesystem IDs used to derive keys for inode-number based encryption
aren't unique, then ciphertext may be repeated across files.  Users wouldn't
notice this since it would be a silent bug, but it would be a cryptographic
vulnerability.  Systems need to be designed in such a way that silent
cryptographic vulnerabilities can't occur, or at least are less likely.

Using the actual UUID rather than a hidden second field encourages people to
keep the IDs unique and is simpler.

The existence of s_encrypt_pw_salt does provide some precedent to use a hidden
second field, but not much since that's intended to be used with per-file keys.
With inode-number based encryption, filesystem ID reuse is a greater concern.

One could validly argue that this is "just a theoretical issue" at the moment,
due to the limited systems on which inode-number based encryption would actually
be used (as Ted and I have described).  But the usability concerns are likewise
theoretical at the moment, for the same reason.

> 
> > Using s_encrypt_pw_salt would also have been a bit more complex, as we'd have
> > had to add fscrypt_operations to retrieve it rather than just using s_uuid --
> > remembering to generate it if unset (mke2fs doesn't set it).  We'd also have
> > wanted to rename it to something else like s_encrypt_uuid to avoid confusion as
> > it would no longer be just a password salt.
> > 
> > Anyway, we couldn't really change this now even if we wanted to, since
> > IV_INO_LBLK_64 encryption policies were already released in v5.5.
> 
> I'm not sure I buy these arguments...  We changed handling of metadata_csum
> after the fact, by checking at mount if s_checksum_seed is initialized,
> otherwise hashing s_uuid and storing if it is zero.  Storing s_checksum_seed
> proactively in the kernel and e2fsck allows users to change s_uuid if they
> have a new enough kernel without noticing that the checksums were originally
> based on s_uuid rather than the hash of it in s_checksum_seed.
> 
> I'm not sure of the details of whether s_encrypt_pw_salt is used in the
> IV_INO_LBLK_64 case or not (since it uses inode/block number as the salt?),
> but I see that the code is already initializing s_encrypt_pw_salt in the
> kernel if unset, so that is not hard to do.  It could just make a copy from
> s_uuid rather than generating a new UUID for s_encrypt_pw_salt, or for new
> filesystems it can generate a unique s_encrypt_pw_salt and only use that?
> 
> Storing a feature flag to indicate whether s_uuid or s_encrypt_pw_salt is
> used for the IV_INO_LBLK_64 case seems pretty straightforward?  Maybe any
> filesystems that are using IV_INO_LBLK_64 with s_uuid can't change the UUID,
> but a few bits and lines of code could allow any new filesystem to do so?
> If you consider that 5.5 has been out for a few months, there aren't going
> to be a lot of users of that approach, vs. the next 10 years or more.
> 
> In the end, you are the guy who has to deal with issues here, so I leave it
> to you.  I just think it is a problem waiting to happen, and preventing the
> users from shooting themselves in the foot with tune2fs doesn't mean that
> they won't have significant problems later that could easily be solved now.
> 

Sure, it wouldn't be *that* hard to add support for using s_encrypt_pw_salt
using a separate filesystem feature flag.  And it could be done after the fact.
My points are just that it would add *some* extra complexity, we already
implemented another approach that is fine for the users who would actually use
it, and the approach we implemented is more cryptographically robust so
switching to the other way wouldn't necessarily be an improvement.

- Eric
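
To make the ID-reuse concern concrete, here is a small self-contained C sketch.  It assumes the IV_INO_LBLK_64 layout as I understand it from the fscrypt documentation (inode number in the upper 32 bits of the IV, file logical block number in the lower 32 bits) and that the key depends on the master key and the filesystem UUID.  The key derivation below is a toy FNV-1a mix, NOT the HKDF the kernel uses, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_UUID_SIZE 16

/* 64-bit IV: inode number in bits 32-63, logical block number in bits 0-31. */
static uint64_t demo_iv_ino_lblk_64(uint32_t ino, uint32_t lblk)
{
	return ((uint64_t)ino << 32) | (uint64_t)lblk;
}

/* One round of 64-bit FNV-1a over a buffer, continuing from state h. */
static uint64_t demo_mix(uint64_t h, const uint8_t *data, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		h ^= data[i];
		h *= 1099511628211ULL;
	}
	return h;
}

/*
 * Toy stand-in for key derivation (NOT HKDF): the result depends on the
 * master key and on the filesystem UUID, and on nothing else.
 */
static uint64_t demo_derive_key(const uint8_t *master, size_t master_len,
				const uint8_t uuid[DEMO_UUID_SIZE])
{
	uint64_t h = 14695981039346656037ULL;

	assert(master != NULL && uuid != NULL);
	h = demo_mix(h, master, master_len);
	return demo_mix(h, uuid, DEMO_UUID_SIZE);
}

/*
 * True when two blocks on two filesystems would be encrypted with the same
 * key and the same IV, so equal plaintext yields equal ciphertext.
 */
static bool demo_key_iv_reused(const uint8_t *master, size_t master_len,
			       const uint8_t uuid_a[DEMO_UUID_SIZE],
			       const uint8_t uuid_b[DEMO_UUID_SIZE],
			       uint32_t ino, uint32_t lblk)
{
	uint64_t iv = demo_iv_ino_lblk_64(ino, lblk);

	return demo_derive_key(master, master_len, uuid_a) ==
	       demo_derive_key(master, master_len, uuid_b) &&
	       iv == demo_iv_ino_lblk_64(ino, lblk);
}
```

With the same master key, two filesystems that share a UUID produce the same key for every file, and because inode numbers are not unique across filesystems the IVs repeat as well.  Giving each filesystem a distinct UUID changes the derived key, which is why the ID has to be both unique and stable.  The layout also shows why shrinking is disallowed: it would renumber inodes and therefore change the IVs of existing data.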
Patch

diff --git a/misc/tune2fs.c b/misc/tune2fs.c
index 314cc0d0..ca06c98b 100644
--- a/misc/tune2fs.c
+++ b/misc/tune2fs.c
@@ -3236,6 +3236,13 @@  _("Warning: The journal is dirty. You may wish to replay the journal like:\n\n"
 		char buf[SUPERBLOCK_SIZE] __attribute__ ((aligned(8)));
 		__u8 old_uuid[UUID_SIZE];
 
+		if (ext2fs_has_feature_stable_inodes(fs->super)) {
+			fputs(_("Cannot change the UUID of this filesystem "
+				"because it has the stable_inodes feature "
+				"flag.\n"), stderr);
+			exit(1);
+		}
+
 		if (!ext2fs_has_feature_csum_seed(fs->super) &&
 		    (ext2fs_has_feature_metadata_csum(fs->super) ||
 		     ext2fs_has_feature_ea_inode(fs->super))) {