[v7,07/16] fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl
diff mbox series

Message ID 20190726224141.14044-8-ebiggers@kernel.org
State Superseded
Headers show
Series
  • fscrypt: key management improvements
Related show

Commit Message

Eric Biggers July 26, 2019, 10:41 p.m. UTC
From: Eric Biggers <ebiggers@google.com>

Add a new fscrypt ioctl, FS_IOC_REMOVE_ENCRYPTION_KEY.  This ioctl
removes an encryption key that was added by FS_IOC_ADD_ENCRYPTION_KEY.
It wipes the secret key itself, then "locks" the encrypted files and
directories that had been unlocked using that key -- implemented by
evicting the relevant dentries and inodes from the VFS caches.

The problem this solves is that many fscrypt users want the ability to
remove encryption keys, causing the corresponding encrypted directories
to appear "locked" (presented in ciphertext form) again.  Moreover,
users want removing an encryption key to *really* remove it, in the
sense that the removed keys cannot be recovered even if kernel memory is
compromised, e.g. by the exploit of a kernel security vulnerability or
by a physical attack.  This is desirable after a user logs out of the
system, for example.  In many cases users even already assume this to be
the case and are surprised to hear when it's not.

It is not sufficient to simply unlink the master key from the keyring
(or to revoke or invalidate it), since the actual encryption transform
objects are still pinned in memory by their inodes.  Therefore, to
really remove a key we must also evict the relevant inodes.

Currently one workaround is to run 'sync && echo 2 >
/proc/sys/vm/drop_caches'.  But, that evicts all unused inodes in the
system rather than just the inodes associated with the key being
removed, causing severe performance problems.  Moreover, it requires
root privileges, so regular users can't "lock" their encrypted files.

Another workaround, used in Chromium OS kernels, is to add a new
VFS-level ioctl FS_IOC_DROP_CACHE which is a more restricted version of
drop_caches that operates on a single super_block.  It does:

        shrink_dcache_sb(sb);
        invalidate_inodes(sb, false);

But it's still a hack.  Yet, the major users of filesystem encryption
want this feature badly enough that they are actually using these hacks.

To properly solve the problem, start maintaining a list of the inodes
which have been "unlocked" using each master key.  Originally this
wasn't possible because the kernel didn't keep track of in-use master
keys at all.  But, with the ->s_master_keys keyring it is now possible.

Then, add an ioctl FS_IOC_REMOVE_ENCRYPTION_KEY.  It finds the specified
master key in ->s_master_keys, then wipes the secret key itself, which
prevents any additional inodes from being unlocked with the key.  Then,
it syncs the filesystem and evicts the inodes in the key's list.  The
normal inode eviction code will free and wipe the per-file keys (in
->i_crypt_info).  Note that freeing ->i_crypt_info without evicting the
inodes was also considered, but would have been racy.

Some inodes may still be in use when a master key is removed, and we
can't simply revoke random file descriptors, mmap's, etc.  Thus, the
ioctl simply skips in-use inodes, and returns -EBUSY to indicate that
some inodes weren't evicted.  The master key *secret* is still removed,
but the fscrypt_master_key struct remains to keep track of the remaining
inodes.  Userspace can then retry the ioctl to evict the remaining
inodes.  Alternatively, if userspace adds the key again, the refreshed
secret will be associated with the existing list of inodes so they
remain correctly tracked for future key removals.

The ioctl doesn't wipe pagecache pages.  Thus, we tolerate that after a
kernel compromise some portions of plaintext file contents may still be
recoverable from memory.  This can be solved by enabling page poisoning
system-wide, which security conscious users may choose to do.  But it's
very difficult to solve otherwise, e.g. note that plaintext file
contents may have been read in other places than pagecache pages.

Like FS_IOC_ADD_ENCRYPTION_KEY, FS_IOC_REMOVE_ENCRYPTION_KEY is
initially restricted to privileged users only.  This is sufficient for
some use cases, but not all.  A later patch will relax this restriction,
but it will require introducing key hashes, among other changes.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/crypto/fscrypt_private.h  |  53 +++++++-
 fs/crypto/keyring.c          | 247 ++++++++++++++++++++++++++++++++++-
 fs/crypto/keysetup.c         | 103 ++++++++++++++-
 include/linux/fscrypt.h      |  13 ++
 include/uapi/linux/fscrypt.h |   7 +
 5 files changed, 418 insertions(+), 5 deletions(-)

Comments

Theodore Y. Ts'o July 28, 2019, 7:24 p.m. UTC | #1
On Fri, Jul 26, 2019 at 03:41:32PM -0700, Eric Biggers wrote:
> +	fscrypt_warn(NULL,
> +		     "%s: %zu inodes still busy after removing key with description %*phN, including ino %lu (%s)",

nit: s/inodes/inode(s)/

> +
> +/*
> + * Try to remove an fscrypt master encryption key.  If other users have also
> + * added the key, we'll remove the current user's usage of the key, then return
> + * -EUSERS.  Otherwise we'll continue on and try to actually remove the key.

Nit: this should be moved to patch #11

Also, perror(EUSERS) will display "Too many users" which is going to
be confusing.  I understand why you chose this; we would like to
distinguish between there are still inodes using this key, and there
are other users using this key.

Do we really need to return EUSERS in this case?  It's actually not an
*error* that other users are using the key.  After all, the unlink(2)
system call doesn't return an advisory error when you delete a file
which has other hard links.  And an application which does care about
this detail can always call FS_IOC_ENCRYPTION_KEY_STATUS() and check
user_count.

Other than these nits, looks good.  Feel free to add:

Reviewed-by: Theodore Ts'o <tytso@mit.edu>

						- Ted
Eric Biggers July 29, 2019, 7:58 p.m. UTC | #2
On Sun, Jul 28, 2019 at 03:24:17PM -0400, Theodore Y. Ts'o wrote:
> > +
> > +/*
> > + * Try to remove an fscrypt master encryption key.  If other users have also
> > + * added the key, we'll remove the current user's usage of the key, then return
> > + * -EUSERS.  Otherwise we'll continue on and try to actually remove the key.
> 
> Nit: this should be moved to patch #11
> 
> Also, perror(EUSERS) will display "Too many users" which is going to
> be confusing.  I understand why you chose this; we would like to
> distinguish between there are still inodes using this key, and there
> are other users using this key.
> 
> Do we really need to return EUSERS in this case?  It's actually not an
> *error* that other users are using the key.  After all, the unlink(2)
> system call doesn't return an advisory error when you delete a file
> which has other hard links.  And an application which does care about
> this detail can always call FS_IOC_ENCRYPTION_KEY_STATUS() and check
> user_count.
> 

Returning 0 when the key wasn't fully removed might also be confusing.  But I
guess you're right that returning an error doesn't match how syscalls usually
work.  It did remove the current user's usage of the key, after all, rather than
completely fail.  And as you point out, if someone cares about other users
having added the key, they can use FS_IOC_GET_ENCRYPTION_KEY_STATUS.

So I guess I'll change it to 0.

Thanks!

- Eric
Eric Biggers July 31, 2019, 6:38 p.m. UTC | #3
On Mon, Jul 29, 2019 at 12:58:28PM -0700, Eric Biggers wrote:
> On Sun, Jul 28, 2019 at 03:24:17PM -0400, Theodore Y. Ts'o wrote:
> > > +
> > > +/*
> > > + * Try to remove an fscrypt master encryption key.  If other users have also
> > > + * added the key, we'll remove the current user's usage of the key, then return
> > > + * -EUSERS.  Otherwise we'll continue on and try to actually remove the key.
> > 
> > Nit: this should be moved to patch #11
> > 
> > Also, perror(EUSERS) will display "Too many users" which is going to
> > be confusing.  I understand why you chose this; we would like to
> > distinguish between there are still inodes using this key, and there
> > are other users using this key.
> > 
> > Do we really need to return EUSERS in this case?  It's actually not an
> > *error* that other users are using the key.  After all, the unlink(2)
> > system call doesn't return an advisory error when you delete a file
> > which has other hard links.  And an application which does care about
> > this detail can always call FS_IOC_ENCRYPTION_KEY_STATUS() and check
> > user_count.
> > 
> 
> Returning 0 when the key wasn't fully removed might also be confusing.  But I
> guess you're right that returning an error doesn't match how syscalls usually
> work.  It did remove the current user's usage of the key, after all, rather than
> completely fail.  And as you point out, if someone cares about other users
> having added the key, they can use FS_IOC_GET_ENCRYPTION_KEY_STATUS.
> 
> So I guess I'll change it to 0.
> 

So after making this change and thinking about it some more, I'm not sure it's
actually an improvement.

The normal use case for this ioctl is to "lock" some encrypted directory(s).  If
it returns 0 and doesn't lock the directory(s), that's unexpected.

This is perhaps different from what users expect from unlink().  It's well known
that unlink() just deletes the filename, not the file itself if it's still open
or has other links.  And unlink() by itself isn't meant for use cases where the
file absolutely must be securely erased.  But FS_IOC_REMOVE_ENCRYPTION_KEY
really is meant primarily for that sort of thing.

To give a concrete example: my patch for the userspace tool
https://github.com/google/fscrypt adds a command 'fscrypt lock' which locks an
encrypted directory.  If, say, someone runs 'fscrypt unlock' as uid 0 and then
'fscrypt lock' as uid 1000, then FS_IOC_REMOVE_ENCRYPTION_KEY can't actually
remove the key.  I need to make the tool show a proper error message in this
case.  To do so, it would help to get a unique error code (e.g. EUSERS) from
FS_IOC_REMOVE_ENCRYPTION_KEY, rather than get the ambiguous error code ENOKEY
and have to call FS_IOC_GET_ENCRYPTION_KEY_STATUS to get the real status.

Also, we already have the EBUSY case.  This means that the ioctl removed the
master key secret itself; however, some files were still in-use, so the key
remains in the "incompletely removed" state.  If we were actually going for
unlink() semantics, then for consistency this case really ought to return 0 and
unlink the key object, and people who care about in-use files would need to use
FS_IOC_GET_ENCRYPTION_KEY_STATUS.  But most people *will* care about this, and
may even want to retry the ioctl later, which isn't something you can do with
pure unlink() semantics.

So I'm leaning towards keeping the EUSERS and EBUSY errors.

- Eric
Theodore Y. Ts'o July 31, 2019, 11:38 p.m. UTC | #4
On Wed, Jul 31, 2019 at 11:38:02AM -0700, Eric Biggers wrote:
> 
> This is perhaps different from what users expect from unlink().  It's well known
> that unlink() just deletes the filename, not the file itself if it's still open
> or has other links.  And unlink() by itself isn't meant for use cases where the
> file absolutely must be securely erased.  But FS_IOC_REMOVE_ENCRYPTION_KEY
> really is meant primarily for that sort of thing.

Seems to me that part of the confusion is FS_IOC_REMOVE_ENCRYPTION_KEY
does two things.  One is "remove the user's handle on the key".  The
other is "purge all keys" (which requires root).  So it does two
different things with one ioctl.

> To give a concrete example: my patch for the userspace tool
> https://github.com/google/fscrypt adds a command 'fscrypt lock' which locks an
> encrypted directory.  If, say, someone runs 'fscrypt unlock' as uid 0 and then
> 'fscrypt lock' as uid 1000, then FS_IOC_REMOVE_ENCRYPTION_KEY can't actually
> remove the key.  I need to make the tool show a proper error message in this
> case.  To do so, it would help to get a unique error code (e.g. EUSERS) from
> FS_IOC_REMOVE_ENCRYPTION_KEY, rather than get the ambiguous error code ENOKEY
> and have to call FS_IOC_GET_ENCRYPTION_KEY_STATUS to get the real status.

What about having "fscrypt lock" call FS_IOC_GET_ENCRYPTION_KEY_STATUS
and print a warning message saying, "we can't lock it because N other
users who have registered a key".  I'd argue fscrypt should do this
regardless of whether or not FS_IOC_REMOVE_ENCRYPTION_KEY returns
EUSERS or not.

> Also, we already have the EBUSY case.  This means that the ioctl removed the
> master key secret itself; however, some files were still in-use, so the key
> remains in the "incompletely removed" state.  If we were actually going for
> unlink() semantics, then for consistency this case really ought to return 0 and
> unlink the key object, and people who care about in-use files would need to use
> FS_IOC_GET_ENCRYPTION_KEY_STATUS.  But most people *will* care about this, and
> may even want to retry the ioctl later, which isn't something youh can do with
> pure unlink() semantics.

It seems to me that the EBUSY and EUSERS errors should be status bits
which gets returned to the user in a bitfield --- and if the key has
been removed, or the user's claim on the key's existence has been
removed, the ioctl returns success.

That way we don't have to deal with the semantic disconnect where some
errors don't actually change system state, and other errors that *do*
change system state (as in, the key gets removed, or the user's claim
on the key gets removed), but still returns than error.

We could also add a flag which indicates where if there are files that
are still busy, or there are other users keeping a key in use, the
ioctl fails hard and returns an error.  At least that way we keep
consistency where an error means, "nothing has changed".

	    	     	   	  	   - Ted

P.S.  BTW, one of the comments which I didn't make was the
documentation didn't adequately explain which error codes means,
"success but with a caveat", and which errors means, "we failed and
didn't do anything".  But since I was arguing for changing the
behavior, I decided not to complain about the documentation.
Eric Biggers Aug. 1, 2019, 1:11 a.m. UTC | #5
On Wed, Jul 31, 2019 at 07:38:43PM -0400, Theodore Y. Ts'o wrote:
> On Wed, Jul 31, 2019 at 11:38:02AM -0700, Eric Biggers wrote:
> > 
> > This is perhaps different from what users expect from unlink().  It's well known
> > that unlink() just deletes the filename, not the file itself if it's still open
> > or has other links.  And unlink() by itself isn't meant for use cases where the
> > file absolutely must be securely erased.  But FS_IOC_REMOVE_ENCRYPTION_KEY
> > really is meant primarily for that sort of thing.
> 
> Seems to me that part of the confusion is FS_IOC_REMOVE_ENCRYPTION_KEY
> does two things.  One is "remove the user's handle on the key".  The
> other is "purge all keys" (which requires root).  So it does two
> different things with one ioctl.
> 

Well, it's either

1a. Remove the user's handle.
	OR 
1b. Remove all users' handles.  (FSCRYPT_REMOVE_KEY_FLAG_ALL_USERS)

Then

2. If no handles remain, try to evict all inodes that use the key.

By "purge all keys" do you mean step (2)?  Note that it doesn't require root by
itself; root is only required to remove other users' handles (1b).

It could be argued that (2) should be a separate ioctl, so we'd have UNLINK_KEY
then LOCK_KEY.  But is there a real use case for this split?  I.e. would anyone
ever want to UNLINK_KEY without also LOCK_KEY?  Is that really something we
want/need to support?  I'd really like the API to be as straightforward as
possible for the normal use case of locking a directory, and not require some
series of multiple ioctl's, which would be more difficult to use correctly.

> > To give a concrete example: my patch for the userspace tool
> > https://github.com/google/fscrypt adds a command 'fscrypt lock' which locks an
> > encrypted directory.  If, say, someone runs 'fscrypt unlock' as uid 0 and then
> > 'fscrypt lock' as uid 1000, then FS_IOC_REMOVE_ENCRYPTION_KEY can't actually
> > remove the key.  I need to make the tool show a proper error message in this
> > case.  To do so, it would help to get a unique error code (e.g. EUSERS) from
> > FS_IOC_REMOVE_ENCRYPTION_KEY, rather than get the ambiguous error code ENOKEY
> > and have to call FS_IOC_GET_ENCRYPTION_KEY_STATUS to get the real status.
> 
> What about having "fscrypt lock" call FS_IOC_GET_ENCRYPTION_KEY_STATUS
> and print a warning message saying, "we can't lock it because N other
> users who have registered a key".  I'd argue fscrypt should do this
> regardless of whether or not FS_IOC_REMOVE_ENCRYPTION_KEY returns
> EUSERS or not.

Shouldn't "fscrypt lock" still remove the user's handle, as opposed to refuse to
do anything, though?  So it would still need to call
FS_IOC_REMOVE_ENCRYPTION_KEY, and could get the status from it rather than also
needing to call FS_IOC_GET_ENCRYPTION_KEY_STATUS.

Though, FS_IOC_GET_ENCRYPTION_KEY_STATUS would be needed if we wanted to show
the specific count of other users.

> 
> > Also, we already have the EBUSY case.  This means that the ioctl removed the
> > master key secret itself; however, some files were still in-use, so the key
> > remains in the "incompletely removed" state.  If we were actually going for
> > unlink() semantics, then for consistency this case really ought to return 0 and
> > unlink the key object, and people who care about in-use files would need to use
> > FS_IOC_GET_ENCRYPTION_KEY_STATUS.  But most people *will* care about this, and
> > may even want to retry the ioctl later, which isn't something youh can do with
> > pure unlink() semantics.
> 
> It seems to me that the EBUSY and EUSERS errors should be status bits
> which gets returned to the user in a bitfield --- and if the key has
> been removed, or the user's claim on the key's existence has been
> removed, the ioctl returns success.
> 
> That way we don't have to deal with the semantic disconnect where some
> errors don't actually change system state, and other errors that *do*
> change system state (as in, the key gets removed, or the user's claim
> on the key gets removed), but still returns than error.
> 
> We could also add a flag which indicates where if there are files that
> are still busy, or there are other users keeping a key in use, the
> ioctl fails hard and returns an error.  At least that way we keep
> consistency where an error means, "nothing has changed".
> 
> 	    	     	   	  	   - Ted

Do you mean use a positive return value, or do you mean add an output field to
the struct passed to the ioctl?

The latter might be more error-prone, since it invites bugs where a directory
silently fails to be locked, because the second field was not checked.

Either way note that it doesn't really need to be a bitfield, since you can't
have both statuses at the same time.  I.e. if there are still other users, we
couldn't have even gotten to checking for in-use files.

> 
> P.S.  BTW, one of the comments which I didn't make was the
> documentation didn't adequately explain which error codes means,
> "success but with a caveat", and which errors means, "we failed and
> didn't do anything".  But since I was arguing for changing the
> behavior, I decided not to complain about the documentation.
> 

Yes, in any case the FS_IOC_REMOVE_ENCRYPTION_KEY documentation needs
improvement.  I have some updates pending for it.

- Eric
Theodore Y. Ts'o Aug. 1, 2019, 5:31 a.m. UTC | #6
On Wed, Jul 31, 2019 at 06:11:40PM -0700, Eric Biggers wrote:
> 
> Well, it's either
> 
> 1a. Remove the user's handle.
> 	OR 
> 1b. Remove all users' handles.  (FSCRYPT_REMOVE_KEY_FLAG_ALL_USERS)
> 
> Then
> 
> 2. If no handles remain, try to evict all inodes that use the key.
> 
> By "purge all keys" do you mean step (2)?  Note that it doesn't require root by
> itself; root is only required to remove other users' handles (1b).

No, I was talking about 1b.  I'd argue that 1a and 1b should be
different ioctl.  1b requires root, and 1a doesn't.

And 1a should just mean, "I don't need to use the encrypted files any
more".  In the PAM passphrase case, when you are just logging out, 1a
is what's needed, and success is just fine.  pam_session won't *care*
whether or not there are other users keeping the key in use.

The problem with "fscrypt lock" is that the non-privileged user sort
of wants to do REMOVE_FLAG_KEY_FLAG_FOR_ALL_USERS, but they doesn't
have the privileges to do it, and they are hoping that removing their
own key removes it the key from the system.  That to me seems to be
the fundamental disconnect.  The "fscrypt unlock" and "fscrypt lock"
commands comes from the v1 key management, and requires root.  It's
the translation to unprivileged mode where "fscrypt lock" seems to
have problems.

> > What about having "fscrypt lock" call FS_IOC_GET_ENCRYPTION_KEY_STATUS
> > and print a warning message saying, "we can't lock it because N other
> > users who have registered a key".  I'd argue fscrypt should do this
> > regardless of whether or not FS_IOC_REMOVE_ENCRYPTION_KEY returns
> > EUSERS or not.
> 
> Shouldn't "fscrypt lock" still remove the user's handle, as opposed to refuse to
> do anything, though?  So it would still need to callh
> FS_IOC_REMOVE_ENCRYPTION_KEY, and could get the status from it rather than also
> needing to call FS_IOC_GET_ENCRYPTION_KEY_STATUS.
> 
> Though, FS_IOC_GET_ENCRYPTION_KEY_STATUS would be needed if we wanted to show
> the specific count of other users.
 
So my perspective is that the ioctl's should have very clean
semantics, and errors should be consistent with how the Unix system
calls and error reporting.

If we need to make "fscrypt lock" and "fscrypt unlock" have semantics
that are more consistent with previous user interface choices, then
fscrypt can use FS_IOC_GET_ENCRYPTION_KEY_STATUS to print the warning
before it calls FS_IOC_REMOVE_ENCRYPTION_KEY --- with "fscrypt purge_keys"
calling something like FS_IOC_REMOVE_ALL_USER_ENCRYPTION_KEYS.

> > It seems to me that the EBUSY and EUSERS errors should be status bits
> > which gets returned to the user in a bitfield --- and if the key has
> > been removed, or the user's claim on the key's existence has been
> > removed, the ioctl returns success.
> > 
> > That way we don't have to deal with the semantic disconnect where some
> > errors don't actually change system state, and other errors that *do*
> > change system state (as in, the key gets removed, or the user's claim
> > on the key gets removed), but still returns than error.
> > 
> 
> Do you mean use a positive return value, or do you mean add an output field to
> the struct passed to the ioctl?

I meant adding an output field.  I see EBUSY and EUSERS as status bits
which *some* use cases might find useful.  Other use cases, such as in
the pam_keys session logout code, we won't care at *all* about those
status reporting (or error codes).  So if EBUSY and EUSERS are
returned as errors, then it adds to complexity of those programs
whichd don't care.  (And even for those that do, it's still a bit more
complex since they has to distinguish between EBUSY, EUSERS, or other
errors --- in fact, *all* programs which use
FS_IOC_REMOVE_ENCRYPTION_KEY will *have* to check for EBUSY and
ESUSERS whether they care or not.)

> Either way note that it doesn't really need to be a bitfield, since you can't
> have both statuses at the same time.  I.e. if there are still other users, we
> couldn't have even gotten to checking for in-use files.

That's actually an implementation detail, though, right?  In theory,
we could check to see if there are any in-use files, independently of
whether there are any users or not.

					- Ted
Eric Biggers Aug. 1, 2019, 6:35 p.m. UTC | #7
On Thu, Aug 01, 2019 at 01:31:08AM -0400, Theodore Y. Ts'o wrote:
> On Wed, Jul 31, 2019 at 06:11:40PM -0700, Eric Biggers wrote:
> > 
> > Well, it's either
> > 
> > 1a. Remove the user's handle.
> > 	OR 
> > 1b. Remove all users' handles.  (FSCRYPT_REMOVE_KEY_FLAG_ALL_USERS)
> > 
> > Then
> > 
> > 2. If no handles remain, try to evict all inodes that use the key.
> > 
> > By "purge all keys" do you mean step (2)?  Note that it doesn't require root by
> > itself; root is only required to remove other users' handles (1b).
> 
> No, I was talking about 1b.  I'd argue that 1a and 1b should be
> different ioctl.  1b requires root, and 1a doesn't.
> 
> And 1a should just mean, "I don't need to use the encrypted files any
> more".  In the PAM passphrase case, when you are just logging out, 1a
> is what's needed, and success is just fine.  pam_session won't *care*
> whether or not there are other users keeping the key in use.

But in both cases, we still need to do the same things if the last user is
removed: remove the master key secret, try to evict the inodes, and (if all
inodes could be evicted) unlink the key object from the filesystem-level
keyring.  So the ioctls would be nearly the same; it's just the
"remove key user(s)" step that would be different.

So in my view, these are variants of the same action, which is why it's a flag.
Likewise, there aren't separate ioctls for v1 and v2 policy keys, since
adding/removing those are variants on the same actions, and it allows the ioctls
to be extended to v3 in the future if it ever becomes necessary.

Now, I don't have *that* much of an issue with making it an separate ioctl
FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS.  We can make them call the same function
internally, with a bool argument to distinguish the two ioctls.  It just seems
like an unnecessary complication.

> 
> The problem with "fscrypt lock" is that the non-privileged user sort
> of wants to do REMOVE_FLAG_KEY_FLAG_FOR_ALL_USERS, but they doesn't
> have the privileges to do it, and they are hoping that removing their
> own key removes it the key from the system.  That to me seems to be
> the fundamental disconnect.  The "fscrypt unlock" and "fscrypt lock"
> commands comes from the v1 key management, and requires root.  It's
> the translation to unprivileged mode where "fscrypt lock" seems to
> have problems.

"fscrypt lock" actually doesn't exist yet; it's a missing feature.  My patch to
the fscrypt tool adds it.  So we get to decide on the semantics.  We don't want
to require root, though; so for v2 policy keys, the real semantics have to be
that "fscrypt lock" registers the key for the user, and "fscrypt unlock"
unregisters it for the user.

One could argue that because of this they should be named "fscrypt register_key"
and "fscrypt unregister_key".  However, in the vast majority of cases these will
be run as a single user, with a key that is not shared with any other user.

[In fact, I recently changed the fscrypt tool to automatically set mode 0700 on
new encrypted directories, since most people found it surprising that their
unlocked encrypted files weren't private to them by default.  So if someone
wants to share their encrypted directory with another user, they now also need
to explicitly chmod it, unless the root user is involved.]

So presenting the commands as locking/unlocking a directory is a useful
simplication that makes them much easier to use, IMO.  People shouldn't need to
understand multi-user key registration in order to unlock/lock their personal
encrypted directories.

And if "fscrypt lock" does nevertheless hit the case where other users remain, I
think it would be most user-friendly to print a warning like this:

	Directory not fully locked; other users have added the key.
 	Run 'sudo fscrypt lock --all-users DIR' if you want to force-lock the directory.

We *could* make the --all-users option a separate subcommand like
"fscrypt force_lock", but again to me it seems like a variant of the same thing.

> > > It seems to me that the EBUSY and EUSERS errors should be status bits
> > > which gets returned to the user in a bitfield --- and if the key has
> > > been removed, or the user's claim on the key's existence has been
> > > removed, the ioctl returns success.
> > > 
> > > That way we don't have to deal with the semantic disconnect where some
> > > errors don't actually change system state, and other errors that *do*
> > > change system state (as in, the key gets removed, or the user's claim
> > > on the key gets removed), but still returns than error.
> > > 
> > 
> > Do you mean use a positive return value, or do you mean add an output field to
> > the struct passed to the ioctl?
> 
> I meant adding an output field.  I see EBUSY and EUSERS as status bits
> which *some* use cases might find useful.  Other use cases, such as in
> the pam_keys session logout code, we won't care at *all* about those
> status reporting (or error codes).  So if EBUSY and EUSERS are
> returned as errors, then it adds to complexity of those programs
> whichd don't care.  (And even for those that do, it's still a bit more
> complex since they has to distinguish between EBUSY, EUSERS, or other
> errors --- in fact, *all* programs which use
> FS_IOC_REMOVE_ENCRYPTION_KEY will *have* to check for EBUSY and
> ESUSERS whether they care or not.)
> 

I see it a little differently: I'd prefer for the API to be "secure by default",
i.e. it tries hard to really remove the key, and it returns an error if it
couldn't really be removed (but still did as much as possible).  And if the
caller is fine with some case(s) where the key wasn't truly removed, then they
can just explicitly handle those case(s).

You're suggesting the opposite: the ioctl will return 0 if the key was
unregistered for current user only, or if some files are still in use; and if
someone cares whether the key was *really* removed, then they'd need to check
the additional status bits.

That's easier to misuse in the more important ways, in my view.  Now, it's not a
huge deal, as the API provides the same information either way, and regardless
of which one we choose, I'll make sure it's used correctly in fscrypt, Android,
Chrome OS, etc...

> > Either way note that it doesn't really need to be a bitfield, since you can't
> > have both statuses at the same time.  I.e. if there are still other users, we
> > couldn't have even gotten to checking for in-use files.
> 
> That's actually an implementation detail, though, right?  In theory,
> we could check to see if there are any in-use files, independently of
> whether there are any users or not.
> 

Yes, but wouldn't people assume that if the bitfield is provided, that all the
bits are actually filled in?  Remember that to determine the "in-use files" bit
we have to actually go through the inode list and try to evict all the inodes.
That's not really something we should be doing before the last user is removed.

- Eric
Eric Biggers Aug. 1, 2019, 6:46 p.m. UTC | #8
On Thu, Aug 01, 2019 at 11:35:56AM -0700, Eric Biggers wrote:
> 
> "fscrypt lock" actually doesn't exist yet; it's a missing feature.  My patch to
> the fscrypt tool adds it.  So we get to decide on the semantics.  We don't want
> to require root, though; so for v2 policy keys, the real semantics have to be
> that "fscrypt lock" registers the key for the user, and "fscrypt unlock"
> unregisters it for the user.
> 

I meant the other way around, of course: "fscrypt unlock" registers the key for
the user, and "fscrypt lock" unregisters it for the user.

- Eric
Eric Biggers Aug. 1, 2019, 10:04 p.m. UTC | #9
On Thu, Aug 01, 2019 at 01:31:08AM -0400, Theodore Y. Ts'o wrote:
> On Wed, Jul 31, 2019 at 06:11:40PM -0700, Eric Biggers wrote:
> > 
> > Well, it's either
> > 
> > 1a. Remove the user's handle.
> > 	OR 
> > 1b. Remove all users' handles.  (FSCRYPT_REMOVE_KEY_FLAG_ALL_USERS)
> > 
> > Then
> > 
> > 2. If no handles remain, try to evict all inodes that use the key.
> > 
> > By "purge all keys" do you mean step (2)?  Note that it doesn't require root by
> > itself; root is only required to remove other users' handles (1b).
> 
> No, I was talking about 1b.  I'd argue that 1a and 1b should be
> different ioctl.  1b requires root, and 1a doesn't.
> 
[...]
> > 
> > Do you mean use a positive return value, or do you mean add an output field to
> > the struct passed to the ioctl?
> 
> I meant adding an output field.  I see EBUSY and EUSERS as status bits
> which *some* use cases might find useful.

Ted, would you be happy with the following API?

Removing keys
-------------

Two ioctls are available for removing a key that was added by
FS_IOC_ADD_ENCRYPTION_KEY: FS_IOC_REMOVE_ENCRYPTION_KEY and
FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS.  They differ only in cases
where v2 policy keys are added or removed by non-root users.

These ioctls don't work on keys that were added via the legacy
process-subscribed keyrings mechanism.

Before using these ioctls, read the `Kernel memory compromise`_
section for a discussion of the security goals and limitations of
these ioctls.

FS_IOC_REMOVE_ENCRYPTION_KEY
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The FS_IOC_REMOVE_ENCRYPTION_KEY ioctl removes a claim to an fscrypt
master encryption key from the filesystem, and possibly removes the
key itself.  It can be executed on any file or directory on the target
filesystem, but using the filesystem's root directory is recommended.
It takes in a pointer to a :c:type:`struct fscrypt_remove_key_arg`,
defined as follows::

    struct fscrypt_remove_key_arg {
            struct fscrypt_key_specifier key_spec;
    #define FSCRYPT_KEY_REMOVAL_STATUS_FLAG_OTHER_USERS     0x00000001
    #define FSCRYPT_KEY_REMOVAL_STATUS_FLAG_FILES_BUSY      0x00000002
            __u32 removal_status_flags;     /* output */
            __u32 __reserved[5];
    };

This structure must be zeroed, then initialized as follows:

- The key to remove is specified by ``key_spec``:

    - To remove a key used by v1 encryption policies, set
      ``key_spec.type`` to FSCRYPT_KEY_SPEC_TYPE_DESCRIPTOR and fill
      in ``key_spec.u.descriptor``.  To remove this type of key, the
      calling process must have the CAP_SYS_ADMIN capability in the
      initial user namespace.

    - To remove a key used by v2 encryption policies, set
      ``key_spec.type`` to FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER and fill
      in ``key_spec.u.identifier``.  To remove this type of key, no
      privileges are needed.  However, users can only remove keys that
      they added themselves, subject to privileged override with
      FSCRYPT_REMOVE_KEY_FLAG_ALL_USERS.

For v2 policy keys, this ioctl is usable by non-root users.  However,
to make this possible, it actually just removes the current user's
claim to the key, undoing a single call to FS_IOC_ADD_ENCRYPTION_KEY.
Only after all claims are removed is the key really removed.

For example, if FS_IOC_ADD_ENCRYPTION_KEY was called with uid 1000,
then the key will be "claimed" by uid 1000, and
FS_IOC_REMOVE_ENCRYPTION_KEY will only succeed as uid 1000.  Or, if
both uids 1000 and 2000 added the key, then for each uid
FS_IOC_REMOVE_ENCRYPTION_KEY will only remove their own claim.  Only
once *both* are removed is the key really removed.  (Think of it like
unlinking a file that may have hard links.)

If FS_IOC_REMOVE_ENCRYPTION_KEY really removes the key, it will also
try to "lock" all files that had been unlocked with the key.  It won't
lock files that are still in-use.  If necessary, the ioctl can be
executed again later to retry locking any remaining files.

FS_IOC_REMOVE_ENCRYPTION_KEY returns 0 if either the key was removed
(but may still have files remaining to be locked), the user's claim to
the key was removed, or the key was already removed but had files
remaining to be the locked so the ioctl retried locking them.  In any
of these cases, ``removal_status_flags`` is filled in with the
following informational status flags:

- ``FSCRYPT_KEY_REMOVAL_STATUS_FLAG_OTHER_USERS``: set if only the
  user's claim to the key was removed, not the key itself
- ``FSCRYPT_KEY_REMOVAL_STATUS_FLAG_FILES_BUSY``: set if some file(s)
  are still in-use.  Not guaranteed to be set in the case where only
  the user's claim to the key was removed.

FS_IOC_REMOVE_ENCRYPTION_KEY can fail with the following errors:

- ``EACCES``: The FSCRYPT_KEY_SPEC_TYPE_DESCRIPTOR key specifier type
  was specified, but the caller does not have the CAP_SYS_ADMIN
  capability in the initial user namespace
- ``EINVAL``: invalid flags or key specifier type, or reserved bits
  were set
- ``ENOKEY``: the key object was not found at all, i.e. it was never
  added in the first place or was already fully removed including all
  files locked; or, the user does not have a claim to the key.
- ``ENOTTY``: this type of filesystem does not implement encryption
- ``EOPNOTSUPP``: the kernel was not configured with encryption
  support for this filesystem, or the filesystem superblock has not
  had encryption enabled on it

FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS is exactly the same as
FS_IOC_REMOVE_ENCRYPTION_KEY, except that for v2 policy keys, the
ALL_USERS version of the ioctl will remove all users' claims to the
key, not just the current user's.  I.e., the key itself will always be
removed, no matter how many users have added it.  This difference is
only meaningful if non-root users are adding and removing keys.

Because of this, FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS also requires
"root", namely the CAP_SYS_ADMIN capability in the initial user
namespace.  Otherwise it will fail with ``EACCES``.
Eric Biggers Aug. 2, 2019, 4:38 a.m. UTC | #10
On Thu, Aug 01, 2019 at 03:04:34PM -0700, Eric Biggers wrote:
> On Thu, Aug 01, 2019 at 01:31:08AM -0400, Theodore Y. Ts'o wrote:
> > On Wed, Jul 31, 2019 at 06:11:40PM -0700, Eric Biggers wrote:
> > > 
> > > Well, it's either
> > > 
> > > 1a. Remove the user's handle.
> > > 	OR 
> > > 1b. Remove all users' handles.  (FSCRYPT_REMOVE_KEY_FLAG_ALL_USERS)
> > > 
> > > Then
> > > 
> > > 2. If no handles remain, try to evict all inodes that use the key.
> > > 
> > > By "purge all keys" do you mean step (2)?  Note that it doesn't require root by
> > > itself; root is only required to remove other users' handles (1b).
> > 
> > No, I was talking about 1b.  I'd argue that 1a and 1b should be
> > different ioctl.  1b requires root, and 1a doesn't.
> > 
> [...]
> > > 
> > > Do you mean use a positive return value, or do you mean add an output field to
> > > the struct passed to the ioctl?
> > 
> > I meant adding an output field.  I see EBUSY and EUSERS as status bits
> > which *some* use cases might find useful.
> 
> Ted, would you be happy with the following API?
> 

Here's a slightly updated version (I missed removing some stale text):

Removing keys
-------------

Two ioctls are available for removing a key that was added by
`FS_IOC_ADD_ENCRYPTION_KEY`_:

- `FS_IOC_REMOVE_ENCRYPTION_KEY`_
- `FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS`_

These two ioctls differ only in cases where v2 policy keys are added
or removed by non-root users.

These ioctls don't work on keys that were added via the legacy
process-subscribed keyrings mechanism.

Before using these ioctls, read the `Kernel memory compromise`_
section for a discussion of the security goals and limitations of
these ioctls.

FS_IOC_REMOVE_ENCRYPTION_KEY
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The FS_IOC_REMOVE_ENCRYPTION_KEY ioctl removes a claim to a master
encryption key from the filesystem, and possibly removes the key
itself.  It can be executed on any file or directory on the target
filesystem, but using the filesystem's root directory is recommended.
It takes in a pointer to a :c:type:`struct fscrypt_remove_key_arg`,
defined as follows::

    struct fscrypt_remove_key_arg {
            struct fscrypt_key_specifier key_spec;
    #define FSCRYPT_KEY_REMOVAL_STATUS_FLAG_FILES_BUSY      0x00000001
    #define FSCRYPT_KEY_REMOVAL_STATUS_FLAG_OTHER_USERS     0x00000002
            __u32 removal_status_flags;     /* output */
            __u32 __reserved[5];
    };

This structure must be zeroed, then initialized as follows:

- The key to remove is specified by ``key_spec``:

    - To remove a key used by v1 encryption policies, set
      ``key_spec.type`` to FSCRYPT_KEY_SPEC_TYPE_DESCRIPTOR and fill
      in ``key_spec.u.descriptor``.  To remove this type of key, the
      calling process must have the CAP_SYS_ADMIN capability in the
      initial user namespace.

    - To remove a key used by v2 encryption policies, set
      ``key_spec.type`` to FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER and fill
      in ``key_spec.u.identifier``.

For v2 policy keys, this ioctl is usable by non-root users.  However,
to make this possible, it actually just removes the current user's
claim to the key, undoing a single call to FS_IOC_ADD_ENCRYPTION_KEY.
Only after all claims are removed is the key really removed.

For example, if FS_IOC_ADD_ENCRYPTION_KEY was called with uid 1000,
then the key will be "claimed" by uid 1000, and
FS_IOC_REMOVE_ENCRYPTION_KEY will only succeed as uid 1000.  Or, if
both uids 1000 and 2000 added the key, then for each uid
FS_IOC_REMOVE_ENCRYPTION_KEY will only remove their own claim.  Only
once *both* are removed is the key really removed.  (Think of it like
unlinking a file that may have hard links.)

If FS_IOC_REMOVE_ENCRYPTION_KEY really removes the key, it will also
try to "lock" all files that had been unlocked with the key.  It won't
lock files that are still in-use, so this ioctl is expected to be used
in cooperation with userspace ensuring that none of the files are
still open.  However, if necessary, the ioctl can be executed again
later to retry locking any remaining files.

FS_IOC_REMOVE_ENCRYPTION_KEY returns 0 if either the key was removed
(but may still have files remaining to be locked), the user's claim to
the key was removed, or the key was already removed but had files
remaining to be the locked so the ioctl retried locking them.  In any
of these cases, ``removal_status_flags`` is filled in with the
following informational status flags:

- ``FSCRYPT_KEY_REMOVAL_STATUS_FLAG_FILES_BUSY``: set if some file(s)
  are still in-use.  Not guaranteed to be set in the case where only
  the user's claim to the key was removed.
- ``FSCRYPT_KEY_REMOVAL_STATUS_FLAG_OTHER_USERS``: set if only the
  user's claim to the key was removed, not the key itself

FS_IOC_REMOVE_ENCRYPTION_KEY can fail with the following errors:

- ``EACCES``: The FSCRYPT_KEY_SPEC_TYPE_DESCRIPTOR key specifier type
  was specified, but the caller does not have the CAP_SYS_ADMIN
  capability in the initial user namespace
- ``EINVAL``: invalid key specifier type, or reserved bits were set
- ``ENOKEY``: the key object was not found at all, i.e. it was never
  added in the first place or was already fully removed including all
  files locked; or, the user does not have a claim to the key.
- ``ENOTTY``: this type of filesystem does not implement encryption
- ``EOPNOTSUPP``: the kernel was not configured with encryption
  support for this filesystem, or the filesystem superblock has not
  had encryption enabled on it

FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS is exactly the same as
`FS_IOC_REMOVE_ENCRYPTION_KEY`_, except that for v2 policy keys, the
ALL_USERS version of the ioctl will remove all users' claims to the
key, not just the current user's.  I.e., the key itself will always be
removed, no matter how many users have added it.  This difference is
only meaningful if non-root users are adding and removing keys.

Because of this, FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS also requires
"root", namely the CAP_SYS_ADMIN capability in the initial user
namespace.  Otherwise it will fail with ``EACCES``.
Theodore Y. Ts'o Aug. 12, 2019, 2:16 p.m. UTC | #11
On Thu, Aug 01, 2019 at 09:38:27PM -0700, Eric Biggers wrote:
> 
> Here's a slightly updated version (I missed removing some stale text):

Apologies for the delaying in getting back.  Thanks, this looks great.

	      	  	      	      	     - Ted

> 
> Removing keys
> -------------
> 
> Two ioctls are available for removing a key that was added by
> `FS_IOC_ADD_ENCRYPTION_KEY`_:
> 
> - `FS_IOC_REMOVE_ENCRYPTION_KEY`_
> - `FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS`_
> 
> These two ioctls differ only in cases where v2 policy keys are added
> or removed by non-root users.
> 
> These ioctls don't work on keys that were added via the legacy
> process-subscribed keyrings mechanism.
> 
> Before using these ioctls, read the `Kernel memory compromise`_
> section for a discussion of the security goals and limitations of
> these ioctls.
> 
> FS_IOC_REMOVE_ENCRYPTION_KEY
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> The FS_IOC_REMOVE_ENCRYPTION_KEY ioctl removes a claim to a master
> encryption key from the filesystem, and possibly removes the key
> itself.  It can be executed on any file or directory on the target
> filesystem, but using the filesystem's root directory is recommended.
> It takes in a pointer to a :c:type:`struct fscrypt_remove_key_arg`,
> defined as follows::
> 
>     struct fscrypt_remove_key_arg {
>             struct fscrypt_key_specifier key_spec;
>     #define FSCRYPT_KEY_REMOVAL_STATUS_FLAG_FILES_BUSY      0x00000001
>     #define FSCRYPT_KEY_REMOVAL_STATUS_FLAG_OTHER_USERS     0x00000002
>             __u32 removal_status_flags;     /* output */
>             __u32 __reserved[5];
>     };
> 
> This structure must be zeroed, then initialized as follows:
> 
> - The key to remove is specified by ``key_spec``:
> 
>     - To remove a key used by v1 encryption policies, set
>       ``key_spec.type`` to FSCRYPT_KEY_SPEC_TYPE_DESCRIPTOR and fill
>       in ``key_spec.u.descriptor``.  To remove this type of key, the
>       calling process must have the CAP_SYS_ADMIN capability in the
>       initial user namespace.
> 
>     - To remove a key used by v2 encryption policies, set
>       ``key_spec.type`` to FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER and fill
>       in ``key_spec.u.identifier``.
> 
> For v2 policy keys, this ioctl is usable by non-root users.  However,
> to make this possible, it actually just removes the current user's
> claim to the key, undoing a single call to FS_IOC_ADD_ENCRYPTION_KEY.
> Only after all claims are removed is the key really removed.
> 
> For example, if FS_IOC_ADD_ENCRYPTION_KEY was called with uid 1000,
> then the key will be "claimed" by uid 1000, and
> FS_IOC_REMOVE_ENCRYPTION_KEY will only succeed as uid 1000.  Or, if
> both uids 1000 and 2000 added the key, then for each uid
> FS_IOC_REMOVE_ENCRYPTION_KEY will only remove their own claim.  Only
> once *both* are removed is the key really removed.  (Think of it like
> unlinking a file that may have hard links.)
> 
> If FS_IOC_REMOVE_ENCRYPTION_KEY really removes the key, it will also
> try to "lock" all files that had been unlocked with the key.  It won't
> lock files that are still in-use, so this ioctl is expected to be used
> in cooperation with userspace ensuring that none of the files are
> still open.  However, if necessary, the ioctl can be executed again
> later to retry locking any remaining files.
> 
> FS_IOC_REMOVE_ENCRYPTION_KEY returns 0 if either the key was removed
> (but may still have files remaining to be locked), the user's claim to
> the key was removed, or the key was already removed but had files
> remaining to be the locked so the ioctl retried locking them.  In any
> of these cases, ``removal_status_flags`` is filled in with the
> following informational status flags:
> 
> - ``FSCRYPT_KEY_REMOVAL_STATUS_FLAG_FILES_BUSY``: set if some file(s)
>   are still in-use.  Not guaranteed to be set in the case where only
>   the user's claim to the key was removed.
> - ``FSCRYPT_KEY_REMOVAL_STATUS_FLAG_OTHER_USERS``: set if only the
>   user's claim to the key was removed, not the key itself
> 
> FS_IOC_REMOVE_ENCRYPTION_KEY can fail with the following errors:
> 
> - ``EACCES``: The FSCRYPT_KEY_SPEC_TYPE_DESCRIPTOR key specifier type
>   was specified, but the caller does not have the CAP_SYS_ADMIN
>   capability in the initial user namespace
> - ``EINVAL``: invalid key specifier type, or reserved bits were set
> - ``ENOKEY``: the key object was not found at all, i.e. it was never
>   added in the first place or was already fully removed including all
>   files locked; or, the user does not have a claim to the key.
> - ``ENOTTY``: this type of filesystem does not implement encryption
> - ``EOPNOTSUPP``: the kernel was not configured with encryption
>   support for this filesystem, or the filesystem superblock has not
>   had encryption enabled on it
> 
> FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS is exactly the same as
> `FS_IOC_REMOVE_ENCRYPTION_KEY`_, except that for v2 policy keys, the
> ALL_USERS version of the ioctl will remove all users' claims to the
> key, not just the current user's.  I.e., the key itself will always be
> removed, no matter how many users have added it.  This difference is
> only meaningful if non-root users are adding and removing keys.
> 
> Because of this, FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS also requires
> "root", namely the CAP_SYS_ADMIN capability in the initial user
> namespace.  Otherwise it will fail with ``EACCES``.

Patch
diff mbox series

diff --git a/fs/crypto/fscrypt_private.h b/fs/crypto/fscrypt_private.h
index 86153a0044ba7..3616232b4798e 100644
--- a/fs/crypto/fscrypt_private.h
+++ b/fs/crypto/fscrypt_private.h
@@ -78,6 +78,19 @@  struct fscrypt_info {
 	/* Back-pointer to the inode */
 	struct inode *ci_inode;
 
+	/*
+	 * The master key with which this inode was unlocked (decrypted).  This
+	 * will be NULL if the master key was found in a process-subscribed
+	 * keyring rather than in the filesystem-level keyring.
+	 */
+	struct key *ci_master_key;
+
+	/*
+	 * Link in list of inodes that were unlocked with the master key.
+	 * Only used when ->ci_master_key is set.
+	 */
+	struct list_head ci_master_key_link;
+
 	/*
 	 * If non-NULL, then encryption is done using the master key directly
 	 * and ci_ctfm will equal ci_direct_key->dk_ctfm.
@@ -183,14 +196,52 @@  struct fscrypt_master_key_secret {
  */
 struct fscrypt_master_key {
 
-	/* The secret key material */
+	/*
+	 * The secret key material.  After FS_IOC_REMOVE_ENCRYPTION_KEY is
+	 * executed, this is wiped and no new inodes can be unlocked with this
+	 * key; however, there may still be inodes in ->mk_decrypted_inodes
+	 * which could not be evicted.  As long as some inodes still remain,
+	 * FS_IOC_REMOVE_ENCRYPTION_KEY can be retried, or
+	 * FS_IOC_ADD_ENCRYPTION_KEY can add the secret again.
+	 *
+	 * Locking: protected by key->sem.
+	 */
 	struct fscrypt_master_key_secret	mk_secret;
 
 	/* Arbitrary key descriptor which was assigned by userspace */
 	struct fscrypt_key_specifier		mk_spec;
 
+	/*
+	 * Length of ->mk_decrypted_inodes, plus one if mk_secret is present.
+	 * Once this goes to 0, the master key is removed from ->s_master_keys.
+	 * The 'struct fscrypt_master_key' will continue to live as long as the
+	 * 'struct key' whose payload it is, but we won't let this reference
+	 * count rise again.
+	 */
+	refcount_t		mk_refcount;
+
+	/*
+	 * List of inodes that were unlocked using this key.  This allows the
+	 * inodes to be evicted efficiently if the key is removed.
+	 */
+	struct list_head	mk_decrypted_inodes;
+	spinlock_t		mk_decrypted_inodes_lock;
+
 } __randomize_layout;
 
+static inline bool
+is_master_key_secret_present(const struct fscrypt_master_key_secret *secret)
+{
+	/*
+	 * The READ_ONCE() is only necessary for fscrypt_drop_inode() and
+	 * fscrypt_key_describe().  These run in atomic context, so they can't
+	 * take key->sem and thus 'secret' can change concurrently which would
+	 * be a data race.  But they only need to know whether the secret *was*
+	 * present at the time of check, so READ_ONCE() suffices.
+	 */
+	return READ_ONCE(secret->size) != 0;
+}
+
 extern struct key *
 fscrypt_find_master_key(struct super_block *sb,
 			const struct fscrypt_key_specifier *mk_spec);
diff --git a/fs/crypto/keyring.c b/fs/crypto/keyring.c
index bd489433bba04..ce33c38955233 100644
--- a/fs/crypto/keyring.c
+++ b/fs/crypto/keyring.c
@@ -10,6 +10,7 @@ 
  * filesystem-level keyring, including the ioctls:
  *
  * - FS_IOC_ADD_ENCRYPTION_KEY: add a key
+ * - FS_IOC_REMOVE_ENCRYPTION_KEY: remove a key
  */
 
 #include <linux/key-type.h>
@@ -66,6 +67,13 @@  static void fscrypt_key_destroy(struct key *key)
 static void fscrypt_key_describe(const struct key *key, struct seq_file *m)
 {
 	seq_puts(m, key->description);
+
+	if (key_is_positive(key)) {
+		const struct fscrypt_master_key *mk = key->payload.data[0];
+
+		if (!is_master_key_secret_present(&mk->mk_secret))
+			seq_puts(m, ": secret removed");
+	}
 }
 
 /*
@@ -192,6 +200,10 @@  static int add_new_master_key(struct fscrypt_master_key_secret *secret,
 
 	move_master_key_secret(&mk->mk_secret, secret);
 
+	refcount_set(&mk->mk_refcount, 1); /* secret is present */
+	INIT_LIST_HEAD(&mk->mk_decrypted_inodes);
+	spin_lock_init(&mk->mk_decrypted_inodes_lock);
+
 	format_mk_description(description, mk_spec);
 	key = key_alloc(&key_type_fscrypt, description,
 			GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, current_cred(),
@@ -213,6 +225,22 @@  static int add_new_master_key(struct fscrypt_master_key_secret *secret,
 	return err;
 }
 
+#define KEY_DEAD	1
+
+static int add_existing_master_key(struct fscrypt_master_key *mk,
+				   struct fscrypt_master_key_secret *secret,
+				   const struct fscrypt_key_specifier *mk_spec)
+{
+	if (is_master_key_secret_present(&mk->mk_secret))
+		return 0;
+
+	if (!refcount_inc_not_zero(&mk->mk_refcount))
+		return KEY_DEAD;
+
+	move_master_key_secret(&mk->mk_secret, secret);
+	return 0;
+}
+
 static int add_master_key(struct super_block *sb,
 			  struct fscrypt_master_key_secret *secret,
 			  const struct fscrypt_key_specifier *mk_spec)
@@ -222,6 +250,7 @@  static int add_master_key(struct super_block *sb,
 	int err;
 
 	mutex_lock(&fscrypt_add_key_mutex); /* serialize find + link */
+retry:
 	key = fscrypt_find_master_key(sb, mk_spec);
 	if (IS_ERR(key)) {
 		err = PTR_ERR(key);
@@ -233,8 +262,21 @@  static int add_master_key(struct super_block *sb,
 			goto out_unlock;
 		err = add_new_master_key(secret, mk_spec, sb->s_master_keys);
 	} else {
+		/*
+		 * Found the key in ->s_master_keys.  Re-add the secret if
+		 * needed.
+		 */
+		down_write(&key->sem);
+		err = add_existing_master_key(key->payload.data[0], secret,
+					      mk_spec);
+		up_write(&key->sem);
+		if (err == KEY_DEAD) {
+			/* Key being removed or needs to be removed */
+			key_invalidate(key);
+			key_put(key);
+			goto retry;
+		}
 		key_put(key);
-		err = 0;
 	}
 out_unlock:
 	mutex_unlock(&fscrypt_add_key_mutex);
@@ -283,6 +325,209 @@  int fscrypt_ioctl_add_key(struct file *filp, void __user *_uarg)
 }
 EXPORT_SYMBOL_GPL(fscrypt_ioctl_add_key);
 
+static void shrink_dcache_inode(struct inode *inode)
+{
+	struct dentry *dentry;
+
+	if (S_ISDIR(inode->i_mode)) {
+		dentry = d_find_any_alias(inode);
+		if (dentry) {
+			shrink_dcache_parent(dentry);
+			dput(dentry);
+		}
+	}
+	d_prune_aliases(inode);
+}
+
+static void evict_dentries_for_decrypted_inodes(struct fscrypt_master_key *mk)
+{
+	struct fscrypt_info *ci;
+	struct inode *inode;
+	struct inode *toput_inode = NULL;
+
+	spin_lock(&mk->mk_decrypted_inodes_lock);
+
+	list_for_each_entry(ci, &mk->mk_decrypted_inodes, ci_master_key_link) {
+		inode = ci->ci_inode;
+		spin_lock(&inode->i_lock);
+		if (inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW)) {
+			spin_unlock(&inode->i_lock);
+			continue;
+		}
+		__iget(inode);
+		spin_unlock(&inode->i_lock);
+		spin_unlock(&mk->mk_decrypted_inodes_lock);
+
+		shrink_dcache_inode(inode);
+		iput(toput_inode);
+		toput_inode = inode;
+
+		spin_lock(&mk->mk_decrypted_inodes_lock);
+	}
+
+	spin_unlock(&mk->mk_decrypted_inodes_lock);
+	iput(toput_inode);
+}
+
+static int check_for_busy_inodes(struct super_block *sb,
+				 struct fscrypt_master_key *mk)
+{
+	struct list_head *pos;
+	size_t busy_count = 0;
+	unsigned long ino;
+	struct dentry *dentry;
+	char _path[256];
+	char *path = NULL;
+
+	spin_lock(&mk->mk_decrypted_inodes_lock);
+
+	list_for_each(pos, &mk->mk_decrypted_inodes)
+		busy_count++;
+
+	if (busy_count == 0) {
+		spin_unlock(&mk->mk_decrypted_inodes_lock);
+		return 0;
+	}
+
+	{
+		/* select an example file to show for debugging purposes */
+		struct inode *inode =
+			list_first_entry(&mk->mk_decrypted_inodes,
+					 struct fscrypt_info,
+					 ci_master_key_link)->ci_inode;
+		ino = inode->i_ino;
+		dentry = d_find_alias(inode);
+	}
+	spin_unlock(&mk->mk_decrypted_inodes_lock);
+
+	if (dentry) {
+		path = dentry_path(dentry, _path, sizeof(_path));
+		dput(dentry);
+	}
+	if (IS_ERR_OR_NULL(path))
+		path = "(unknown)";
+
+	fscrypt_warn(NULL,
+		     "%s: %zu inodes still busy after removing key with description %*phN, including ino %lu (%s)",
+		     sb->s_id, busy_count, master_key_spec_len(&mk->mk_spec),
+		     (u8 *)&mk->mk_spec.u, ino, path);
+	return -EBUSY;
+}
+
+static int try_to_lock_encrypted_files(struct super_block *sb,
+				       struct fscrypt_master_key *mk)
+{
+	int err1;
+	int err2;
+
+	/*
+	 * An inode can't be evicted while it is dirty or has dirty pages.
+	 * Thus, we first have to clean the inodes in ->mk_decrypted_inodes.
+	 *
+	 * Just do it the easy way: call sync_filesystem().  It's overkill, but
+	 * it works, and it's more important to minimize the amount of caches we
+	 * drop than the amount of data we sync.  Also, unprivileged users can
+	 * already call sync_filesystem() via sys_syncfs() or sys_sync().
+	 */
+	down_read(&sb->s_umount);
+	err1 = sync_filesystem(sb);
+	up_read(&sb->s_umount);
+	/* If a sync error occurs, still try to evict as much as possible. */
+
+	/*
+	 * Inodes are pinned by their dentries, so we have to evict their
+	 * dentries.  shrink_dcache_sb() would suffice, but would be overkill
+	 * and inappropriate for use by unprivileged users.  So instead go
+	 * through the inodes' alias lists and try to evict each dentry.
+	 */
+	evict_dentries_for_decrypted_inodes(mk);
+
+	/*
+	 * evict_dentries_for_decrypted_inodes() already iput() each inode in
+	 * the list; any inodes for which that dropped the last reference will
+	 * have been evicted due to fscrypt_drop_inode() detecting the key
+	 * removal and telling the VFS to evict the inode.  So to finish, we
+	 * just need to check whether any inodes couldn't be evicted.
+	 */
+	err2 = check_for_busy_inodes(sb, mk);
+
+	return err1 ?: err2;
+}
+
+/*
+ * Try to remove an fscrypt master encryption key.  If other users have also
+ * added the key, we'll remove the current user's usage of the key, then return
+ * -EUSERS.  Otherwise we'll continue on and try to actually remove the key.
+ *
+ * First we wipe the actual master key secret from memory, so that no more
+ * inodes can be unlocked with it.  Then, we try to evict all cached inodes that
+ * had been unlocked using the key.  Since this can fail for in-use inodes, this
+ * is expected to be used in cooperation with userspace ensuring that none of
+ * the files are still open.
+ *
+ * If, nevertheless, some inodes could not be evicted, we return -EBUSY
+ * (although we still evicted as many inodes as possible) and keep the 'struct
+ * key' and the 'struct fscrypt_master_key' around to keep track of the list of
+ * remaining inodes.  Userspace can then retry the ioctl later to retry evicting
+ * the remaining inodes, or alternatively can add the secret key again.
+ *
+ * Note that even though we wipe the encryption *keys* from memory, decrypted
+ * data can likely still be found in memory, e.g. in pagecache pages that have
+ * been freed.  Wiping such data is currently out of scope, short of users who
+ * may choose to enable page and slab poisoning systemwide.
+ */
+int fscrypt_ioctl_remove_key(struct file *filp, const void __user *uarg)
+{
+	struct super_block *sb = file_inode(filp)->i_sb;
+	struct fscrypt_remove_key_arg arg;
+	struct key *key;
+	struct fscrypt_master_key *mk;
+	int err;
+	bool dead;
+
+	if (copy_from_user(&arg, uarg, sizeof(arg)))
+		return -EFAULT;
+
+	if (!valid_key_spec(&arg.key_spec))
+		return -EINVAL;
+
+	if (memchr_inv(arg.__reserved, 0, sizeof(arg.__reserved)))
+		return -EINVAL;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EACCES;
+
+	/* Find the key being removed. */
+	key = fscrypt_find_master_key(sb, &arg.key_spec);
+	if (IS_ERR(key))
+		return PTR_ERR(key);
+	mk = key->payload.data[0];
+
+	down_write(&key->sem);
+
+	/* Wipe the secret. */
+	dead = false;
+	if (is_master_key_secret_present(&mk->mk_secret)) {
+		wipe_master_key_secret(&mk->mk_secret);
+		dead = refcount_dec_and_test(&mk->mk_refcount);
+	}
+	up_write(&key->sem);
+	if (dead) {
+		/*
+		 * We wiped the secret and no inodes reference the key anymore,
+		 * so it's free to remove.
+		 */
+		key_invalidate(key);
+		err = 0;
+	} else {
+		/* Some inodes still reference this key; try to evict them. */
+		err = try_to_lock_encrypted_files(sb, mk);
+	}
+	key_put(key);
+	return err;
+}
+EXPORT_SYMBOL_GPL(fscrypt_ioctl_remove_key);
+
 int __init fscrypt_init_keyring(void)
 {
 	return register_key_type(&key_type_fscrypt);
diff --git a/fs/crypto/keysetup.c b/fs/crypto/keysetup.c
index a457f5aefde5a..6b35c550e87a4 100644
--- a/fs/crypto/keysetup.c
+++ b/fs/crypto/keysetup.c
@@ -218,8 +218,16 @@  int fscrypt_set_derived_key(struct fscrypt_info *ci, const u8 *derived_key)
 
 /*
  * Find the master key, then set up the inode's actual encryption key.
+ *
+ * If the master key is found in the filesystem-level keyring, then the
+ * corresponding 'struct key' is returned in *master_key_ret with
+ * ->sem read-locked.  This is needed to ensure that only one task links the
+ * fscrypt_info into ->mk_decrypted_inodes (as multiple tasks may race to create
+ * an fscrypt_info for the same inode), and to synchronize the master key being
+ * removed with a new inode starting to use it.
  */
-static int setup_file_encryption_key(struct fscrypt_info *ci)
+static int setup_file_encryption_key(struct fscrypt_info *ci,
+				     struct key **master_key_ret)
 {
 	struct key *key;
 	struct fscrypt_master_key *mk = NULL;
@@ -239,6 +247,13 @@  static int setup_file_encryption_key(struct fscrypt_info *ci)
 	}
 
 	mk = key->payload.data[0];
+	down_read(&key->sem);
+
+	/* Has the secret been removed (via FS_IOC_REMOVE_ENCRYPTION_KEY)? */
+	if (!is_master_key_secret_present(&mk->mk_secret)) {
+		err = -ENOKEY;
+		goto out_release_key;
+	}
 
 	if (mk->mk_secret.size < ci->ci_mode->keysize) {
 		fscrypt_warn(NULL,
@@ -250,14 +265,22 @@  static int setup_file_encryption_key(struct fscrypt_info *ci)
 	}
 
 	err = fscrypt_setup_v1_file_key(ci, mk->mk_secret.raw);
+	if (err)
+		goto out_release_key;
+
+	*master_key_ret = key;
+	return 0;
 
 out_release_key:
+	up_read(&key->sem);
 	key_put(key);
 	return err;
 }
 
 static void put_crypt_info(struct fscrypt_info *ci)
 {
+	struct key *key;
+
 	if (!ci)
 		return;
 
@@ -267,6 +290,26 @@  static void put_crypt_info(struct fscrypt_info *ci)
 		crypto_free_skcipher(ci->ci_ctfm);
 		crypto_free_cipher(ci->ci_essiv_tfm);
 	}
+
+	key = ci->ci_master_key;
+	if (key) {
+		struct fscrypt_master_key *mk = key->payload.data[0];
+
+		/*
+		 * Remove this inode from the list of inodes that were unlocked
+		 * with the master key.
+		 *
+		 * In addition, if we're removing the last inode from a key that
+		 * already had its secret removed, invalidate the key so that it
+		 * gets removed from ->s_master_keys.
+		 */
+		spin_lock(&mk->mk_decrypted_inodes_lock);
+		list_del(&ci->ci_master_key_link);
+		spin_unlock(&mk->mk_decrypted_inodes_lock);
+		if (refcount_dec_and_test(&mk->mk_refcount))
+			key_invalidate(key);
+		key_put(key);
+	}
 	kmem_cache_free(fscrypt_info_cachep, ci);
 }
 
@@ -275,6 +318,7 @@  int fscrypt_get_encryption_info(struct inode *inode)
 	struct fscrypt_info *crypt_info;
 	struct fscrypt_context ctx;
 	struct fscrypt_mode *mode;
+	struct key *master_key = NULL;
 	int res;
 
 	if (fscrypt_has_encryption_key(inode))
@@ -335,13 +379,30 @@  int fscrypt_get_encryption_info(struct inode *inode)
 	WARN_ON(mode->ivsize > FSCRYPT_MAX_IV_SIZE);
 	crypt_info->ci_mode = mode;
 
-	res = setup_file_encryption_key(crypt_info);
+	res = setup_file_encryption_key(crypt_info, &master_key);
 	if (res)
 		goto out;
 
-	if (cmpxchg_release(&inode->i_crypt_info, NULL, crypt_info) == NULL)
+	if (cmpxchg_release(&inode->i_crypt_info, NULL, crypt_info) == NULL) {
+		if (master_key) {
+			struct fscrypt_master_key *mk =
+				master_key->payload.data[0];
+
+			refcount_inc(&mk->mk_refcount);
+			crypt_info->ci_master_key = key_get(master_key);
+			spin_lock(&mk->mk_decrypted_inodes_lock);
+			list_add(&crypt_info->ci_master_key_link,
+				 &mk->mk_decrypted_inodes);
+			spin_unlock(&mk->mk_decrypted_inodes_lock);
+		}
 		crypt_info = NULL;
+	}
+	res = 0;
 out:
+	if (master_key) {
+		up_read(&master_key->sem);
+		key_put(master_key);
+	}
 	if (res == -ENOKEY)
 		res = 0;
 	put_crypt_info(crypt_info);
@@ -376,3 +437,39 @@  void fscrypt_free_inode(struct inode *inode)
 	}
 }
 EXPORT_SYMBOL(fscrypt_free_inode);
+
+/**
+ * fscrypt_drop_inode - check whether the inode's master key has been removed
+ *
+ * Filesystems supporting fscrypt must call this from their ->drop_inode()
+ * method so that encrypted inodes are evicted as soon as they're no longer in
+ * use and their master key has been removed.
+ *
+ * Return: 1 if fscrypt wants the inode to be evicted now, otherwise 0
+ */
+int fscrypt_drop_inode(struct inode *inode)
+{
+	const struct fscrypt_info *ci = READ_ONCE(inode->i_crypt_info);
+	const struct fscrypt_master_key *mk;
+
+	/*
+	 * If ci is NULL, then the inode doesn't have an encryption key set up
+	 * so it's irrelevant.  If ci_master_key is NULL, then the master key
+	 * was provided via the legacy mechanism of the process-subscribed
+	 * keyrings, so we don't know whether it's been removed or not.
+	 */
+	if (!ci || !ci->ci_master_key)
+		return 0;
+	mk = ci->ci_master_key->payload.data[0];
+
+	/*
+	 * Note: since we aren't holding key->sem, the result here can
+	 * immediately become outdated.  But there's no correctness problem with
+	 * unnecessarily evicting.  Nor is there a correctness problem with not
+	 * evicting while iput() is racing with the key being removed, since
+	 * then the thread removing the key will either evict the inode itself
+	 * or will correctly detect that it wasn't evicted due to the race.
+	 */
+	return !is_master_key_secret_present(&mk->mk_secret);
+}
+EXPORT_SYMBOL(fscrypt_drop_inode);
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index 46bf66cf76ef8..c1b80b981cec2 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -141,11 +141,13 @@  extern int fscrypt_inherit_context(struct inode *, struct inode *,
 /* keyring.c */
 extern void fscrypt_sb_free(struct super_block *sb);
 extern int fscrypt_ioctl_add_key(struct file *filp, void __user *arg);
+extern int fscrypt_ioctl_remove_key(struct file *filp, const void __user *arg);
 
 /* keysetup.c */
 extern int fscrypt_get_encryption_info(struct inode *);
 extern void fscrypt_put_encryption_info(struct inode *);
 extern void fscrypt_free_inode(struct inode *);
+extern int fscrypt_drop_inode(struct inode *inode);
 
 /* fname.c */
 extern int fscrypt_setup_filename(struct inode *, const struct qstr *,
@@ -381,6 +383,12 @@  static inline int fscrypt_ioctl_add_key(struct file *filp, void __user *arg)
 	return -EOPNOTSUPP;
 }
 
+static inline int fscrypt_ioctl_remove_key(struct file *filp,
+					   const void __user *arg)
+{
+	return -EOPNOTSUPP;
+}
+
 /* keysetup.c */
 static inline int fscrypt_get_encryption_info(struct inode *inode)
 {
@@ -396,6 +404,11 @@  static inline void fscrypt_free_inode(struct inode *inode)
 {
 }
 
+static inline int fscrypt_drop_inode(struct inode *inode)
+{
+	return 0;
+}
+
  /* fname.c */
 static inline int fscrypt_setup_filename(struct inode *dir,
 					 const struct qstr *iname,
diff --git a/include/uapi/linux/fscrypt.h b/include/uapi/linux/fscrypt.h
index 93d6eabaa7de4..cbe1ec46a4b56 100644
--- a/include/uapi/linux/fscrypt.h
+++ b/include/uapi/linux/fscrypt.h
@@ -67,10 +67,17 @@  struct fscrypt_add_key_arg {
 	__u8 raw[];
 };
 
+/* Struct passed to FS_IOC_REMOVE_ENCRYPTION_KEY */
+struct fscrypt_remove_key_arg {
+	struct fscrypt_key_specifier key_spec;
+	__u32 __reserved[6];
+};
+
 #define FS_IOC_SET_ENCRYPTION_POLICY	  _IOR('f', 19, struct fscrypt_policy)
 #define FS_IOC_GET_ENCRYPTION_PWSALT	  _IOW('f', 20, __u8[16])
 #define FS_IOC_GET_ENCRYPTION_POLICY	  _IOW('f', 21, struct fscrypt_policy)
 #define FS_IOC_ADD_ENCRYPTION_KEY	 _IOWR('f', 23, struct fscrypt_add_key_arg)
+#define FS_IOC_REMOVE_ENCRYPTION_KEY	  _IOW('f', 24, struct fscrypt_remove_key_arg)
 
 /**********************************************************************/