mbox series

[v6,0/9] Inline Encryption Support

Message ID 20191218145136.172774-1-satyat@google.com (mailing list archive)
Headers show
Series Inline Encryption Support | expand

Message

Satya Tangirala Dec. 18, 2019, 2:51 p.m. UTC
This patch series adds support for Inline Encryption to the block layer,
UFS, fscrypt, f2fs and ext4.

Note that the patches in this series for the block layer (i.e. patches 1, 2
and 3) can be applied independently of the subsequent patches in this
series.

Inline Encryption hardware allows software to specify an encryption context
(an encryption key, crypto algorithm, data unit num, data unit size, etc.)
along with a data transfer request to a storage device, and the inline
encryption hardware will use that context to en/decrypt the data. The
inline encryption hardware is part of the storage device, and it
conceptually sits on the data path between system memory and the storage
device. Inline Encryption hardware has become increasingly common, and we
want to support it in the kernel.

Inline Encryption hardware implementations often function around the
concept of a limited number of "keyslots", which can hold an encryption
context each. The storage device can be directed to en/decrypt any
particular request with the encryption context stored in any particular
keyslot.

Patch 1 introduces a Keyslot Manager to efficiently manage keyslots.
The keyslot manager also functions as the interface that blk-crypto
(introduced in Patch 3), will use to program keys into inline encryption
hardware. For more information on the Keyslot Manager, refer to
documentation found in block/keyslot-manager.c and linux/keyslot-manager.h.

Patch 2 introduces struct bio_crypt_ctx, and a ptr to one in struct bio,
which allows struct bio to represent an encryption context that can be
passed down the storage stack from the filesystem layer to the storage
driver.

Patch 3 introduces blk-crypto. Blk-crypto delegates crypto operations to
inline encryption hardware when available, and also contains a software
fallback to the kernel crypto API. Blk-crypto also makes it possible for
layered devices like device mapper to make use of inline encryption
hardware. Given that blk-crypto works as a software fallback, it may be
possible to remove file content en/decryption from fscrypt and simply use
blk-crypto in a future patch. For more details on blk-crypto, refer to
Documentation/block/inline-encryption.rst.

Patches 4-6 add support for inline encryption into the UFS driver according
to the JEDEC UFS HCI v2.1 specification. Inline encryption support for
other drivers (like eMMC) may be added in the same way - the device driver
should set up a Keyslot Manager in the device's request_queue (refer to
the UFS crypto additions in ufshcd-crypto.c and ufshcd.c for an example).

Patch 7 adds support to fscrypt - to use inline encryption with fscrypt,
the filesystem must be mounted with '-o inlinecrypt' - when this option is
specified, the contents of any AES-256-XTS encrypted file will be
encrypted using blk-crypto.

Patches 8 and 9 add support to f2fs and ext4 respectively, so that we have
a complete stack that can make use of inline encryption.

The patches were tested running kvm-xfstests, by specifying the introduced
"inlinecrypt" mount option, so that en/decryption happens with the
blk-crypto fallback. The patches were also tested on a Pixel 4 with UFS
hardware that has support for inline encryption.

There have been a few patch sets addressing Inline Encryption Support in
the past. Briefly, this patch set differs from those as follows:

1) "crypto: qce: ice: Add support for Inline Crypto Engine"
is specific to certain hardware, while our patch set's Inline
Encryption support for UFS is implemented according to the JEDEC UFS
specification.

2) "scsi: ufs: UFS Host Controller crypto changes" registers inline
encryption support as a kernel crypto algorithm. Our patch views inline
encryption as being fundamentally different from a generic crypto
provider (in that inline encryption is tied to a device), and so does
not use the kernel crypto API to represent inline encryption hardware.

3) "scsi: ufs: add real time/inline crypto support to UFS HCD" requires
the device mapper to work - our patch does not.

Changes v5 => v6:
 - Blk-crypto's kernel crypto API fallback is no longer restricted to
   8-byte DUNs. It's also now separately configurable from blk-crypto, and
   can be disabled entirely, while still allowing the kernel to use inline
   encryption hardware. Further, struct bio_crypt_ctx takes up less space,
   and no longer contains the information needed by the crypto API
   fallback - the fallback allocates the required memory when necessary.
 - Blk-crypto now supports all file content encryption modes supported by
   fscrypt.
 - Fixed bio merging logic in blk-merge.c
 - Fscrypt now supports inline encryption with the direct key policy, since
   blk-crypto now has support for larger DUNs.
 - Keyslot manager now uses a hashtable to lookup which keyslot contains
   any particular key (thanks Eric!)
 - Fscrypt support for inline encryption now handles filesystems with
   multiple underlying block devices (thanks Eric!)
 - Numerous cleanups

Changes v4 => v5:
 - The fscrypt patch has been separated into 2. The first adds support
   for the IV_INO_LBLK_64 policy (which was called INLINE_CRYPT_OPTIMIZED
   in past versions of this series). This policy is now purely an on disk
   format, and doesn't dictate whether blk-crypto is used for file content
   encryption or not. Instead, this is now decided based on the
   "inlinecrypt" mount option.
 - Inline crypto key eviction is now handled by blk-crypto instead of
   fscrypt.
 - More refactoring.

Changes v3 => v4:
 - Fixed the issue with allocating crypto_skcipher in
   blk_crypto_keyslot_program.
 - bio_crypto_alloc_ctx is now mempool backed.
 - In f2fs, a bio's bi_crypt_context is now set up when the
   bio is allocated, rather than just before the bio is
   submitted - this fixes bugs in certain cases, like when an
   encrypted block is being moved without decryption.
 - Lots of refactoring and cleanup of blk-crypto - thanks Eric!

Changes v2 => v3:
 - Overhauled keyslot manager's get keyslot logic and optimized LRU.
 - Block crypto en/decryption fallback now supports data unit sizes
   that divide the bvec length (instead of requiring each bvec's length
   to be the same as the data unit size).
 - fscrypt master key is now keyed additionally by super_block and
   ci_ctfm != NULL.
 - all references of "hw encryption" are replaced by inline encryption.
 - address various other review comments from Eric.

Changes v1 => v2:
 - Block layer and UFS changes are split into 3 patches each.
 - We now only have a ptr to a struct bio_crypt_ctx in struct bio, instead
   of the struct itself.
 - struct bio_crypt_ctx no longer has flags.
 - blk-crypto now correctly handles the case when it fails to init
   (because of insufficient memory), but kernel continues to boot.
 - ufshcd-crypto now works on big endian cpus.
 - Many cleanups.

Satya Tangirala (9):
  block: Keyslot Manager for Inline Encryption
  block: Add encryption context to struct bio
  block: blk-crypto for Inline Encryption
  scsi: ufs: UFS driver v2.1 spec crypto additions
  scsi: ufs: UFS crypto API
  scsi: ufs: Add inline encryption support to UFS
  fscrypt: add inline encryption support
  f2fs: add inline encryption support
  ext4: add inline encryption support

 Documentation/block/index.rst             |   1 +
 Documentation/block/inline-encryption.rst | 183 ++++++
 block/Kconfig                             |  17 +
 block/Makefile                            |   3 +
 block/bio-crypt-ctx.c                     | 140 +++++
 block/bio.c                               |  21 +-
 block/blk-core.c                          |  16 +-
 block/blk-crypto-fallback.c               | 648 ++++++++++++++++++++++
 block/blk-crypto-internal.h               |  58 ++
 block/blk-crypto.c                        | 242 ++++++++
 block/blk-merge.c                         |  11 +
 block/bounce.c                            |  12 +-
 block/keyslot-manager.c                   | 426 ++++++++++++++
 drivers/md/dm.c                           |   3 +-
 drivers/scsi/ufs/Kconfig                  |   9 +
 drivers/scsi/ufs/Makefile                 |   1 +
 drivers/scsi/ufs/ufshcd-crypto.c          | 391 +++++++++++++
 drivers/scsi/ufs/ufshcd-crypto.h          | 107 ++++
 drivers/scsi/ufs/ufshcd.c                 |  56 +-
 drivers/scsi/ufs/ufshcd.h                 |  25 +
 drivers/scsi/ufs/ufshci.h                 |  67 ++-
 fs/buffer.c                               |   2 +
 fs/crypto/Kconfig                         |   6 +
 fs/crypto/Makefile                        |   1 +
 fs/crypto/bio.c                           |  28 +-
 fs/crypto/crypto.c                        |   2 +-
 fs/crypto/fname.c                         |   4 +-
 fs/crypto/fscrypt_private.h               | 122 +++-
 fs/crypto/inline_crypt.c                  | 319 +++++++++++
 fs/crypto/keyring.c                       |   4 +-
 fs/crypto/keysetup.c                      | 102 ++--
 fs/crypto/keysetup_v1.c                   |  16 +-
 fs/ext4/ext4.h                            |   1 +
 fs/ext4/inode.c                           |   4 +-
 fs/ext4/page-io.c                         |   6 +-
 fs/ext4/readpage.c                        |  11 +-
 fs/ext4/super.c                           |  13 +
 fs/f2fs/data.c                            |  65 ++-
 fs/f2fs/f2fs.h                            |   3 +
 fs/f2fs/super.c                           |  41 ++
 include/linux/bio-crypt-ctx.h             | 193 +++++++
 include/linux/bio.h                       |   1 +
 include/linux/blk-crypto.h                |  63 +++
 include/linux/blk_types.h                 |   6 +
 include/linux/blkdev.h                    |   6 +
 include/linux/fscrypt.h                   |  58 ++
 include/linux/keyslot-manager.h           |  60 ++
 47 files changed, 3462 insertions(+), 112 deletions(-)
 create mode 100644 Documentation/block/inline-encryption.rst
 create mode 100644 block/bio-crypt-ctx.c
 create mode 100644 block/blk-crypto-fallback.c
 create mode 100644 block/blk-crypto-internal.h
 create mode 100644 block/blk-crypto.c
 create mode 100644 block/keyslot-manager.c
 create mode 100644 drivers/scsi/ufs/ufshcd-crypto.c
 create mode 100644 drivers/scsi/ufs/ufshcd-crypto.h
 create mode 100644 fs/crypto/inline_crypt.c
 create mode 100644 include/linux/bio-crypt-ctx.h
 create mode 100644 include/linux/blk-crypto.h
 create mode 100644 include/linux/keyslot-manager.h

Comments

Christoph Hellwig Jan. 8, 2020, 2:05 p.m. UTC | #1
I haven't been able to deep dive into the details, but the structure
of this still makes me very unhappy.

Most of it is related to the software fallback again.  Please split the
fallback into a separate file, and also into a separate data structure.
There is abslutely no need to have the overhead of the software only
fields for the hardware case.

On the counter side I think all the core block layer code added should
go into a single file instead of split into three with some odd
layering.

Also what I don't understand is why this managed key-slots on a per-bio
basis.  Wou;dn't it make a whole lot more sense to manage them on a
struct request basis once most of the merging has been performed?
Satya Tangirala Jan. 8, 2020, 6:43 p.m. UTC | #2
On Wed, Jan 08, 2020 at 06:05:56AM -0800, Christoph Hellwig wrote:
> I haven't been able to deep dive into the details, but the structure
> of this still makes me very unhappy.
> 
> Most of it is related to the software fallback again.  Please split the
> fallback into a separate file, and also into a separate data structure.
> There is abslutely no need to have the overhead of the software only
> fields for the hardware case.
> 
The fallback actually is in a separate file, and the software only fields
are not allocated in the hardware case anymore, either - I should have
made that clear(er) in the coverletter.
> On the counter side I think all the core block layer code added should
> go into a single file instead of split into three with some odd
> layering.
> 
Alright, I'll look into this. I still think that the keyslot manager
should maybe go in a separate file because it does a specific, fairly
self contained task and isn't just block layer code - it's the interface
between the device drivers and any upper layer.
> Also what I don't understand is why this managed key-slots on a per-bio
> basis.  Wou;dn't it make a whole lot more sense to manage them on a
> struct request basis once most of the merging has been performed?
I don't immediately see an issue with making it work on a struct request
basis. I'll look into this more carefully.

Thanks!
Satya
Christoph Hellwig Jan. 17, 2020, 8:52 a.m. UTC | #3
Hi Satya,

On Wed, Jan 08, 2020 at 10:43:05AM -0800, Satya Tangirala wrote:
> The fallback actually is in a separate file, and the software only fields
> are not allocated in the hardware case anymore, either - I should have
> made that clear(er) in the coverletter.

I see this now, thanks.  Either the changes weren't pushed to the
fscrypt report by the time I saw you mail, or I managed to look at a
stale local copy.

> Alright, I'll look into this. I still think that the keyslot manager
> should maybe go in a separate file because it does a specific, fairly
> self contained task and isn't just block layer code - it's the interface
> between the device drivers and any upper layer.

So are various other functions in the code like bio_crypt_clone or
bio_crypt_should_process.  Also the keyslot_* naming is way to generic,
it really needs a blk_ or blk_crypto_ prefix.

> > Also what I don't understand is why this managed key-slots on a per-bio
> > basis.  Wou;dn't it make a whole lot more sense to manage them on a
> > struct request basis once most of the merging has been performed?
> I don't immediately see an issue with making it work on a struct request
> basis. I'll look into this more carefully.

I think that should end up being simpler and more efficient.
Satya Tangirala Feb. 1, 2020, 12:53 a.m. UTC | #4
On Fri, Jan 17, 2020 at 12:52:10AM -0800, Christoph Hellwig wrote:
> Hi Satya,
> 
> On Wed, Jan 08, 2020 at 10:43:05AM -0800, Satya Tangirala wrote:
> > The fallback actually is in a separate file, and the software only fields
> > are not allocated in the hardware case anymore, either - I should have
> > made that clear(er) in the coverletter.
> 
> I see this now, thanks.  Either the changes weren't pushed to the
> fscrypt report by the time I saw you mail, or I managed to look at a
> stale local copy.
> 
> > Alright, I'll look into this. I still think that the keyslot manager
> > should maybe go in a separate file because it does a specific, fairly
> > self contained task and isn't just block layer code - it's the interface
> > between the device drivers and any upper layer.
> 
> So are various other functions in the code like bio_crypt_clone or
> bio_crypt_should_process.  Also the keyslot_* naming is way to generic,
> it really needs a blk_ or blk_crypto_ prefix.
> 
> > > Also what I don't understand is why this managed key-slots on a per-bio
> > > basis.  Wou;dn't it make a whole lot more sense to manage them on a
> > > struct request basis once most of the merging has been performed?
> > I don't immediately see an issue with making it work on a struct request
> > basis. I'll look into this more carefully.
> 
> I think that should end up being simpler and more efficient.
So I tried reading through more of blk-mq and the IO schedulers to figure
out how to do this. As far as I can tell, requests may be merged with
each other until they're taken off the scheduler. So ideally, we'd
program a keyslot for a request when it's taken off the scheduler, but
this happens from within an atomic context. Otoh, programming a keyslot
might cause the thread to sleep (in the event that all keyslots are in use
by other in-flight requests). Unless I'm missing something, or you had some
other different idea in mind, I think it's a lot easier to stick to letting
blk-crypto program keyslots and manage them per-bio...
Christoph Hellwig Feb. 3, 2020, 9:15 a.m. UTC | #5
On Fri, Jan 31, 2020 at 04:53:41PM -0800, Satya Tangirala wrote:
> So I tried reading through more of blk-mq and the IO schedulers to figure
> out how to do this. As far as I can tell, requests may be merged with
> each other until they're taken off the scheduler. So ideally, we'd
> program a keyslot for a request when it's taken off the scheduler, but
> this happens from within an atomic context. Otoh, programming a keyslot
> might cause the thread to sleep (in the event that all keyslots are in use
> by other in-flight requests). Unless I'm missing something, or you had some
> other different idea in mind, I think it's a lot easier to stick to letting
> blk-crypto program keyslots and manage them per-bio...

But as far as I understand from reading the code it only sleeps because
it waits for another key slot to be released.  Which is exactly like
any other resource constraint in the storage device.  In that case
->queue_rq returns BLK_STS_RESOURCE (or you do the equivalent in the
blk-mq code) and the queue gets woken again once the resource is
available.
Satya Tangirala Feb. 4, 2020, 3:39 a.m. UTC | #6
On Mon, Feb 03, 2020 at 01:15:58AM -0800, Christoph Hellwig wrote:
> On Fri, Jan 31, 2020 at 04:53:41PM -0800, Satya Tangirala wrote:
> > So I tried reading through more of blk-mq and the IO schedulers to figure
> > out how to do this. As far as I can tell, requests may be merged with
> > each other until they're taken off the scheduler. So ideally, we'd
> > program a keyslot for a request when it's taken off the scheduler, but
> > this happens from within an atomic context. Otoh, programming a keyslot
> > might cause the thread to sleep (in the event that all keyslots are in use
> > by other in-flight requests). Unless I'm missing something, or you had some
> > other different idea in mind, I think it's a lot easier to stick to letting
> > blk-crypto program keyslots and manage them per-bio...
> 
> But as far as I understand from reading the code it only sleeps because
> it waits for another key slot to be released.  Which is exactly like
> any other resource constraint in the storage device.  In that case
> ->queue_rq returns BLK_STS_RESOURCE (or you do the equivalent in the
> blk-mq code) and the queue gets woken again once the resource is
> available.
Wouldn't that mean that all the other requests in the queue, even ones that
don't even need any inline encryption, also don't get processed until the
queue is woken up again? And if so, are we really ok with that?

As you said, we'd need the queue to wake up once a keyslot is available.
It's possible that only some hardware queues and not others get blocked
because of keyslot programming, so ideally, we could somehow make the
correct hardware queue(s) wake up once a keyslot is freed. But the keyslot
manager can't assume that it's actually blk-mq that's being used
underneath, so if we want to get the keyslot manager to do something once
a keyslot was freed, it would need some generic way to signal that to
blk-mq. We can also just wait around for the queue to restart by itself
after some time delay and try to program the keyslot again at that point,
although I wouldn't want to do that because in the current design we know
exactly when a keyslot is freed, and we don't need to rely on potentially
inefficient guesswork about when we can successfully program a keyslot.
Maybe we're alright with waking up all the queues rather than only the
ones that really need it? But in any case, I don't know yet what the
best way to solve this problem is.

We would also need to make changes to handle programming keyslots in
some of the other make_request_fns besides blk_mq_make_request too
(wherever relevant, at least) which adds more complexity. Overall, it seems
to me like trying to manage programming of keyslots on a per-request basis
is maybe more code than what we have now, and I'm not sure what we're
really buying by doing it (other than perhaps the performance benefit of
having to get fewer refcounts on a variable and fewer comparisions of
cryptographic keys).

Also I forgot to mention this in my previous mail, but there may be some
drivers/devices whose keyslots cannot be programmed from an atomic context,
so this approach which might make things difficult in those situations (the
UFS v2.1 spec, which I followed while implementing support for inline
crypto for UFS, does not care whether we're in an atomic context or not,
but there might be specifications for other drivers, or even some
particular UFS inline encryption hardware that do).

So unless you have strong objections, I'd want to continue programming
keyslots per-bio for the above reasons.
Christoph Hellwig Feb. 4, 2020, 2:58 p.m. UTC | #7
On Mon, Feb 03, 2020 at 07:39:15PM -0800, Satya Tangirala wrote:
> Wouldn't that mean that all the other requests in the queue, even ones that
> don't even need any inline encryption, also don't get processed until the
> queue is woken up again?

For the basic implementation yes.

> And if so, are we really ok with that?

That depends on the use cases.  With the fscrypt setup are we still
going to see unencrypted I/O to the device as well?  If so we'll need
to refine the setup and only queue up unencrypted requests.  But I'd
still try to dumb version first and then refine it.

> As you said, we'd need the queue to wake up once a keyslot is available.
> It's possible that only some hardware queues and not others get blocked
> because of keyslot programming, so ideally, we could somehow make the
> correct hardware queue(s) wake up once a keyslot is freed. But the keyslot
> manager can't assume that it's actually blk-mq that's being used
> underneath,

Why?  The legacy requet code is long gone.

> Also I forgot to mention this in my previous mail, but there may be some
> drivers/devices whose keyslots cannot be programmed from an atomic context,
> so this approach which might make things difficult in those situations (the
> UFS v2.1 spec, which I followed while implementing support for inline
> crypto for UFS, does not care whether we're in an atomic context or not,
> but there might be specifications for other drivers, or even some
> particular UFS inline encryption hardware that do).

We have an option to never call ->queue_rq from atomic context
(BLK_MQ_F_BLOCKING).  But do you know of existing hardware that behaves
like this or is it just hypothetical?

> So unless you have strong objections, I'd want to continue programming
> keyslots per-bio for the above reasons.

I'm pretty sure from looking at the code that doing inline encryption
at the bio level is the wrong approach.  That isn't supposed to end
the discussion, but especially things like waking up after a keyslot
becomes available fits much better into the request layer resource
model that is built around queuing limitations, and not the make_request
model that assumes the driver can always queue.
Eric Biggers Feb. 4, 2020, 9:21 p.m. UTC | #8
On Tue, Feb 04, 2020 at 06:58:32AM -0800, Christoph Hellwig wrote:
> On Mon, Feb 03, 2020 at 07:39:15PM -0800, Satya Tangirala wrote:
> > Wouldn't that mean that all the other requests in the queue, even ones that
> > don't even need any inline encryption, also don't get processed until the
> > queue is woken up again?
> 
> For the basic implementation yes.
> 
> > And if so, are we really ok with that?
> 
> That depends on the use cases.  With the fscrypt setup are we still
> going to see unencrypted I/O to the device as well?  If so we'll need
> to refine the setup and only queue up unencrypted requests.  But I'd
> still try to dumb version first and then refine it.

Definitely, for several reasons:

- Not all files on the filesystem are necessarily encrypted.
- Filesystem metadata is not encrypted (except for filenames, but those don't
  use inline encryption).
- Encryption isn't necessarily being used on all partitions on the disk.

It's also not just about unencrypted vs. encrypted, since just because someone
is waiting for one keyslot doesn't mean we should pause all encrypted I/O to the
device for all keyslots.

> 
> > As you said, we'd need the queue to wake up once a keyslot is available.
> > It's possible that only some hardware queues and not others get blocked
> > because of keyslot programming, so ideally, we could somehow make the
> > correct hardware queue(s) wake up once a keyslot is freed. But the keyslot
> > manager can't assume that it's actually blk-mq that's being used
> > underneath,
> 
> Why?  The legacy requet code is long gone.
> 
> > Also I forgot to mention this in my previous mail, but there may be some
> > drivers/devices whose keyslots cannot be programmed from an atomic context,
> > so this approach which might make things difficult in those situations (the
> > UFS v2.1 spec, which I followed while implementing support for inline
> > crypto for UFS, does not care whether we're in an atomic context or not,
> > but there might be specifications for other drivers, or even some
> > particular UFS inline encryption hardware that do).
> 
> We have an option to never call ->queue_rq from atomic context
> (BLK_MQ_F_BLOCKING).  But do you know of existing hardware that behaves
> like this or is it just hypothetical?

Maybe -- check the Qualcomm ICE (Inline Crypto Engine) driver I posted at
https://lkml.kernel.org/linux-block/20200110061634.46742-1-ebiggers@kernel.org/.
The hardware requires vendor-specific SMC calls to program keys, rather than the
UFS standard way.  It's currently blocking, since the code to make the SMC calls
in drivers/firmware/qcom_scm*.c uses GFP_KERNEL and mutex_lock().

I'll test whether it can work in atomic context by using GFP_ATOMIC and
qcom_scm_call_atomic() instead.  (Adding a spinlock might be needed too.)

- Eric
Eric Biggers Feb. 5, 2020, 7:36 a.m. UTC | #9
On Tue, Feb 04, 2020 at 01:21:11PM -0800, Eric Biggers wrote:
> On Tue, Feb 04, 2020 at 06:58:32AM -0800, Christoph Hellwig wrote:
> > On Mon, Feb 03, 2020 at 07:39:15PM -0800, Satya Tangirala wrote:
> > > Wouldn't that mean that all the other requests in the queue, even ones that
> > > don't even need any inline encryption, also don't get processed until the
> > > queue is woken up again?
> > 
> > For the basic implementation yes.
> > 
> > > And if so, are we really ok with that?
> > 
> > That depends on the use cases.  With the fscrypt setup are we still
> > going to see unencrypted I/O to the device as well?  If so we'll need
> > to refine the setup and only queue up unencrypted requests.  But I'd
> > still try to dumb version first and then refine it.
> 
> Definitely, for several reasons:
> 
> - Not all files on the filesystem are necessarily encrypted.
> - Filesystem metadata is not encrypted (except for filenames, but those don't
>   use inline encryption).
> - Encryption isn't necessarily being used on all partitions on the disk.
> 
> It's also not just about unencrypted vs. encrypted, since just because someone
> is waiting for one keyslot doesn't mean we should pause all encrypted I/O to the
> device for all keyslots.
> 
> > 
> > > As you said, we'd need the queue to wake up once a keyslot is available.
> > > It's possible that only some hardware queues and not others get blocked
> > > because of keyslot programming, so ideally, we could somehow make the
> > > correct hardware queue(s) wake up once a keyslot is freed. But the keyslot
> > > manager can't assume that it's actually blk-mq that's being used
> > > underneath,
> > 
> > Why?  The legacy requet code is long gone.
> > 
> > > Also I forgot to mention this in my previous mail, but there may be some
> > > drivers/devices whose keyslots cannot be programmed from an atomic context,
> > > so this approach which might make things difficult in those situations (the
> > > UFS v2.1 spec, which I followed while implementing support for inline
> > > crypto for UFS, does not care whether we're in an atomic context or not,
> > > but there might be specifications for other drivers, or even some
> > > particular UFS inline encryption hardware that do).
> > 
> > We have an option to never call ->queue_rq from atomic context
> > (BLK_MQ_F_BLOCKING).  But do you know of existing hardware that behaves
> > like this or is it just hypothetical?
> 
> Maybe -- check the Qualcomm ICE (Inline Crypto Engine) driver I posted at
> https://lkml.kernel.org/linux-block/20200110061634.46742-1-ebiggers@kernel.org/.
> The hardware requires vendor-specific SMC calls to program keys, rather than the
> UFS standard way.  It's currently blocking, since the code to make the SMC calls
> in drivers/firmware/qcom_scm*.c uses GFP_KERNEL and mutex_lock().
> 
> I'll test whether it can work in atomic context by using GFP_ATOMIC and
> qcom_scm_call_atomic() instead.  (Adding a spinlock might be needed too.)
> 

The vendor-specific SMC calls do seem to work in atomic context, at least on
SDA845.  However, in ufshcd_program_key(), the calls to pm_runtime_get_sync()
and ufshcd_hold() can also sleep.

I think we can move the pm_runtime_get_sync() to ufshcd_crypto_keyslot_evict(),
since the block layer already ensures the device is not runtime-suspended while
requests are being processed (see blk_queue_enter()).  I.e., keyslots can be
evicted independently of any bio, but that's not the case for programming them.

That still leaves ufshcd_hold(), which is still needed to ungate the UFS clocks.
It does accept an 'async' argument, which is used by ufshcd_queuecommand() to
schedule work to ungate the clocks and return SCSI_MLQUEUE_HOST_BUSY.

So in blk_mq_dispatch_rq_list(), we could potentially try to acquire the
keyslot, and if it can't be done because either none are available or because
something else needs to be waited for, we can put the request back on the
dispatch list -- similar to how failure to get a driver tag is handled.

However, if I understand correctly, that would mean that all requests to the
same hardware queue would be blocked whenever someone is waiting for a keyslot
-- even unencrypted requests and requests for unrelated keyslots.

It's possible that would still be fine for the Android use case, as vendors tend
to add enough keyslots to work with Android, provided that they choose the
fscrypt format that uses one key per encryption policy rather than one key per
file.  I.e., it might be the case that no one waits for keyslots in practice
anyway.  But, it seems it would be undesirable for a general Linux kernel
framework, which could potentially be used with per-file keys or with hardware
that only has a *very* small number of keyslots.

Another option would be to allocate the keyslot in blk_mq_get_request(), where
sleeping is still allowed, but some merging was already done.

- Eric
Christoph Hellwig Feb. 5, 2020, 6:05 p.m. UTC | #10
On Tue, Feb 04, 2020 at 11:36:01PM -0800, Eric Biggers wrote:
> The vendor-specific SMC calls do seem to work in atomic context, at least on
> SDA845.  However, in ufshcd_program_key(), the calls to pm_runtime_get_sync()
> and ufshcd_hold() can also sleep.
> 
> I think we can move the pm_runtime_get_sync() to ufshcd_crypto_keyslot_evict(),
> since the block layer already ensures the device is not runtime-suspended while
> requests are being processed (see blk_queue_enter()).  I.e., keyslots can be
> evicted independently of any bio, but that's not the case for programming them.

Yes.

> That still leaves ufshcd_hold(), which is still needed to ungate the UFS clocks.
> It does accept an 'async' argument, which is used by ufshcd_queuecommand() to
> schedule work to ungate the clocks and return SCSI_MLQUEUE_HOST_BUSY.
> 
> So in blk_mq_dispatch_rq_list(), we could potentially try to acquire the
> keyslot, and if it can't be done because either none are available or because
> something else needs to be waited for, we can put the request back on the
> dispatch list -- similar to how failure to get a driver tag is handled.

Yes, that is what I had in mind.

> However, if I understand correctly, that would mean that all requests to the
> same hardware queue would be blocked whenever someone is waiting for a keyslot
> -- even unencrypted requests and requests for unrelated keyslots.

At least for an initial dumb implementation, yes.  But if we care enough
we can improve the code to check for the encrypted flag and only put
back encrypted flags in that case.

> It's possible that would still be fine for the Android use case, as vendors tend
> to add enough keyslots to work with Android, provided that they choose the
> fscrypt format that uses one key per encryption policy rather than one key per
> file.  I.e., it might be the case that no one waits for keyslots in practice
> anyway.  But, it seems it would be undesirable for a general Linux kernel
> framework, which could potentially be used with per-file keys or with hardware
> that only has a *very* small number of keyslots.
> 
> Another option would be to allocate the keyslot in blk_mq_get_request(), where
> sleeping is still allowed, but some merging was already done.

That is another good idea.  In blk_mq_get_request we acquire other
resources like the tag, so this would be a very logical places to
acquire the key slots.  We can should also be able to still merge into
the request while it is waiting.
Satya Tangirala Feb. 21, 2020, 12:30 p.m. UTC | #11
Hi Christoph,

I sent out v7 of the patch series. It's at
https://lore.kernel.org/linux-fscrypt/20200221115050.238976-1-satyat@google.com/T/#t

It manages keyslots on a per-request basis now - I made it get keyslots
in blk_mq_get_request, because that way I wouldn't have to worry about
programming keys in an atomic context. What do you think of the new
approach?

On Wed, Feb 05, 2020 at 10:05:41AM -0800, Christoph Hellwig wrote:
> On Tue, Feb 04, 2020 at 11:36:01PM -0800, Eric Biggers wrote:
> > The vendor-specific SMC calls do seem to work in atomic context, at least on
> > SDA845.  However, in ufshcd_program_key(), the calls to pm_runtime_get_sync()
> > and ufshcd_hold() can also sleep.
> > 
> > I think we can move the pm_runtime_get_sync() to ufshcd_crypto_keyslot_evict(),
> > since the block layer already ensures the device is not runtime-suspended while
> > requests are being processed (see blk_queue_enter()).  I.e., keyslots can be
> > evicted independently of any bio, but that's not the case for programming them.
> 
> Yes.
> 
> > That still leaves ufshcd_hold(), which is still needed to ungate the UFS clocks.
> > It does accept an 'async' argument, which is used by ufshcd_queuecommand() to
> > schedule work to ungate the clocks and return SCSI_MLQUEUE_HOST_BUSY.
> > 
> > So in blk_mq_dispatch_rq_list(), we could potentially try to acquire the
> > keyslot, and if it can't be done because either none are available or because
> > something else needs to be waited for, we can put the request back on the
> > dispatch list -- similar to how failure to get a driver tag is handled.
> 
> Yes, that is what I had in mind.
> 
> > However, if I understand correctly, that would mean that all requests to the
> > same hardware queue would be blocked whenever someone is waiting for a keyslot
> > -- even unencrypted requests and requests for unrelated keyslots.
> 
> At least for an initial dumb implementation, yes.  But if we care enough
> we can improve the code to check for the encrypted flag and only put
> back encrypted flags in that case.
> 
> > It's possible that would still be fine for the Android use case, as vendors tend
> > to add enough keyslots to work with Android, provided that they choose the
> > fscrypt format that uses one key per encryption policy rather than one key per
> > file.  I.e., it might be the case that no one waits for keyslots in practice
> > anyway.  But, it seems it would be undesirable for a general Linux kernel
> > framework, which could potentially be used with per-file keys or with hardware
> > that only has a *very* small number of keyslots.
> > 
> > Another option would be to allocate the keyslot in blk_mq_get_request(), where
> > sleeping is still allowed, but some merging was already done.
> 
> That is another good idea.  In blk_mq_get_request we acquire other
> resources like the tag, so this would be a very logical places to
> acquire the key slots.  We can should also be able to still merge into
> the request while it is waiting.
Christoph Hellwig Feb. 21, 2020, 2:20 p.m. UTC | #12
On Fri, Feb 21, 2020 at 04:30:30AM -0800, Satya Tangirala wrote:
> Hi Christoph,
> 
> I sent out v7 of the patch series. It's at
> https://lore.kernel.org/linux-fscrypt/20200221115050.238976-1-satyat@google.com/T/#t
> 
> It manages keyslots on a per-request basis now - I made it get keyslots
> in blk_mq_get_request, because that way I wouldn't have to worry about
> programming keys in an atomic context. What do you think of the new
> approach?

I'll try to review it soon, thanks!