Message ID | 20191218145136.172774-1-satyat@google.com (mailing list archive)
---|---
Series | Inline Encryption Support

I haven't been able to deep dive into the details, but the structure of this still makes me very unhappy.

Most of it is related to the software fallback again. Please split the fallback into a separate file, and also into a separate data structure. There is absolutely no need to have the overhead of the software only fields for the hardware case.

On the counter side I think all the core block layer code added should go into a single file instead of split into three with some odd layering.

Also what I don't understand is why this manages key-slots on a per-bio basis. Wouldn't it make a whole lot more sense to manage them on a struct request basis once most of the merging has been performed?

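As a rough illustration of the split being asked for here, a minimal sketch (with hypothetical structure and field names, not the ones from the patch series) would keep the per-bio crypt context lean and confine fallback-only state to a separately allocated object:

```c
#include <linux/types.h>
#include <linux/workqueue.h>
#include <crypto/skcipher.h>

/* What the hardware inline-encryption path needs per bio: */
struct example_bio_crypt_ctx {
	const u8	*raw_key;	/* or a reference to the key */
	u64		data_unit_num;	/* IV for the first data unit */
	unsigned int	data_unit_size;
	int		keyslot;	/* slot programmed into the hardware */
};

/* Allocated only when the software fallback is actually used: */
struct example_crypto_fallback_ctx {
	struct example_bio_crypt_ctx	ctx;
	struct crypto_skcipher		*tfm;	/* software cipher handle */
	struct work_struct		work;	/* en/decryption work item */
};
```

With a layout along these lines, the hardware case never pays for the skcipher handle or the work item.
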
On Wed, Jan 08, 2020 at 06:05:56AM -0800, Christoph Hellwig wrote:

> I haven't been able to deep dive into the details, but the structure of this still makes me very unhappy.
>
> Most of it is related to the software fallback again. Please split the fallback into a separate file, and also into a separate data structure. There is absolutely no need to have the overhead of the software only fields for the hardware case.

The fallback actually is in a separate file, and the software only fields are not allocated in the hardware case anymore, either - I should have made that clear(er) in the cover letter.

> On the counter side I think all the core block layer code added should go into a single file instead of split into three with some odd layering.

Alright, I'll look into this. I still think that the keyslot manager should maybe go in a separate file because it does a specific, fairly self-contained task and isn't just block layer code - it's the interface between the device drivers and any upper layer.

> Also what I don't understand is why this manages key-slots on a per-bio basis. Wouldn't it make a whole lot more sense to manage them on a struct request basis once most of the merging has been performed?

I don't immediately see an issue with making it work on a struct request basis. I'll look into this more carefully.

Thanks!
Satya

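To make the "interface between the device drivers and any upper layer" role concrete, here is a hedged sketch of what such a driver-facing keyslot-manager interface could look like; every identifier below is illustrative rather than the series' actual API:

```c
#include <linux/types.h>

struct example_ksm;			/* opaque to drivers and upper layers */

/* Callbacks a driver supplies for its inline-encryption hardware: */
struct example_ksm_ll_ops {
	int (*keyslot_program)(struct example_ksm *ksm, const u8 *key,
			       unsigned int key_size, unsigned int slot);
	int (*keyslot_evict)(struct example_ksm *ksm, unsigned int slot);
};

/* What upper layers (e.g. blk-crypto) would call: */
struct example_ksm *example_ksm_create(unsigned int num_slots,
				       const struct example_ksm_ll_ops *ops);
int example_ksm_get_slot_for_key(struct example_ksm *ksm,
				 const u8 *key, unsigned int key_size);
void example_ksm_put_slot(struct example_ksm *ksm, unsigned int slot);
```
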
Hi Satya,

On Wed, Jan 08, 2020 at 10:43:05AM -0800, Satya Tangirala wrote:

> The fallback actually is in a separate file, and the software only fields are not allocated in the hardware case anymore, either - I should have made that clear(er) in the cover letter.

I see this now, thanks. Either the changes weren't pushed to the fscrypt repo by the time I saw your mail, or I managed to look at a stale local copy.

> Alright, I'll look into this. I still think that the keyslot manager should maybe go in a separate file because it does a specific, fairly self-contained task and isn't just block layer code - it's the interface between the device drivers and any upper layer.

So are various other functions in the code, like bio_crypt_clone or bio_crypt_should_process. Also, the keyslot_* naming is way too generic; it really needs a blk_ or blk_crypto_ prefix.

> > Also what I don't understand is why this manages key-slots on a per-bio basis. Wouldn't it make a whole lot more sense to manage them on a struct request basis once most of the merging has been performed?
>
> I don't immediately see an issue with making it work on a struct request basis. I'll look into this more carefully.

I think that should end up being simpler and more efficient.

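To illustrate the naming point only (these are made-up identifiers demonstrating the requested prefix convention, not the names that ended up in the tree):

```c
#include <linux/types.h>

/* Too generic for symbols living in core block layer code: */
struct keyslot_manager;
int keyslot_manager_get_slot_for_key(struct keyslot_manager *ksm,
				     const u8 *key, unsigned int key_size);

/* The same interface namespaced with a blk_/blk_crypto_ prefix: */
struct blk_keyslot_manager;
int blk_ksm_get_slot_for_key(struct blk_keyslot_manager *ksm,
			     const u8 *key, unsigned int key_size);
```
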
On Fri, Jan 17, 2020 at 12:52:10AM -0800, Christoph Hellwig wrote:

> Hi Satya,
>
> On Wed, Jan 08, 2020 at 10:43:05AM -0800, Satya Tangirala wrote:
>
> > The fallback actually is in a separate file, and the software only fields are not allocated in the hardware case anymore, either - I should have made that clear(er) in the cover letter.
>
> I see this now, thanks. Either the changes weren't pushed to the fscrypt repo by the time I saw your mail, or I managed to look at a stale local copy.
>
> > Alright, I'll look into this. I still think that the keyslot manager should maybe go in a separate file because it does a specific, fairly self-contained task and isn't just block layer code - it's the interface between the device drivers and any upper layer.
>
> So are various other functions in the code, like bio_crypt_clone or bio_crypt_should_process. Also, the keyslot_* naming is way too generic; it really needs a blk_ or blk_crypto_ prefix.
>
> > > Also what I don't understand is why this manages key-slots on a per-bio basis. Wouldn't it make a whole lot more sense to manage them on a struct request basis once most of the merging has been performed?
> >
> > I don't immediately see an issue with making it work on a struct request basis. I'll look into this more carefully.
>
> I think that should end up being simpler and more efficient.

So I tried reading through more of blk-mq and the IO schedulers to figure out how to do this. As far as I can tell, requests may be merged with each other until they're taken off the scheduler. So ideally, we'd program a keyslot for a request when it's taken off the scheduler, but this happens from within an atomic context. On the other hand, programming a keyslot might cause the thread to sleep (in the event that all keyslots are in use by other in-flight requests). Unless I'm missing something, or you had some other idea in mind, I think it's a lot easier to stick to letting blk-crypto program keyslots and manage them per-bio...

On Fri, Jan 31, 2020 at 04:53:41PM -0800, Satya Tangirala wrote:

> So I tried reading through more of blk-mq and the IO schedulers to figure out how to do this. As far as I can tell, requests may be merged with each other until they're taken off the scheduler. So ideally, we'd program a keyslot for a request when it's taken off the scheduler, but this happens from within an atomic context. On the other hand, programming a keyslot might cause the thread to sleep (in the event that all keyslots are in use by other in-flight requests). Unless I'm missing something, or you had some other idea in mind, I think it's a lot easier to stick to letting blk-crypto program keyslots and manage them per-bio...

But as far as I understand from reading the code, it only sleeps because it waits for another key slot to be released, which is exactly like any other resource constraint in the storage device. In that case ->queue_rq returns BLK_STS_RESOURCE (or you do the equivalent in the blk-mq code) and the queue gets woken again once the resource is available.

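As a hedged sketch of the pattern being described (the crypto helpers are hypothetical and not from the series), a driver's ->queue_rq handler would report a keyslot shortage as an ordinary resource constraint and let blk-mq re-run the queue later:

```c
#include <linux/blk-mq.h>

/* Hypothetical helpers, not part of the series: */
bool example_rq_needs_crypto(struct request *rq);
int example_acquire_keyslot_nonblocking(struct request *rq);

static blk_status_t example_queue_rq(struct blk_mq_hw_ctx *hctx,
				     const struct blk_mq_queue_data *bd)
{
	struct request *rq = bd->rq;

	/* No free keyslot: treat it like any other exhausted resource. */
	if (example_rq_needs_crypto(rq) &&
	    example_acquire_keyslot_nonblocking(rq) < 0)
		return BLK_STS_RESOURCE;

	blk_mq_start_request(rq);
	/* ... issue the request to the hardware here ... */
	return BLK_STS_OK;
}
```
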
On Mon, Feb 03, 2020 at 01:15:58AM -0800, Christoph Hellwig wrote:

> On Fri, Jan 31, 2020 at 04:53:41PM -0800, Satya Tangirala wrote:
>
> > So I tried reading through more of blk-mq and the IO schedulers to figure out how to do this. As far as I can tell, requests may be merged with each other until they're taken off the scheduler. So ideally, we'd program a keyslot for a request when it's taken off the scheduler, but this happens from within an atomic context. On the other hand, programming a keyslot might cause the thread to sleep (in the event that all keyslots are in use by other in-flight requests). Unless I'm missing something, or you had some other idea in mind, I think it's a lot easier to stick to letting blk-crypto program keyslots and manage them per-bio...
>
> But as far as I understand from reading the code, it only sleeps because it waits for another key slot to be released, which is exactly like any other resource constraint in the storage device. In that case ->queue_rq returns BLK_STS_RESOURCE (or you do the equivalent in the blk-mq code) and the queue gets woken again once the resource is available.

Wouldn't that mean that all the other requests in the queue, even ones that don't even need any inline encryption, also don't get processed until the queue is woken up again? And if so, are we really ok with that?

As you said, we'd need the queue to wake up once a keyslot is available. It's possible that only some hardware queues and not others get blocked because of keyslot programming, so ideally, we could somehow make the correct hardware queue(s) wake up once a keyslot is freed. But the keyslot manager can't assume that it's actually blk-mq that's being used underneath, so if we want to get the keyslot manager to do something once a keyslot was freed, it would need some generic way to signal that to blk-mq. We can also just wait around for the queue to restart by itself after some time delay and try to program the keyslot again at that point, although I wouldn't want to do that because in the current design we know exactly when a keyslot is freed, and we don't need to rely on potentially inefficient guesswork about when we can successfully program a keyslot. Maybe we're alright with waking up all the queues rather than only the ones that really need it? But in any case, I don't know yet what the best way to solve this problem is.

We would also need to make changes to handle programming keyslots in some of the other make_request_fns besides blk_mq_make_request too (wherever relevant, at least), which adds more complexity.

Overall, it seems to me like trying to manage programming of keyslots on a per-request basis is maybe more code than what we have now, and I'm not sure what we're really buying by doing it (other than perhaps the performance benefit of having to get fewer refcounts on a variable and fewer comparisons of cryptographic keys).

Also I forgot to mention this in my previous mail, but there may be some drivers/devices whose keyslots cannot be programmed from an atomic context, so this approach might make things difficult in those situations (the UFS v2.1 spec, which I followed while implementing support for inline crypto for UFS, does not care whether we're in an atomic context or not, but there might be specifications for other drivers, or even some particular UFS inline encryption hardware that do).

So unless you have strong objections, I'd want to continue programming keyslots per-bio for the above reasons.

On Mon, Feb 03, 2020 at 07:39:15PM -0800, Satya Tangirala wrote:

> Wouldn't that mean that all the other requests in the queue, even ones that don't even need any inline encryption, also don't get processed until the queue is woken up again?

For the basic implementation yes.

> And if so, are we really ok with that?

That depends on the use cases. With the fscrypt setup are we still going to see unencrypted I/O to the device as well? If so we'll need to refine the setup and only queue up unencrypted requests. But I'd still try the dumb version first and then refine it.

> As you said, we'd need the queue to wake up once a keyslot is available. It's possible that only some hardware queues and not others get blocked because of keyslot programming, so ideally, we could somehow make the correct hardware queue(s) wake up once a keyslot is freed. But the keyslot manager can't assume that it's actually blk-mq that's being used underneath,

Why? The legacy request code is long gone.

> Also I forgot to mention this in my previous mail, but there may be some drivers/devices whose keyslots cannot be programmed from an atomic context, so this approach might make things difficult in those situations (the UFS v2.1 spec, which I followed while implementing support for inline crypto for UFS, does not care whether we're in an atomic context or not, but there might be specifications for other drivers, or even some particular UFS inline encryption hardware that do).

We have an option to never call ->queue_rq from atomic context (BLK_MQ_F_BLOCKING). But do you know of existing hardware that behaves like this or is it just hypothetical?

> So unless you have strong objections, I'd want to continue programming keyslots per-bio for the above reasons.

I'm pretty sure from looking at the code that doing inline encryption at the bio level is the wrong approach. That isn't supposed to end the discussion, but especially things like waking up after a keyslot becomes available fit much better into the request layer resource model that is built around queuing limitations, and not the make_request model that assumes the driver can always queue.

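A minimal sketch of the BLK_MQ_F_BLOCKING option mentioned here, assuming placeholder queue-depth and ops values rather than anything from a real driver:

```c
#include <linux/blk-mq.h>
#include <linux/string.h>
#include <linux/numa.h>

/*
 * With BLK_MQ_F_BLOCKING set in the tag set, blk-mq calls ->queue_rq()
 * from process context, so the driver could sleep there while
 * programming a keyslot.
 */
static int example_init_tag_set(struct blk_mq_tag_set *set,
				const struct blk_mq_ops *ops)
{
	memset(set, 0, sizeof(*set));
	set->ops = ops;			/* driver's ->queue_rq() etc. */
	set->nr_hw_queues = 1;
	set->queue_depth = 64;
	set->numa_node = NUMA_NO_NODE;
	set->flags = BLK_MQ_F_BLOCKING;	/* never call ->queue_rq() atomically */
	return blk_mq_alloc_tag_set(set);
}
```
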
On Tue, Feb 04, 2020 at 06:58:32AM -0800, Christoph Hellwig wrote:

> On Mon, Feb 03, 2020 at 07:39:15PM -0800, Satya Tangirala wrote:
>
> > Wouldn't that mean that all the other requests in the queue, even ones that don't even need any inline encryption, also don't get processed until the queue is woken up again?
>
> For the basic implementation yes.
>
> > And if so, are we really ok with that?
>
> That depends on the use cases. With the fscrypt setup are we still going to see unencrypted I/O to the device as well? If so we'll need to refine the setup and only queue up unencrypted requests. But I'd still try the dumb version first and then refine it.

Definitely, for several reasons:

- Not all files on the filesystem are necessarily encrypted.
- Filesystem metadata is not encrypted (except for filenames, but those don't use inline encryption).
- Encryption isn't necessarily being used on all partitions on the disk.

It's also not just about unencrypted vs. encrypted, since just because someone is waiting for one keyslot doesn't mean we should pause all encrypted I/O to the device for all keyslots.

> > As you said, we'd need the queue to wake up once a keyslot is available. It's possible that only some hardware queues and not others get blocked because of keyslot programming, so ideally, we could somehow make the correct hardware queue(s) wake up once a keyslot is freed. But the keyslot manager can't assume that it's actually blk-mq that's being used underneath,
>
> Why? The legacy request code is long gone.
>
> > Also I forgot to mention this in my previous mail, but there may be some drivers/devices whose keyslots cannot be programmed from an atomic context, so this approach might make things difficult in those situations (the UFS v2.1 spec, which I followed while implementing support for inline crypto for UFS, does not care whether we're in an atomic context or not, but there might be specifications for other drivers, or even some particular UFS inline encryption hardware that do).
>
> We have an option to never call ->queue_rq from atomic context (BLK_MQ_F_BLOCKING). But do you know of existing hardware that behaves like this or is it just hypothetical?

Maybe -- check the Qualcomm ICE (Inline Crypto Engine) driver I posted at https://lkml.kernel.org/linux-block/20200110061634.46742-1-ebiggers@kernel.org/. The hardware requires vendor-specific SMC calls to program keys, rather than the UFS standard way. It's currently blocking, since the code to make the SMC calls in drivers/firmware/qcom_scm*.c uses GFP_KERNEL and mutex_lock().

I'll test whether it can work in atomic context by using GFP_ATOMIC and qcom_scm_call_atomic() instead. (Adding a spinlock might be needed too.)

- Eric

On Tue, Feb 04, 2020 at 01:21:11PM -0800, Eric Biggers wrote:

> On Tue, Feb 04, 2020 at 06:58:32AM -0800, Christoph Hellwig wrote:
>
> > On Mon, Feb 03, 2020 at 07:39:15PM -0800, Satya Tangirala wrote:
> >
> > > Wouldn't that mean that all the other requests in the queue, even ones that don't even need any inline encryption, also don't get processed until the queue is woken up again?
> >
> > For the basic implementation yes.
> >
> > > And if so, are we really ok with that?
> >
> > That depends on the use cases. With the fscrypt setup are we still going to see unencrypted I/O to the device as well? If so we'll need to refine the setup and only queue up unencrypted requests. But I'd still try the dumb version first and then refine it.
>
> Definitely, for several reasons:
>
> - Not all files on the filesystem are necessarily encrypted.
> - Filesystem metadata is not encrypted (except for filenames, but those don't use inline encryption).
> - Encryption isn't necessarily being used on all partitions on the disk.
>
> It's also not just about unencrypted vs. encrypted, since just because someone is waiting for one keyslot doesn't mean we should pause all encrypted I/O to the device for all keyslots.
>
> > > As you said, we'd need the queue to wake up once a keyslot is available. It's possible that only some hardware queues and not others get blocked because of keyslot programming, so ideally, we could somehow make the correct hardware queue(s) wake up once a keyslot is freed. But the keyslot manager can't assume that it's actually blk-mq that's being used underneath,
> >
> > Why? The legacy request code is long gone.
> >
> > > Also I forgot to mention this in my previous mail, but there may be some drivers/devices whose keyslots cannot be programmed from an atomic context, so this approach might make things difficult in those situations (the UFS v2.1 spec, which I followed while implementing support for inline crypto for UFS, does not care whether we're in an atomic context or not, but there might be specifications for other drivers, or even some particular UFS inline encryption hardware that do).
> >
> > We have an option to never call ->queue_rq from atomic context (BLK_MQ_F_BLOCKING). But do you know of existing hardware that behaves like this or is it just hypothetical?
>
> Maybe -- check the Qualcomm ICE (Inline Crypto Engine) driver I posted at https://lkml.kernel.org/linux-block/20200110061634.46742-1-ebiggers@kernel.org/. The hardware requires vendor-specific SMC calls to program keys, rather than the UFS standard way. It's currently blocking, since the code to make the SMC calls in drivers/firmware/qcom_scm*.c uses GFP_KERNEL and mutex_lock().
>
> I'll test whether it can work in atomic context by using GFP_ATOMIC and qcom_scm_call_atomic() instead. (Adding a spinlock might be needed too.)

The vendor-specific SMC calls do seem to work in atomic context, at least on SDA845. However, in ufshcd_program_key(), the calls to pm_runtime_get_sync() and ufshcd_hold() can also sleep.

I think we can move the pm_runtime_get_sync() to ufshcd_crypto_keyslot_evict(), since the block layer already ensures the device is not runtime-suspended while requests are being processed (see blk_queue_enter()). I.e., keyslots can be evicted independently of any bio, but that's not the case for programming them.

That still leaves ufshcd_hold(), which is still needed to ungate the UFS clocks. It does accept an 'async' argument, which is used by ufshcd_queuecommand() to schedule work to ungate the clocks and return SCSI_MLQUEUE_HOST_BUSY.

So in blk_mq_dispatch_rq_list(), we could potentially try to acquire the keyslot, and if it can't be done because either none are available or because something else needs to be waited for, we can put the request back on the dispatch list -- similar to how failure to get a driver tag is handled.

However, if I understand correctly, that would mean that all requests to the same hardware queue would be blocked whenever someone is waiting for a keyslot -- even unencrypted requests and requests for unrelated keyslots.

It's possible that would still be fine for the Android use case, as vendors tend to add enough keyslots to work with Android, provided that they choose the fscrypt format that uses one key per encryption policy rather than one key per file. I.e., it might be the case that no one waits for keyslots in practice anyway. But, it seems it would be undesirable for a general Linux kernel framework, which could potentially be used with per-file keys or with hardware that only has a *very* small number of keyslots.

Another option would be to allocate the keyslot in blk_mq_get_request(), where sleeping is still allowed, but some merging was already done.

- Eric

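A hedged sketch of the eviction-side rearrangement described above; the zero-key helper and the exact function boundaries are hypothetical, and the real driver may differ. The point is simply that the sleeping runtime-PM and clock-ungating calls end up in a path that never runs from an atomic dispatch context:

```c
#include <linux/pm_runtime.h>
#include "ufshcd.h"	/* assumes this sits under drivers/scsi/ufs/ */

/* Hypothetical low-level helper that clears/overwrites the slot: */
int example_ufshcd_clear_keyslot(struct ufs_hba *hba, unsigned int slot);

static int example_ufshcd_crypto_keyslot_evict(struct ufs_hba *hba,
					       unsigned int slot)
{
	int err;

	/* Both of these may sleep, which is fine in the eviction path: */
	pm_runtime_get_sync(hba->dev);
	ufshcd_hold(hba, false);	/* ungate the clocks synchronously */

	err = example_ufshcd_clear_keyslot(hba, slot);

	ufshcd_release(hba);
	pm_runtime_put_sync(hba->dev);
	return err;
}
```
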
On Tue, Feb 04, 2020 at 11:36:01PM -0800, Eric Biggers wrote:

> The vendor-specific SMC calls do seem to work in atomic context, at least on SDA845. However, in ufshcd_program_key(), the calls to pm_runtime_get_sync() and ufshcd_hold() can also sleep.
>
> I think we can move the pm_runtime_get_sync() to ufshcd_crypto_keyslot_evict(), since the block layer already ensures the device is not runtime-suspended while requests are being processed (see blk_queue_enter()). I.e., keyslots can be evicted independently of any bio, but that's not the case for programming them.

Yes.

> That still leaves ufshcd_hold(), which is still needed to ungate the UFS clocks. It does accept an 'async' argument, which is used by ufshcd_queuecommand() to schedule work to ungate the clocks and return SCSI_MLQUEUE_HOST_BUSY.
>
> So in blk_mq_dispatch_rq_list(), we could potentially try to acquire the keyslot, and if it can't be done because either none are available or because something else needs to be waited for, we can put the request back on the dispatch list -- similar to how failure to get a driver tag is handled.

Yes, that is what I had in mind.

> However, if I understand correctly, that would mean that all requests to the same hardware queue would be blocked whenever someone is waiting for a keyslot -- even unencrypted requests and requests for unrelated keyslots.

At least for an initial dumb implementation, yes. But if we care enough we can improve the code to check for the encrypted flag and only put back encrypted requests in that case.

> It's possible that would still be fine for the Android use case, as vendors tend to add enough keyslots to work with Android, provided that they choose the fscrypt format that uses one key per encryption policy rather than one key per file. I.e., it might be the case that no one waits for keyslots in practice anyway. But, it seems it would be undesirable for a general Linux kernel framework, which could potentially be used with per-file keys or with hardware that only has a *very* small number of keyslots.
>
> Another option would be to allocate the keyslot in blk_mq_get_request(), where sleeping is still allowed, but some merging was already done.

That is another good idea. In blk_mq_get_request we acquire other resources like the tag, so this would be a very logical place to acquire the key slots. We should also be able to still merge into the request while it is waiting.

Hi Christoph,

I sent out v7 of the patch series. It's at https://lore.kernel.org/linux-fscrypt/20200221115050.238976-1-satyat@google.com/T/#t

It manages keyslots on a per-request basis now - I made it get keyslots in blk_mq_get_request, because that way I wouldn't have to worry about programming keys in an atomic context. What do you think of the new approach?

On Wed, Feb 05, 2020 at 10:05:41AM -0800, Christoph Hellwig wrote:

> On Tue, Feb 04, 2020 at 11:36:01PM -0800, Eric Biggers wrote:
>
> > The vendor-specific SMC calls do seem to work in atomic context, at least on SDA845. However, in ufshcd_program_key(), the calls to pm_runtime_get_sync() and ufshcd_hold() can also sleep.
> >
> > I think we can move the pm_runtime_get_sync() to ufshcd_crypto_keyslot_evict(), since the block layer already ensures the device is not runtime-suspended while requests are being processed (see blk_queue_enter()). I.e., keyslots can be evicted independently of any bio, but that's not the case for programming them.
>
> Yes.
>
> > That still leaves ufshcd_hold(), which is still needed to ungate the UFS clocks. It does accept an 'async' argument, which is used by ufshcd_queuecommand() to schedule work to ungate the clocks and return SCSI_MLQUEUE_HOST_BUSY.
> >
> > So in blk_mq_dispatch_rq_list(), we could potentially try to acquire the keyslot, and if it can't be done because either none are available or because something else needs to be waited for, we can put the request back on the dispatch list -- similar to how failure to get a driver tag is handled.
>
> Yes, that is what I had in mind.
>
> > However, if I understand correctly, that would mean that all requests to the same hardware queue would be blocked whenever someone is waiting for a keyslot -- even unencrypted requests and requests for unrelated keyslots.
>
> At least for an initial dumb implementation, yes. But if we care enough we can improve the code to check for the encrypted flag and only put back encrypted requests in that case.
>
> > It's possible that would still be fine for the Android use case, as vendors tend to add enough keyslots to work with Android, provided that they choose the fscrypt format that uses one key per encryption policy rather than one key per file. I.e., it might be the case that no one waits for keyslots in practice anyway. But, it seems it would be undesirable for a general Linux kernel framework, which could potentially be used with per-file keys or with hardware that only has a *very* small number of keyslots.
> >
> > Another option would be to allocate the keyslot in blk_mq_get_request(), where sleeping is still allowed, but some merging was already done.
>
> That is another good idea. In blk_mq_get_request we acquire other resources like the tag, so this would be a very logical place to acquire the key slots. We should also be able to still merge into the request while it is waiting.

On Fri, Feb 21, 2020 at 04:30:30AM -0800, Satya Tangirala wrote:

> Hi Christoph,
>
> I sent out v7 of the patch series. It's at https://lore.kernel.org/linux-fscrypt/20200221115050.238976-1-satyat@google.com/T/#t
>
> It manages keyslots on a per-request basis now - I made it get keyslots in blk_mq_get_request, because that way I wouldn't have to worry about programming keys in an atomic context. What do you think of the new approach?

I'll try to review it soon, thanks!