
[3/8] block: blk-crypto for Inline Encryption

Message ID 20190710225609.192252-4-satyat@google.com (mailing list archive)
State New, archived
Series Inline Encryption Support

Commit Message

Satya Tangirala July 10, 2019, 10:56 p.m. UTC
We introduce blk-crypto, which manages programming keyslots for struct
bios. With blk-crypto, filesystems only need to call bio_crypt_set_ctx with
the encryption key, algorithm and data_unit_num; they don't have to worry
about getting a keyslot for each encryption context, as blk-crypto handles
that. Blk-crypto also makes it possible for layered devices like device
mapper to make use of inline encryption hardware.

Blk-crypto delegates crypto operations to inline encryption hardware when
available, and also contains a software fallback to the kernel crypto API.
For more details, refer to Documentation/block/inline-encryption.txt.
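
For example, a filesystem submitting an encrypted bio might do something like
the following (sketch only; the exact bio_crypt_set_ctx() prototype is defined
in the bio-crypt-ctx patch earlier in this series, and the argument order here
is assumed from the description above):

	bio_crypt_set_ctx(bio, raw_key, BLK_ENCRYPTION_MODE_AES_256_XTS,
			  data_unit_num);
	submit_bio(bio);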

Known issues:
1) We're allocating crypto_skcipher in blk_crypto_keyslot_program, which
uses GFP_KERNEL to allocate memory, but this function is on the write
path for IO - we need to add support for specifying different flags
to the crypto API.

Signed-off-by: Satya Tangirala <satyat@google.com>
---
 Documentation/block/inline-encryption.txt | 185 +++++++
 block/Makefile                            |   3 +-
 block/bio-crypt-ctx.c                     |   7 +-
 block/bio.c                               |   5 +
 block/blk-core.c                          |  11 +-
 block/blk-crypto.c                        | 585 ++++++++++++++++++++++
 include/linux/bio.h                       |   7 +
 include/linux/blk-crypto.h                |  40 ++
 8 files changed, 840 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/block/inline-encryption.txt
 create mode 100644 block/blk-crypto.c
 create mode 100644 include/linux/blk-crypto.h

Comments

Randy Dunlap July 11, 2019, 5:47 a.m. UTC | #1
Hi,

Documentation nits, typos, questions...

On 7/10/19 3:56 PM, Satya Tangirala wrote:
> diff --git a/Documentation/block/inline-encryption.txt b/Documentation/block/inline-encryption.txt
> new file mode 100644
> index 000000000000..96a7983a117d
> --- /dev/null
> +++ b/Documentation/block/inline-encryption.txt
> @@ -0,0 +1,185 @@
> +BLK-CRYPTO and KEYSLOT MANAGER
> +===========================
> +
> +CONTENTS
> +1. Objective
> +2. Constraints and notes
> +3. Design
> +4. Blk-crypto
> + 4-1 What does blk-crypto do on bio submission
> +5. Layered Devices
> +6. Future optimizations for layered devices
> +
> +1. Objective
> +============
> +
> +We want to support inline encryption (IE) in the kernel.
> +To allow for testing, we also want a software fallback when actual
> +IE hardware is absent. We also want IE to work with layered devices
> +like dm and loopback (i.e. we want to be able to use the IE hardware
> +of the underlying devices if present, or else fall back to software
> +en/decryption).
> +
> +
> +2. Constraints and notes
> +========================
> +
> +1) IE hardware have a limited number of “keyslots” that can be programmed
> +with an encryption context (key, algorithm, data unit size, etc.) at any time.
> +One can specify a keyslot in a data request made to the device, and the
> +device will en/decrypt the data using the encryption context programmed into
> +that specified keyslot. When possible, we want to make multiple requests with
> +the same encryption context share the same keyslot.
> +
> +2) We need a way for filesystems to specify an encryption context to use for
> +en/decrypting a struct bio, and a device driver (like UFS) needs to be able
> +to use that encryption context when it processes the bio.
> +
> +3) We need a way for device drivers to expose their capabilities in a unified
> +way to the upper layers.
> +
> +
> +3. Design
> +=========
> +
> +We add a struct bio_crypt_context to struct bio that can represent an

         is this   bi_crypt_context ??

> +encryption context, because we need to able to pass this encryption context

                                       to be able

> +from the FS layer to the device driver to act upon.
> +
> +While IE hardware works on the notion of keyslots, the FS layer has no
> +knowledge of keyslots - it simply wants to specify an encryption context to
> +use while en/decrypting a bio.
> +
> +We introduce a keyslot manager (KSM) that handles the translation from
> +encryption contexts specified by the FS to keyslots on the IE hardware.
> +This KSM also serves as the way IE hardware can expose their capabilities to
> +upper layers. The generic mode of operation is: each device driver that wants
> +to support IE will construct a KSM and set it up in its struct request_queue.
> +Upper layers that want to use IE on this device can then use this KSM in
> +the device’s struct request_queue to translate an encryption context into
> +a keyslot. The presence of the KSM in the request queue shall be used to mean
> +that the device supports IE.
> +
> +On the device driver end of the interface, the device driver needs to tell the
> +KSM how to actually manipulate the IE hardware in the device to do things like
> +programming the crypto key into the IE hardware into a particular keyslot. All
> +this is achieved through the struct keyslot_mgmt_ll_ops that the device driver
> +passes to the KSM when creating it.
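
For concreteness, a driver-side sketch based on the keyslot_mgmt_ll_ops and
keyslot_manager_create() usage that appears later in this patch (the my_*
callbacks, num_hw_keyslots and priv_data are hypothetical):

	static const struct keyslot_mgmt_ll_ops my_ksm_ll_ops = {
		.keyslot_program	= my_keyslot_program,
		.keyslot_evict		= my_keyslot_evict,
		.keyslot_find		= my_keyslot_find,
		.crypt_mode_supported	= my_crypt_mode_supported,
	};

	/* At probe time: advertise IE support on the request queue. */
	q->ksm = keyslot_manager_create(num_hw_keyslots, &my_ksm_ll_ops,
					priv_data);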
> +
> +It uses refcounts to track which keyslots are idle (either they have no
> +encryption context programmed, or there are no in flight struct bios

                                                  in-flight

> +referencing that keyslot). When a new encryption context needs a keyslot, it
> +tries to find a keyslot that has already been programmed with the same
> +encryption context, and if there is no such keyslot, it evicts the least
> +recently used idle keyslot and programs the new encryption context into that
> +one. If no idle keyslots are available, then the caller will sleep until there
> +is at least one.
> +
> +
> +4. Blk-crypto
> +=============
> +
> +The above is sufficient for simple cases, but does not work if there is a
> +need for a software fallback, or if we are want to use IE with layered devices.
> +To these ends, we introduce blk-crypto. Blk-crypto allows us to present a
> +unified view of encryption to the FS (so FS only needs to specify an
> +encryption context and not worry about keyslots at all), and blk-crypto can
> +decide whether to delegate the en/decryption to IE hardware or to software
> +(i.e. to the kernel crypto API). Blk-crypto maintains an internal KSM that
> +serves as the software fallback to the kernel crypto API.
> +
> +Blk-crypto needs to ensure that the encryption context is programmed into the
> +"correct" keyslot manager for IE. If a bio is submitted to a layered device
> +that eventually passes the bio down to a device that really does support IE, we
> +want the encryption context to be programmed into a keyslot for the KSM of the
> +device with IE support. However, blk-crypto does not know a priori whether a
> +particular device is the final device in the layering structure for a bio or
> +not. So in the case that a particular device does not support IE, since it is
> +possibly the final destination device for the bio, if the bio requires
> +encryption (i.e. the bio is doing a write operation), blk-crypto must fallback
> +to software *before* sending the bio to the device.
> +
> +Blk-crypto ensures that
> +1) The bio’s encryption context is programmed into a keyslot in the KSM of the
> +request queue that the bio is being submitted to (or the software fallback KSM
> +if the request queue doesn’t have a KSM), and that the processing_ksm in the
> +bi_crypt_context is set to this KSM
> +
> +2) That the bio has its own individual reference to the keyslot in this KSM.
> +Once the bio passes through blk-crypto, its encryption context is programmed
> +in some KSM. The “its own individual reference to the keyslot” ensures that
> +keyslots can be released by each bio independently of other bios while ensuring
> +that the bio has a valid reference to the keyslot when, for e.g., the software
> +fallback KSM in blk-crypto performs crypto for on the device’s behalf. The
> +individual references are ensured by increasing the refcount for the keyslot in
> +the processing_ksm when a bio with a programmed encryption context is cloned.
> +
> +
> +4-1. What blk-crypto does on bio submission
> +-------------------------------------------
> +
> +Case 1: blk-crypto is given a bio with only an encryption context that hasn’t
> +been programmed into any keyslot in any KSM (for e.g. a bio from the FS). In
> +this case, blk-crypto will program the encryption context into the KSM of the
> +request queue the bio is being submitted to (and if this KSM does not exist,
> +then it will program it into blk-crypto’s internal KSM for software fallback).
> +The KSM that this encryption context was programmed into is stored as the
> +processing_ksm in the bio’s bi_crypt_context.
> +
> +Case 2: blk-crypto is given a bio whose encryption context has already been
> +programmed into a keyslot in the *software fallback KSM*. In this case,
> +blk-crypto does nothing; it treats the bio as not having specified an
> +encryption context. Note that we cannot do what we will do in Case 3 here

                       Note that we cannot do here what we will do in Case 3

> +because we would have already encrypted the bio in software by this point.
> +
> +Case 3: blk-crypto is given a bio whose encryption context has already been
> +programmed into a keyslot in some KSM (that is *not* the software fallback
> +KSM). In this case, blk-crypto first releases that keyslot from that KSM and
> +then treats the bio as in Case 1.
> +
> +This way, when a device driver is processing a bio, it can be sure that
> +the bio’s encryption context has been programmed into some KSM (either the
> +device driver’s request queue’s KSM, or blk-crypto’s software fallback KSM).
> +It then simply needs to check if the bio’s processing_ksm is the device’s
> +request queue’s KSM. If so, then it should proceed with IE. If not, it should
> +simply do nothing with respect to crypto, because some other KSM (perhaps the
> +blk-crypto software fallback KSM) is handling the en/decryption.
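
On the driver side, that check might look roughly like this (sketch only;
processing_ksm, bio_is_encrypted() and bio_crypt_get_keyslot() come from this
series, the rest is hypothetical):

	if (bio_is_encrypted(bio) &&
	    bio->bi_crypt_context->processing_ksm == q->ksm) {
		int slot = bio_crypt_get_keyslot(bio);

		/* Program 'slot' into the hardware command descriptor. */
	} else {
		/* Some other KSM (e.g. the fallback) handles the crypto. */
	}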
> +
> +Blk-crypto will release the keyslot that is being held by the bio (and also
> +decrypt it if the bio is using the software fallback KSM) once
> +bio_remaining_done returns true for the bio.
> +
> +
> +5. Layered Devices
> +==================
> +
> +Layered devices that wish to support IE need to create their own keyslot
> +manager for their request queue, and expose whatever functionality they choose.
> +When a layered device wants to pass a bio to another layer (either by
> +resubmitting the same bio, or by submitting a clone), it doesn’t need to do
> +anything special because the bio (or the clone) will once again pass through
> +blk-crypto, which will work as described in Case 3. If a layered device wants
> +for some reason to do the IO by itself instead of passing it on to a child
> +device, but it also chose to expose IE capabilities by setting up a KSM in its
> +request queue, it is then responsible for en/decrypting the data itself. In
> +such cases, the device can choose to call the blk-crypto function
> +blk_crypto_fallback_to_software (TODO: Not yet implemented), which will
> +cause the en/decryption to be done via software fallback.
> +
> +
> +6. Future Optimizations for layered devices
> +===========================================
> +
> +Creating a keyslot manager for the layered device uses up memory for each
> +keyslot, and in general, a layered device (like dm-linear) merely passes the
> +request on to a “child” device, so the keyslots in the layered device itself
> +might be completely unused. We can instead define a new type of KSM; the
> +“passthrough KSM”, that layered devices can use to let blk-crypto know that
> +this layered device *will* pass the bio to some child device (and hence
> +through blk-crypto again, at which point blk-crypto can program the encryption
> +context, instead of programming it into the layered device’s KSM). Again, if
> +the device “lies” and decides to do the IO itself instead of passing it on to
> +a child device, it is responsible for doing the en/decryption (and can choose
> +to call blk_crypto_fallback_to_software). Another use case for the
> +"passthrough KSM" is for IE devices that want to manage their own keyslots/do
> +not have a limited number of keyslots.
Eric Biggers July 15, 2019, 3:40 p.m. UTC | #2
Hi Satya, here's yet another round of comments.  I'd really like for
people who are experts in the block layer to review this too, though.
The integration into the block layer needs more review.

On Wed, Jul 10, 2019 at 03:56:04PM -0700, Satya Tangirala wrote:
> We introduce blk-crypto, which manages programming keyslots for struct
> bios. With blk-crypto, filesystems only need to call bio_crypt_set_ctx with
> the encryption key, algorithm and data_unit_num; they don't have to worry
> about getting a keyslot for each encryption context, as blk-crypto handles
> that. Blk-crypto also makes it possible for layered devices like device
> mapper to make use of inline encryption hardware.
> 
> Blk-crypto delegates crypto operations to inline encryption hardware when
> available, and also contains a software fallback to the kernel crypto API.
> For more details, refer to Documentation/block/blk-crypto.txt.

It's not necessarily a "software fallback", since the kernel crypto API
supports hardware crypto accelerators, just not *inline* hardware crypto
accelerators.  To avoid this inevitable confusion, can you please call
it the "crypto API fallback"?  This applies to the whole patch series,
including both the code and documentation.

> 
> Known issues:
> 1) We're allocating crypto_skcipher in blk_crypto_keyslot_program, which
> uses GFP_KERNEL to allocate memory, but this function is on the write
> path for IO - we need to add support for specifying a different flags
> to the crypto API.

You could use memalloc_noio_save() and memalloc_noio_restore(), rather
than gfp_flags.
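
For example (a minimal sketch of that approach around the allocation in
blk_crypto_keyslot_program(); untested):

	unsigned int noio_flags;

	noio_flags = memalloc_noio_save();
	tfm = crypto_alloc_skcipher(mode->cipher_str, 0, 0);
	memalloc_noio_restore(noio_flags);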

> diff --git a/block/blk-crypto.c b/block/blk-crypto.c
> new file mode 100644
> index 000000000000..f41fb7819ae9
> --- /dev/null
> +++ b/block/blk-crypto.c
> @@ -0,0 +1,585 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright 2019 Google LLC
> + */

Perhaps include a reference to the documentation file?

Also, how about adding:

	#define pr_fmt(fmt) "blk-crypto: " fmt

... so that all log messages in this file automatically get an
appropriate prefix.

> +#include <linux/blk-crypto.h>
> +#include <linux/keyslot-manager.h>
> +#include <linux/mempool.h>
> +#include <linux/blk-cgroup.h>
> +#include <crypto/skcipher.h>
> +#include <crypto/algapi.h>
> +#include <linux/module.h>

This code is using the crypto API, but as-is
CONFIG_BLK_INLINE_ENCRYPTION can be enabled without also enabling the
crypto API, which will cause a build error.  The Kconfig option needs:

        select CRYPTO
        select CRYPTO_BLKCIPHER
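
i.e. something like this on the Kconfig option (the option itself is added
earlier in this series; the prompt text here is only a guess):

	config BLK_INLINE_ENCRYPTION
		bool "Enable inline encryption support in block layer"
		select CRYPTO
		select CRYPTO_BLKCIPHER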

> +
> +struct blk_crypt_mode {
> +	const char *cipher_str;
> +	size_t keysize;
> +};

In general, I think more comments in this file would be really helpful.

For example, can you document the fields of blk_crypt_mode?  E.g.

struct blk_crypt_mode {
        const char *cipher_str; /* crypto API name (for fallback case) */
        size_t keysize;         /* key size in bytes */
};

> +
> +static const struct blk_crypt_mode blk_crypt_modes[] = {
> +	[BLK_ENCRYPTION_MODE_AES_256_XTS] = {
> +		.cipher_str = "xts(aes)",
> +		.keysize = 64,
> +	},
> +};
> +
> +#define BLK_CRYPTO_MAX_KEY_SIZE 64
> +/* TODO: Do we want to make this user configurable somehow? */
> +static int blk_crypto_num_keyslots = 100;

This TODO is stale, since this patch already makes num_keyslots a kernel
command line parameter.

> +
> +static struct blk_crypto_keyslot {
> +	struct crypto_skcipher *tfm;
> +	enum blk_crypt_mode_num crypt_mode;
> +	u8 key[BLK_CRYPTO_MAX_KEY_SIZE];
> +} *blk_crypto_keyslots;
> +
> +struct work_mem {
> +	struct work_struct crypto_work;
> +	struct bio *bio;
> +};
> +
> +static struct keyslot_manager *blk_crypto_ksm;
> +static struct workqueue_struct *blk_crypto_wq;
> +static mempool_t *blk_crypto_page_pool;
> +static struct kmem_cache *blk_crypto_work_mem_cache;

It would be helpful to add a comment somewhere that makes it clear that
all this stuff is for the crypto API fallback, which is not used if the
disk natively supports inline encryption.

I'm concerned that people will read this code and draw the wrong
conclusion about where the encryption is actually being done in their
particular case.
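
Something as simple as this above the fallback state would do (wording is only
a suggestion):

	/*
	 * Everything below is only used by the crypto API fallback.  None of
	 * it is involved when the device natively supports inline encryption;
	 * in that case the en/decryption is done by the inline hardware.
	 */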

> +
> +/* TODO: handle modes that need essiv */

There's a pending patchset that makes ESSIV accessible via
crypto_skcipher.  So ESSIV won't need special handling.  I suggest just
deleting this TODO comment for now.

> +static int blk_crypto_keyslot_program(void *priv, const u8 *key,
> +				      enum blk_crypt_mode_num crypt_mode,
> +				      unsigned int data_unit_size,
> +				      unsigned int slot)
> +{
> +	struct blk_crypto_keyslot *slotp = &blk_crypto_keyslots[slot];
> +	struct crypto_skcipher *tfm = slotp->tfm;
> +	const struct blk_crypt_mode *mode = &blk_crypt_modes[crypt_mode];
> +	size_t keysize = mode->keysize;
> +	int err;
> +
> +	if (crypt_mode != slotp->crypt_mode || !tfm) {

Nit: reversing this to '!tfm || crypt_mode != slotp->crypt_mode' would
be more logical because as the code is written, unused slots have tfm ==
NULL but an undefined crypt_mode.  It doesn't make sense to check the
undefined thing first, then the defined thing.

> +		evict_keyslot(slot);
> +		tfm = crypto_alloc_skcipher(
> +			mode->cipher_str, 0, 0);

Nit: the parameters to crypto_alloc_skcipher() fit in one line.

> +		if (IS_ERR(tfm))
> +			return PTR_ERR(tfm);
> +
> +		crypto_skcipher_set_flags(tfm,
> +					  CRYPTO_TFM_REQ_FORBID_WEAK_KEYS);

Nit: the parameters to crypto_skcipher_set_flags() fit in one line.

> +static int blk_crypto_keyslot_find(void *priv,
> +				   const u8 *key,
> +				   enum blk_crypt_mode_num crypt_mode,
> +				   unsigned int data_unit_size_bytes)
> +{
> +	int slot;
> +	const size_t keysize = blk_crypt_modes[crypt_mode].keysize;
> +
> +	/* TODO: hashmap? */
> +	for (slot = 0; slot < blk_crypto_num_keyslots; slot++) {
> +		if (blk_crypto_keyslots[slot].crypt_mode == crypt_mode &&
> +		    !crypto_memneq(blk_crypto_keyslots[slot].key, key, keysize))
> +			return slot;
> +	}

There should be no TODO comments in the code, so please do something
about this one.  Note that we mustn't leak other keys via timing
information, so a naive hashmap is not an appropriate solution.  So
perhaps this comment should just be deleted, or updated to explain the
timing attack problem and how it's solved.

> +static bool blk_crypt_mode_supported(void *priv,
> +				     enum blk_crypt_mode_num crypt_mode,
> +				     unsigned int data_unit_size)
> +{
> +	/* All blk_crypt_modes are required to have a software fallback. */
> +	return true;
> +}
> +
> +static const struct keyslot_mgmt_ll_ops blk_crypto_ksm_ll_ops = {
> +	.keyslot_program	= blk_crypto_keyslot_program,
> +	.keyslot_evict		= blk_crypto_keyslot_evict,
> +	.keyslot_find		= blk_crypto_keyslot_find,
> +	.crypt_mode_supported	= blk_crypt_mode_supported,
> +};

There's a missing "o" in blk_crypt_mode_supported().

> +
> +static void blk_crypto_encrypt_endio(struct bio *enc_bio)
> +{
> +	struct bio *src_bio = enc_bio->bi_private;
> +	struct bio_vec *enc_bvec, *enc_bvec_end;
> +
> +	enc_bvec = enc_bio->bi_io_vec;
> +	enc_bvec_end = enc_bvec + enc_bio->bi_vcnt;
> +	for (; enc_bvec != enc_bvec_end; enc_bvec++)
> +		mempool_free(enc_bvec->bv_page, blk_crypto_page_pool);

Nit: this could just be a regular loop over 'i':

	for (i = 0; i < enc_bio->bi_vcnt; i++)
		mempool_free(enc_bio->bi_io_vec[i].bv_page,
			     blk_crypto_page_pool);

It's one line shorter and slightly easier to read.

> +static struct bio *blk_crypto_clone_bio(struct bio *bio_src)
> +{
> +	struct bvec_iter iter;
> +	struct bio_vec bv;
> +	struct bio *bio;
> +
> +	bio = bio_alloc_bioset(GFP_NOIO, bio_segments(bio_src), NULL);
> +	if (!bio)
> +		return NULL;
> +	bio->bi_disk		= bio_src->bi_disk;
> +	bio->bi_opf		= bio_src->bi_opf;
> +	bio->bi_ioprio		= bio_src->bi_ioprio;
> +	bio->bi_write_hint	= bio_src->bi_write_hint;
> +	bio->bi_iter.bi_sector	= bio_src->bi_iter.bi_sector;
> +	bio->bi_iter.bi_size	= bio_src->bi_iter.bi_size;
> +
> +	bio_for_each_segment(bv, bio_src, iter)
> +		bio->bi_io_vec[bio->bi_vcnt++] = bv;
> +
> +	if (bio_integrity(bio_src)) {
> +		int ret;
> +
> +		ret = bio_integrity_clone(bio, bio_src, GFP_NOIO);
> +		if (ret < 0) {
> +			bio_put(bio);
> +			return NULL;
> +		}
> +	}

Nit: the code under bio_integrity() can be written more concisely:

	if (bio_integrity(bio_src) &&
	    bio_integrity_clone(bio, bio_src, GFP_NOIO) < 0) {
		bio_put(bio);
		return NULL;
	}

> +static int blk_crypto_encrypt_bio(struct bio **bio_ptr)
> +{
> +	struct bio *src_bio = *bio_ptr;
> +	int slot;
> +	struct skcipher_request *ciph_req = NULL;
> +	DECLARE_CRYPTO_WAIT(wait);
> +	struct bio_vec bv;
> +	struct bvec_iter iter;
> +	int err = 0;
> +	u64 curr_dun;
> +	union {
> +		__le64 dun;
> +		u8 bytes[16];
> +	} iv;
> +	struct scatterlist src, dst;
> +	struct bio *enc_bio;
> +	struct bio_vec *enc_bvec;
> +	int i, j;
> +	unsigned int num_sectors;
> +	int data_unit_size;

As I mentioned on v2, this function is difficult to understand because
it's so long.  I gave a suggestion to make it slightly shorter, which
was not taken.  It would be helpful if you responded to suggestions that
you're not accepting, since otherwise it's unclear whether the
suggestion was rejected, or was missed or forgotten.

Anyway, a new suggestion: it also seems the bio splitting can be nicely
made its own function, which would help.  E.g.:

static int blk_crypto_split_bio_if_needed(struct bio **bio_ptr)
{
       struct bio *bio = *bio_ptr;
       unsigned int i = 0;
       unsigned int num_sectors = 0;
       struct bio_vec bv;
       struct bvec_iter iter;

       bio_for_each_segment(bv, bio, iter) {
               num_sectors += bv.bv_len >> SECTOR_SHIFT;
               if (++i == BIO_MAX_PAGES)
                       break;
       }
       if (num_sectors < bio_sectors(bio)) {
               struct bio *split_bio;

               split_bio = bio_split(bio, num_sectors, GFP_NOIO, NULL);
               if (!split_bio) {
                       bio->bi_status = BLK_STS_RESOURCE;
                       return -ENOMEM;
               }
               bio_chain(split_bio, bio);
               generic_make_request(bio);
               *bio_ptr = split_bio;
       }
       return 0;
}

> +
> +	/* Split the bio if it's too big for single page bvec */
> +	i = 0;
> +	num_sectors = 0;
> +	data_unit_size = 1 << src_bio->bi_crypt_context->data_unit_size_bits;
> +	bio_for_each_segment(bv, src_bio, iter) {
> +		num_sectors += bv.bv_len >> 9;
> +		if (bv.bv_len % data_unit_size != 0) {
> +			src_bio->bi_status = BLK_STS_IOERR;
> +			return -EIO;
> +		}
> +		if (++i == BIO_MAX_PAGES)
> +			break;
> +	}

1.) The alignment check should be done on both bv_len and bv_offset.

2.) To make the behavior consistent, I also think it should be done
    regardless of whether the crypto API fallback is being used.

3.) As coded, this alignment check is broken because it's only being
    done on the first BIO_MAX_PAGES pages of the bio.

4.) '%' or '/' by a non-constant generally should be avoided in kernel
    code, because they can be very slow, and they aren't supported on
    64-bit variables on some architectures.  data_unit_size is always a
    power of 2, so use IS_ALIGNED() instead.

Instead, I suggest adding the following and calling it from early in
blk_crypto_submit_bio():

/* All I/O segments must be data unit aligned */
static int bio_crypt_check_alignment(struct bio *bio)
{
        int data_unit_size = 1 << bio->bi_crypt_context->data_unit_size_bits;
        struct bvec_iter iter;
        struct bio_vec bv;

        bio_for_each_segment(bv, bio, iter) {
                if (!IS_ALIGNED(bv.bv_len | bv.bv_offset, data_unit_size))
                        return -EIO;
        }
        return 0;
}

> +	if (num_sectors < bio_sectors(src_bio)) {
> +		struct bio *split_bio;
> +
> +		split_bio = bio_split(src_bio, num_sectors, GFP_NOIO, NULL);
> +		if (!split_bio) {
> +			src_bio->bi_status = BLK_STS_RESOURCE;
> +			return -ENOMEM;
> +		}
> +		bio_chain(split_bio, src_bio);
> +		generic_make_request(src_bio);
> +		*bio_ptr = split_bio;
> +		src_bio = *bio_ptr;
> +	}
> +
> +	enc_bio = blk_crypto_clone_bio(src_bio);
> +	if (!enc_bio) {
> +		src_bio->bi_status = BLK_STS_RESOURCE;
> +		return -ENOMEM;
> +	}
> +
> +	err = bio_crypt_ctx_acquire_keyslot(src_bio, blk_crypto_ksm);
> +	if (err) {
> +		src_bio->bi_status = BLK_STS_IOERR;
> +		bio_put(enc_bio);
> +		return err;
> +	}
> +	slot = bio_crypt_get_keyslot(src_bio);
> +
> +	ciph_req = skcipher_request_alloc(blk_crypto_keyslots[slot].tfm,
> +					  GFP_NOIO);
> +	if (!ciph_req) {
> +		src_bio->bi_status = BLK_STS_RESOURCE;
> +		err = -ENOMEM;
> +		bio_put(enc_bio);
> +		goto out_release_keyslot;
> +	}
> +
> +	skcipher_request_set_callback(ciph_req,
> +				      CRYPTO_TFM_REQ_MAY_BACKLOG |
> +				      CRYPTO_TFM_REQ_MAY_SLEEP,
> +				      crypto_req_done, &wait);

To help people follow this code, a few brief comments describing the
high-level tasks being accomplished here would be helpful.  E.g.:

       /*
        * Use the crypto API fallback keyslot manager to get a crypto_skcipher
        * for the algorithm and key, then allocate an skcipher_request for it.
        */

Then below:

	/* Encrypt each page in the bio */

and

		/* Encrypt each data unit in the page */


> +
> +	curr_dun = bio_crypt_data_unit_num(src_bio);
> +	sg_init_table(&src, 1);
> +	sg_init_table(&dst, 1);
> +	for (i = 0, enc_bvec = enc_bio->bi_io_vec; i < enc_bio->bi_vcnt;
> +	     enc_bvec++, i++) {
> +		struct page *page = enc_bvec->bv_page;

Nit: to make this slightly easier to read, you could just use a regular
loop over 'i' and assign enc_bvec in the loop:

	for (i = 0; i < enc_bio->bi_vcnt; i++) {
		struct bio_vec *enc_bvec = &enc_bio->bi_io_vec[i];


> +		struct page *ciphertext_page =
> +			mempool_alloc(blk_crypto_page_pool, GFP_NOIO);

Please rename 'page' => 'plaintext_page' to make it very clear which is
the plaintext and which is the ciphertext.

> +
> +		enc_bvec->bv_page = ciphertext_page;
> +
> +		if (!ciphertext_page)
> +			goto no_mem_for_ciph_page;
> +
> +		for (j = 0; j < enc_bvec->bv_len / data_unit_size; j++) {
> +			memset(&iv, 0, sizeof(iv));
> +			iv.dun = cpu_to_le64(curr_dun);
> +
> +			sg_set_page(&src, page, data_unit_size,
> +				    enc_bvec->bv_offset + j * data_unit_size);
> +			sg_set_page(&dst, ciphertext_page, data_unit_size,
> +				    enc_bvec->bv_offset + j * data_unit_size);
> +
> +			skcipher_request_set_crypt(ciph_req, &src, &dst,
> +						   data_unit_size, iv.bytes);
> +			err = crypto_wait_req(crypto_skcipher_encrypt(ciph_req),
> +					      &wait);
> +			if (err)
> +				goto no_mem_for_ciph_page;
> +			curr_dun++;
> +		}

This could be optimized slightly by only calling
skcipher_request_set_crypt() once per bio (since the scatterlists and IV
buffer are always the same), and by only doing the two sg_set_page()
calls once per page and instead updating the scatterlists after
encrypting each data unit:

	src.offset += data_unit_size;
	dst.offset += data_unit_size;

And the loop can be:

	for (j = 0; j < enc_bvec->bv_len; j += data_unit_size) {

... so that no divisions or multiplications are needed.
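
Putting those pieces together, the per-page part might look roughly like this
(untested sketch; out_free_bounce_pages refers to the error-handling rework
suggested further down):

	/* Once per bio, before walking the pages: */
	sg_init_table(&src, 1);
	sg_init_table(&dst, 1);
	skcipher_request_set_crypt(ciph_req, &src, &dst, data_unit_size,
				   iv.bytes);

	/* Then, for each page: */
	sg_set_page(&src, plaintext_page, data_unit_size, enc_bvec->bv_offset);
	sg_set_page(&dst, ciphertext_page, data_unit_size, enc_bvec->bv_offset);
	for (j = 0; j < enc_bvec->bv_len; j += data_unit_size) {
		memset(&iv, 0, sizeof(iv));
		iv.dun = cpu_to_le64(curr_dun);
		err = crypto_wait_req(crypto_skcipher_encrypt(ciph_req), &wait);
		if (err)
			goto out_free_bounce_pages;
		src.offset += data_unit_size;
		dst.offset += data_unit_size;
		curr_dun++;
	}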

> +		continue;
> +no_mem_for_ciph_page:
> +		err = -ENOMEM;
> +		for (j = i - 1; j >= 0; j--) {
> +			mempool_free(enc_bio->bi_io_vec[j].bv_page,
> +				     blk_crypto_page_pool);
> +		}
> +		src_bio->bi_status = BLK_STS_RESOURCE;
> +		bio_put(enc_bio);
> +		goto out_release_cipher;
> +	}

If crypto_skcipher_encrypt() fails, a bounce page is leaked.  Also, the
failure is not necessarily caused by "no memory".

To fix the former, we can increment 'i' immediately after the page has
been allocated.  To fix the latter, we can set bi_status at the place
the error happened.

To make the error handling easier to read and less error prone, how
about also moving it to the end of the function, including the
bio_put(enc_bio):

        enc_bio = NULL;
        err = 0;
        goto out_free_ciph_req;

out_free_bounce_pages:
        while (i > 0)
                mempool_free(enc_bio->bi_io_vec[--i].bv_page,
                             blk_crypto_page_pool);
out_free_ciph_req:
        skcipher_request_free(ciph_req);
out_release_keyslot:
        bio_crypt_ctx_release_keyslot(src_bio);
out_put_enc_bio:
        if (enc_bio)
                bio_put(enc_bio);
        return err;

> +
> +	enc_bio->bi_private = src_bio;
> +	enc_bio->bi_end_io = blk_crypto_encrypt_endio;
> +
> +	*bio_ptr = enc_bio;
> +out_release_cipher:
> +	skcipher_request_free(ciph_req);
> +out_release_keyslot:
> +	bio_crypt_ctx_release_keyslot(src_bio);
> +	return err;
> +}
> +
> +/*
> + * TODO: assumption right now is:
> + * each segment in bio has length divisible by the data_unit_size
> + */

Why is this a TODO?  What's wrong with just failing bio's that aren't
properly aligned?

> +static void blk_crypto_decrypt_bio(struct work_struct *w)
> +{
> +	struct work_mem *work_mem =
> +		container_of(w, struct work_mem, crypto_work);
> +	struct bio *bio = work_mem->bio;
> +	int slot;
> +	struct skcipher_request *ciph_req;
> +	DECLARE_CRYPTO_WAIT(wait);
> +	struct bio_vec bv;
> +	struct bvec_iter iter;
> +	u64 curr_dun;
> +	union {
> +		__le64 dun;
> +		u8 bytes[16];
> +	} iv;
> +	struct scatterlist sg;
> +	int err;
> +	int data_unit_size = 1 << bio->bi_crypt_context->data_unit_size_bits;
> +	int i;
> +
> +	kmem_cache_free(blk_crypto_work_mem_cache, work_mem);

Nit: freeing the work_mem is cleanup work that must always be done,
similar to calling bio_endio().  So I suggest moving this to the end of
this function, after bio_endio().  Otherwise it would be easy for a
future patch to accidentally introduce a use-after-free of work_mem.

> +
> +	err = bio_crypt_ctx_acquire_keyslot(bio, blk_crypto_ksm);
> +	if (err) {
> +		bio->bi_status = BLK_STS_RESOURCE;
> +		goto out_no_keyslot;
> +	}
> +
> +	slot = bio_crypt_get_keyslot(bio);
> +	ciph_req = skcipher_request_alloc(blk_crypto_keyslots[slot].tfm,
> +					  GFP_NOIO);
> +	if (!ciph_req) {
> +		bio->bi_status = BLK_STS_RESOURCE;
> +		goto out;
> +	}
> +
> +	skcipher_request_set_callback(ciph_req,
> +				      CRYPTO_TFM_REQ_MAY_BACKLOG |
> +				      CRYPTO_TFM_REQ_MAY_SLEEP,
> +				      crypto_req_done, &wait);
> +
> +	curr_dun = bio_crypt_sw_data_unit_num(bio);
> +	sg_init_table(&sg, 1);
> +	__bio_for_each_segment(bv, bio, iter,
> +			       bio->bi_crypt_context->crypt_iter) {
> +		if (bv.bv_len % data_unit_size != 0) {
> +			bio->bi_status = BLK_STS_IOERR;
> +			err = -EIO;
> +			goto out;
> +		}
> +		for (i = 0; i < bv.bv_len / data_unit_size; i++) {
> +			struct page *page = bv.bv_page;
> +
> +			memset(&iv, 0, sizeof(iv));
> +			iv.dun = cpu_to_le64(curr_dun);
> +
> +			sg_set_page(&sg, page, data_unit_size,
> +				    bv.bv_offset + i * data_unit_size);
> +			skcipher_request_set_crypt(ciph_req, &sg, &sg,
> +						   data_unit_size, iv.bytes);
> +			err = crypto_wait_req(crypto_skcipher_decrypt(ciph_req),
> +					      &wait);
> +			if (err) {
> +				bio->bi_status = BLK_STS_IOERR;
> +				goto out;
> +			}
> +			curr_dun++;
> +		}
> +	}

Most of the comments I made for the encryption case apply here too.

> +
> +out:
> +	skcipher_request_free(ciph_req);
> +	bio_crypt_ctx_release_keyslot(bio);
> +out_no_keyslot:
> +	bio_endio(bio);
> +}
> +
> +static void blk_crypto_queue_decrypt_bio(struct bio *bio)
> +{
> +	struct work_mem *work_mem =
> +		kmem_cache_zalloc(blk_crypto_work_mem_cache, GFP_ATOMIC);
> +
> +	if (!work_mem) {
> +		bio->bi_status = BLK_STS_RESOURCE;
> +		return bio_endio(bio);
> +	}

It's not conventional to chain 'void' return values like this.

Try to be as boring as possible.  Just write:

		bio_endio(bio);
		return;

> +
> +	INIT_WORK(&work_mem->crypto_work, blk_crypto_decrypt_bio);
> +	work_mem->bio = bio;
> +	queue_work(blk_crypto_wq, &work_mem->crypto_work);
> +}
> +
> +/**
> + * blk_crypto_submit_bio - handle submitting bio for inline encryption
> + *
> + * @bio_ptr: pointer to original bio pointer
> + *
> + * If the bio doesn't have inline encryption enabled or the submitter already
> + * specified a keyslot for the target device, do nothing.  Else, a raw key must
> + * have been provided, so acquire a device keyslot for it if supported.  Else,
> + * use the software crypto fallback.
> + *
> + * When the software fallback is used for encryption, blk-crypto may choose to
> + * split the bio into 2 - one that will continue to be processed and the other
> + * that will be resubmitted via generic_make_request. *bio_ptr will be updated
> + * to the first bio (the one that will continue to be processed).

The second paragraph is misleading because it doesn't explain that
*bio_ptr is actually updated to point to the bounce bio.

> + *
> + * Return: 0 if bio submission should continue; nonzero if bio_endio() was
> + *        already called so bio submission should abort.
> + */
> +int blk_crypto_submit_bio(struct bio **bio_ptr)
> +{
> +	struct bio *bio = *bio_ptr;
> +	struct request_queue *q;
> +	int err;
> +	struct bio_crypt_ctx *crypt_ctx;
> +
> +	if (!bio_is_encrypted(bio) || !bio_has_data(bio))
> +		return 0;
> +
> +	/*
> +	 * When a read bio is marked for sw decryption, its bi_iter is saved
> +	 * so that when we decrypt the bio later, we know what part of it was
> +	 * marked for sw decryption (when the bio is passed down after
> +	 * blk_crypto_submit bio, it may be split or advanced so we cannot rely
> +	 * on the bi_iter while decrypting in blk_crypto_endio)
> +	 */
> +	if (bio_crypt_swhandled(bio))
> +		return 0;
> +
> +	crypt_ctx = bio->bi_crypt_context;
> +	q = bio->bi_disk->queue;
> +
> +	if (bio_crypt_has_keyslot(bio)) {
> +		/* Key already programmed into device? */
> +		if (q->ksm == crypt_ctx->processing_ksm)
> +			return 0;
> +
> +		/* Nope, release the existing keyslot. */
> +		bio_crypt_ctx_release_keyslot(bio);
> +	}
> +
> +	/* Get device keyslot if supported */
> +	if (q->ksm) {
> +		err = bio_crypt_ctx_acquire_keyslot(bio, q->ksm);
> +		if (!err) {
> +			pr_warn_once("blk-crypto: failed to acquire keyslot for %s (err=%d).  Falling back to software crypto.\n",
> +				     bio->bi_disk->disk_name, err);
> +			return 0;
> +		}

This warning message should go in the err != 0 case, not err == 0.
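
i.e. (with the message also reworded per the "crypto API fallback" comment
above):

	if (q->ksm) {
		err = bio_crypt_ctx_acquire_keyslot(bio, q->ksm);
		if (!err)
			return 0;
		pr_warn_once("blk-crypto: failed to acquire keyslot for %s (err=%d).  Falling back to crypto API.\n",
			     bio->bi_disk->disk_name, err);
	}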

> +	}
> +
> +	/* Fallback to software crypto */
> +	if (bio_data_dir(bio) == WRITE) {
> +		/* Encrypt the data now */
> +		err = blk_crypto_encrypt_bio(bio_ptr);
> +		if (err)
> +			goto out_encrypt_err;
> +	} else {
> +		/* Mark bio as swhandled */
> +		bio->bi_crypt_context->processing_ksm = blk_crypto_ksm;
> +		bio->bi_crypt_context->crypt_iter = bio->bi_iter;
> +		bio->bi_crypt_context->sw_data_unit_num =
> +				bio->bi_crypt_context->data_unit_num;
> +	}
> +	return 0;
> +out_encrypt_err:
> +	bio_endio(bio);
> +	return err;
> +}
> +
> +/**
> + * blk_crypto_endio - clean up bio w.r.t inline encryption during bio_endio
> + *
> + * @bio - the bio to clean up
> + *
> + * If blk_crypto_submit_bio decided to fallback to software crypto for this
> + * bio, we queue the bio for decryption into a workqueue and return false,
> + * and call bio_endio(bio) at a later time (after the bio has been decrypted).
> + *
> + * If the bio is not to be decrypted in software, this function releases the
> + * reference to the keyslot that blk_crypto_submit_bio got.
> + *
> + * Return: true if bio_endio should continue; false otherwise (bio_endio will
> + * be called again when bio has been decrypted).
> + */
> +bool blk_crypto_endio(struct bio *bio)
> +{
> +	if (bio_crypt_swhandled(bio)) {
> +		/*
> +		 * The only bios that are swhandled when they reach here
> +		 * are those with bio_data_dir(bio) == READ, since WRITE
> +		 * bios that are encrypted by software fallback are handled
> +		 * by blk_crypto_encrypt_endio.
> +		 */
> +		blk_crypto_queue_decrypt_bio(bio);
> +		return false;
> +	}

This should only do the decryption if there was no I/O error.

> +
> +	if (bio_is_encrypted(bio) && bio_crypt_has_keyslot(bio))
> +		bio_crypt_ctx_release_keyslot(bio);
> +
> +	return true;

Also it seems that blk_crypto_endio() should mirror
blk_crypto_submit_bio() and start with:

	if (!bio_is_encrypted(bio) || !bio_has_data(bio))
		return true;

Otherwise the bio_has_data() case might be handled inconsistently.  And
it's good to check bio_is_encrypted() first, so that the unencrypted
case is as fast and easy to understand as possible.

Also, blk_crypto_endio() always releases the keyslot, even if it was
provided by the submitter of the bio.  Shouldn't it only release the
keyslot if it was acquired by blk_crypto_submit_bio()?

> +}
> +
> +int __init blk_crypto_init(void)
> +{
> +	blk_crypto_ksm = keyslot_manager_create(blk_crypto_num_keyslots,
> +						&blk_crypto_ksm_ll_ops,
> +						NULL);
> +	if (!blk_crypto_ksm)
> +		goto out_ksm;
> +
> +	blk_crypto_wq = alloc_workqueue("blk_crypto_wq",
> +					WQ_UNBOUND | WQ_HIGHPRI |
> +					WQ_MEM_RECLAIM,
> +					num_online_cpus());
> +	if (!blk_crypto_wq)
> +		goto out_wq;
> +
> +	blk_crypto_keyslots = kcalloc(blk_crypto_num_keyslots,
> +				      sizeof(*blk_crypto_keyslots),
> +				      GFP_KERNEL);
> +	if (!blk_crypto_keyslots)
> +		goto out_blk_crypto_keyslots;
> +
> +	blk_crypto_page_pool =
> +		mempool_create_page_pool(num_prealloc_bounce_pg, 0);
> +	if (!blk_crypto_page_pool)
> +		goto out_bounce_pool;
> +
> +	blk_crypto_work_mem_cache = KMEM_CACHE(work_mem, SLAB_RECLAIM_ACCOUNT);
> +	if (!blk_crypto_work_mem_cache)
> +		goto out_work_mem_cache;
> +
> +	return 0;
> +
> +out_work_mem_cache:
> +	mempool_destroy(blk_crypto_page_pool);
> +	blk_crypto_page_pool = NULL;
> +out_bounce_pool:
> +	kzfree(blk_crypto_keyslots);
> +	blk_crypto_keyslots = NULL;
> +out_blk_crypto_keyslots:
> +	destroy_workqueue(blk_crypto_wq);
> +	blk_crypto_wq = NULL;
> +out_wq:
> +	keyslot_manager_destroy(blk_crypto_ksm);
> +	blk_crypto_ksm = NULL;
> +out_ksm:
> +	pr_warn("No memory for blk-crypto software fallback.");
> +	return -ENOMEM;
> +}

Nit: I find it easier to read error labels when they describe what is
done at them, rather than what was done just before the goto.
I.e. use out:, out_destroy_ksm:, out_destroy_workqueue:, etc.

> +
> +module_param(blk_crypto_num_keyslots, int, 0);
> +MODULE_PARM_DESC(blk_crypto_num_keyslots,
> +		 "Number of keyslots for software fallback in blk-crypto.");
> +module_param(num_prealloc_bounce_pg, uint, 0);
> +MODULE_PARM_DESC(num_prealloc_bounce_pg,
> +	"Number of preallocated bounce pages for blk-crypto to use during software fallback encryption");

I believe it's more common for the module_param() statements to go at
the top of the file, next to the actual variables.

Also, the module parameter names are inconsistent.  The first parameter
will be "blk_crypto.blk_crypto_num_keyslots" on command line and second
parameter will be "blk_crypto.num_prealloc_bounce_pg" on command line.

It should be "blk_crypto.num_keyslots" for the first.  No need to say
"blk_crypto" twice.

module_param_named() can be used to still prefix the C variable names
with "blk_crypto_".

> diff --git a/include/linux/blk-crypto.h b/include/linux/blk-crypto.h
> new file mode 100644
> index 000000000000..cbb5bea6dcdb
> --- /dev/null
> +++ b/include/linux/blk-crypto.h
> @@ -0,0 +1,40 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright 2019 Google LLC
> + */
> +
> +#ifndef __LINUX_BLK_CRYPTO_H
> +#define __LINUX_BLK_CRYPTO_H
> +
> +#include <linux/types.h>
> +
> +#ifdef CONFIG_BLK_INLINE_ENCRYPTION
> +
> +struct bio;

This forward declaration should go outside the ifdef, since 'struct bio'
is referred to in both cases.

> +
> +int blk_crypto_init(void);
> +
> +int blk_crypto_submit_bio(struct bio **bio_ptr);
> +
> +bool blk_crypto_endio(struct bio *bio);
> +
> +#else /* CONFIG_BLK_INLINE_ENCRYPTION */
> +
> +static inline int blk_crypto_init(void)
> +{
> +	return 0;
> +}
> +
> +static inline int blk_crypto_submit_bio(struct bio **bio)
> +{
> +	return 0;
> +}

To be consistent with the other declaration, the parameter name here
should be 'bio_ptr', not 'bio'.

> +
> +static inline bool blk_crypto_endio(struct bio *bio)
> +{
> +	return true;
> +}
> +
> +#endif /* CONFIG_BLK_INLINE_ENCRYPTION */
> +
> +#endif /* __LINUX_BLK_CRYPTO_H */
> -- 
> 2.22.0.410.gd8fdbe21b5-goog
> 

- Eric
Jens Axboe Aug. 2, 2019, 8:51 p.m. UTC | #3
On 7/10/19 4:56 PM, Satya Tangirala wrote:
> We introduce blk-crypto, which manages programming keyslots for struct
> bios. With blk-crypto, filesystems only need to call bio_crypt_set_ctx with
> the encryption key, algorithm and data_unit_num; they don't have to worry
> about getting a keyslot for each encryption context, as blk-crypto handles
> that. Blk-crypto also makes it possible for layered devices like device
> mapper to make use of inline encryption hardware.
> 
> Blk-crypto delegates crypto operations to inline encryption hardware when
> available, and also contains a software fallback to the kernel crypto API.
> For more details, refer to Documentation/block/blk-crypto.txt.
> 
> Known issues:
> 1) We're allocating crypto_skcipher in blk_crypto_keyslot_program, which
> uses GFP_KERNEL to allocate memory, but this function is on the write
> path for IO - we need to add support for specifying a different flags
> to the crypto API.

That's a must-fix before merging, btw.

> @@ -1018,7 +1019,9 @@ blk_qc_t generic_make_request(struct bio *bio)
>   			/* Create a fresh bio_list for all subordinate requests */
>   			bio_list_on_stack[1] = bio_list_on_stack[0];
>   			bio_list_init(&bio_list_on_stack[0]);
> -			ret = q->make_request_fn(q, bio);
> +
> +			if (!blk_crypto_submit_bio(&bio))
> +				ret = q->make_request_fn(q, bio);
>   
>   			blk_queue_exit(q);

Why isn't this just stacking the ->make_request_fn() instead? Then we
could get this out of the hot path.
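
One possible shape for that (very rough sketch; the wrapper below and the
saved original make_request_fn are hypothetical, not existing interfaces):

	static blk_qc_t blk_crypto_make_request(struct request_queue *q,
						struct bio *bio)
	{
		if (blk_crypto_submit_bio(&bio))
			return BLK_QC_T_NONE;
		return q->crypto_orig_make_request_fn(q, bio);
	}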

Patch

diff --git a/Documentation/block/inline-encryption.txt b/Documentation/block/inline-encryption.txt
new file mode 100644
index 000000000000..96a7983a117d
--- /dev/null
+++ b/Documentation/block/inline-encryption.txt
@@ -0,0 +1,185 @@ 
+BLK-CRYPTO and KEYSLOT MANAGER
+===========================
+
+CONTENTS
+1. Objective
+2. Constraints and notes
+3. Design
+4. Blk-crypto
+ 4-1 What does blk-crypto do on bio submission
+5. Layered Devices
+6. Future optimizations for layered devices
+
+1. Objective
+============
+
+We want to support inline encryption (IE) in the kernel.
+To allow for testing, we also want a software fallback when actual
+IE hardware is absent. We also want IE to work with layered devices
+like dm and loopback (i.e. we want to be able to use the IE hardware
+of the underlying devices if present, or else fall back to software
+en/decryption).
+
+
+2. Constraints and notes
+========================
+
+1) IE hardware have a limited number of “keyslots” that can be programmed
+with an encryption context (key, algorithm, data unit size, etc.) at any time.
+One can specify a keyslot in a data request made to the device, and the
+device will en/decrypt the data using the encryption context programmed into
+that specified keyslot. When possible, we want to make multiple requests with
+the same encryption context share the same keyslot.
+
+2) We need a way for filesystems to specify an encryption context to use for
+en/decrypting a struct bio, and a device driver (like UFS) needs to be able
+to use that encryption context when it processes the bio.
+
+3) We need a way for device drivers to expose their capabilities in a unified
+way to the upper layers.
+
+
+3. Design
+=========
+
+We add a struct bio_crypt_context to struct bio that can represent an
+encryption context, because we need to able to pass this encryption context
+from the FS layer to the device driver to act upon.
+
+While IE hardware works on the notion of keyslots, the FS layer has no
+knowledge of keyslots - it simply wants to specify an encryption context to
+use while en/decrypting a bio.
+
+We introduce a keyslot manager (KSM) that handles the translation from
+encryption contexts specified by the FS to keyslots on the IE hardware.
+This KSM also serves as the way IE hardware can expose their capabilities to
+upper layers. The generic mode of operation is: each device driver that wants
+to support IE will construct a KSM and set it up in its struct request_queue.
+Upper layers that want to use IE on this device can then use this KSM in
+the device’s struct request_queue to translate an encryption context into
+a keyslot. The presence of the KSM in the request queue shall be used to mean
+that the device supports IE.
+
+On the device driver end of the interface, the device driver needs to tell the
+KSM how to actually manipulate the IE hardware in the device to do things like
+programming the crypto key into the IE hardware into a particular keyslot. All
+this is achieved through the struct keyslot_mgmt_ll_ops that the device driver
+passes to the KSM when creating it.
+
+It uses refcounts to track which keyslots are idle (either they have no
+encryption context programmed, or there are no in flight struct bios
+referencing that keyslot). When a new encryption context needs a keyslot, it
+tries to find a keyslot that has already been programmed with the same
+encryption context, and if there is no such keyslot, it evicts the least
+recently used idle keyslot and programs the new encryption context into that
+one. If no idle keyslots are available, then the caller will sleep until there
+is at least one.
+
+
+4. Blk-crypto
+=============
+
+The above is sufficient for simple cases, but does not work if there is a
+need for a software fallback, or if we are want to use IE with layered devices.
+To these ends, we introduce blk-crypto. Blk-crypto allows us to present a
+unified view of encryption to the FS (so FS only needs to specify an
+encryption context and not worry about keyslots at all), and blk-crypto can
+decide whether to delegate the en/decryption to IE hardware or to software
+(i.e. to the kernel crypto API). Blk-crypto maintains an internal KSM that
+serves as the software fallback to the kernel crypto API.
+
+Blk-crypto needs to ensure that the encryption context is programmed into the
+"correct" keyslot manager for IE. If a bio is submitted to a layered device
+that eventually passes the bio down to a device that really does support IE, we
+want the encryption context to be programmed into a keyslot for the KSM of the
+device with IE support. However, blk-crypto does not know a priori whether a
+particular device is the final device in the layering structure for a bio or
+not. So in the case that a particular device does not support IE, since it is
+possibly the final destination device for the bio, if the bio requires
+encryption (i.e. the bio is doing a write operation), blk-crypto must fallback
+to software *before* sending the bio to the device.
+
+Blk-crypto ensures that
+1) The bio’s encryption context is programmed into a keyslot in the KSM of the
+request queue that the bio is being submitted to (or the software fallback KSM
+if the request queue doesn’t have a KSM), and that the processing_ksm in the
+bi_crypt_context is set to this KSM
+
+2) That the bio has its own individual reference to the keyslot in this KSM.
+Once the bio passes through blk-crypto, its encryption context is programmed
+in some KSM. The “its own individual reference to the keyslot” ensures that
+keyslots can be released by each bio independently of other bios while ensuring
+that the bio has a valid reference to the keyslot when, for e.g., the software
+fallback KSM in blk-crypto performs crypto for on the device’s behalf. The
+individual references are ensured by increasing the refcount for the keyslot in
+the processing_ksm when a bio with a programmed encryption context is cloned.
+
+
+4-1. What blk-crypto does on bio submission
+-------------------------------------------
+
+Case 1: blk-crypto is given a bio with only an encryption context that hasn’t
+been programmed into any keyslot in any KSM (for e.g. a bio from the FS). In
+this case, blk-crypto will program the encryption context into the KSM of the
+request queue the bio is being submitted to (and if this KSM does not exist,
+then it will program it into blk-crypto’s internal KSM for software fallback).
+The KSM that this encryption context was programmed into is stored as the
+processing_ksm in the bio’s bi_crypt_context.
+
+Case 2: blk-crypto is given a bio whose encryption context has already been
+programmed into a keyslot in the *software fallback KSM*. In this case,
+blk-crypto does nothing; it treats the bio as not having specified an
+encryption context. Note that we cannot do what we will do in Case 3 here
+because we would have already encrypted the bio in software by this point.
+
+Case 3: blk-crypto is given a bio whose encryption context has already been
+programmed into a keyslot in some KSM (that is *not* the software fallback
+KSM). In this case, blk-crypto first releases that keyslot from that KSM and
+then treats the bio as in Case 1.
+
+This way, when a device driver is processing a bio, it can be sure that
+the bio’s encryption context has been programmed into some KSM (either the
+device driver’s request queue’s KSM, or blk-crypto’s software fallback KSM).
+It then simply needs to check if the bio’s processing_ksm is the device’s
+request queue’s KSM. If so, then it should proceed with IE. If not, it should
+simply do nothing with respect to crypto, because some other KSM (perhaps the
+blk-crypto software fallback KSM) is handling the en/decryption.
+
+Blk-crypto will release the keyslot that is being held by the bio (and also
+decrypt it if the bio is using the software fallback KSM) once
+bio_remaining_done returns true for the bio.
+
+
+5. Layered Devices
+==================
+
+Layered devices that wish to support IE need to create their own keyslot
+manager for their request queue, and expose whatever functionality they choose.
+When a layered device wants to pass a bio to another layer (either by
+resubmitting the same bio, or by submitting a clone), it doesn’t need to do
+anything special because the bio (or the clone) will once again pass through
+blk-crypto, which will work as described in Case 3. If a layered device wants
+for some reason to do the IO by itself instead of passing it on to a child
+device, but it also chose to expose IE capabilities by setting up a KSM in its
+request queue, it is then responsible for en/decrypting the data itself. In
+such cases, the device can choose to call the blk-crypto function
+blk_crypto_fallback_to_software (TODO: Not yet implemented), which will
+cause the en/decryption to be done via software fallback.
+
+
+6. Future Optimizations for layered devices
+===========================================
+
+Creating a keyslot manager for the layered device uses up memory for each
+keyslot, and in general, a layered device (like dm-linear) merely passes the
+request on to a “child” device, so the keyslots in the layered device itself
+might be completely unused. We can instead define a new type of KSM; the
+“passthrough KSM”, that layered devices can use to let blk-crypto know that
+this layered device *will* pass the bio to some child device (and hence
+through blk-crypto again, at which point blk-crypto can program the encryption
+context, instead of programming it into the layered device’s KSM). Again, if
+the device “lies” and decides to do the IO itself instead of passing it on to
+a child device, it is responsible for doing the en/decryption (and can choose
+to call blk_crypto_fallback_to_software). Another use case for the
+"passthrough KSM" is for IE devices that want to manage their own keyslots/do
+not have a limited number of keyslots.
diff --git a/block/Makefile b/block/Makefile
index 4147ffa63631..1ba7de84dbaf 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -35,4 +35,5 @@  obj-$(CONFIG_BLK_DEBUG_FS)	+= blk-mq-debugfs.o
 obj-$(CONFIG_BLK_DEBUG_FS_ZONED)+= blk-mq-debugfs-zoned.o
 obj-$(CONFIG_BLK_SED_OPAL)	+= sed-opal.o
 obj-$(CONFIG_BLK_PM)		+= blk-pm.o
-obj-$(CONFIG_BLK_INLINE_ENCRYPTION)	+= keyslot-manager.o bio-crypt-ctx.o
+obj-$(CONFIG_BLK_INLINE_ENCRYPTION)	+= keyslot-manager.o bio-crypt-ctx.o \
+					   blk-crypto.o
diff --git a/block/bio-crypt-ctx.c b/block/bio-crypt-ctx.c
index 8b884ef32007..91015645c4ea 100644
--- a/block/bio-crypt-ctx.c
+++ b/block/bio-crypt-ctx.c
@@ -23,7 +23,12 @@  EXPORT_SYMBOL(bio_crypt_free_ctx);
 
 int bio_crypt_clone(struct bio *dst, struct bio *src, gfp_t gfp_mask)
 {
-	if (!bio_is_encrypted(src))
+	/*
+	 * If a bio is swhandled, then it will be decrypted when bio_endio
+	 * is called. As we only want the data to be decrypted once, copies
+	 * of the bio must not have have a crypt context.
+	 */
+	if (!bio_is_encrypted(src) || bio_crypt_swhandled(src))
 		return 0;
 
 	dst->bi_crypt_context = bio_crypt_alloc_ctx(gfp_mask);
diff --git a/block/bio.c b/block/bio.c
index 2a272cda6dfa..390777a76468 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -16,6 +16,7 @@ 
 #include <linux/workqueue.h>
 #include <linux/cgroup.h>
 #include <linux/blk-cgroup.h>
+#include <linux/blk-crypto.h>
 
 #include <trace/events/block.h>
 #include "blk.h"
@@ -1832,6 +1833,10 @@  void bio_endio(struct bio *bio)
 again:
 	if (!bio_remaining_done(bio))
 		return;
+
+	if (!blk_crypto_endio(bio))
+		return;
+
 	if (!bio_integrity_endio(bio))
 		return;
 
diff --git a/block/blk-core.c b/block/blk-core.c
index 8340f69670d8..aec4da8c27fa 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -36,6 +36,7 @@ 
 #include <linux/blk-cgroup.h>
 #include <linux/debugfs.h>
 #include <linux/bpf.h>
+#include <linux/blk-crypto.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/block.h>
@@ -1018,7 +1019,9 @@  blk_qc_t generic_make_request(struct bio *bio)
 			/* Create a fresh bio_list for all subordinate requests */
 			bio_list_on_stack[1] = bio_list_on_stack[0];
 			bio_list_init(&bio_list_on_stack[0]);
-			ret = q->make_request_fn(q, bio);
+
+			if (!blk_crypto_submit_bio(&bio))
+				ret = q->make_request_fn(q, bio);
 
 			blk_queue_exit(q);
 
@@ -1071,6 +1074,9 @@  blk_qc_t direct_make_request(struct bio *bio)
 	if (!generic_make_request_checks(bio))
 		return BLK_QC_T_NONE;
 
+	if (blk_crypto_submit_bio(&bio))
+		return BLK_QC_T_NONE;
+
 	if (unlikely(blk_queue_enter(q, nowait ? BLK_MQ_REQ_NOWAIT : 0))) {
 		if (nowait && !blk_queue_dying(q))
 			bio->bi_status = BLK_STS_AGAIN;
@@ -1750,5 +1756,8 @@  int __init blk_dev_init(void)
 	blk_debugfs_root = debugfs_create_dir("block", NULL);
 #endif
 
+	if (blk_crypto_init() < 0)
+		panic("Failed to init blk-crypto\n");
+
 	return 0;
 }
diff --git a/block/blk-crypto.c b/block/blk-crypto.c
new file mode 100644
index 000000000000..f41fb7819ae9
--- /dev/null
+++ b/block/blk-crypto.c
@@ -0,0 +1,585 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2019 Google LLC
+ */
+#include <linux/blk-crypto.h>
+#include <linux/keyslot-manager.h>
+#include <linux/mempool.h>
+#include <linux/blk-cgroup.h>
+#include <crypto/skcipher.h>
+#include <crypto/algapi.h>
+#include <linux/module.h>
+
+struct blk_crypt_mode {
+	const char *cipher_str;
+	size_t keysize;
+};
+
+static const struct blk_crypt_mode blk_crypt_modes[] = {
+	[BLK_ENCRYPTION_MODE_AES_256_XTS] = {
+		.cipher_str = "xts(aes)",
+		.keysize = 64,
+	},
+};
+
+#define BLK_CRYPTO_MAX_KEY_SIZE 64
+/* TODO: Do we want to make this user configurable somehow? */
+static int blk_crypto_num_keyslots = 100;
+
+static struct blk_crypto_keyslot {
+	struct crypto_skcipher *tfm;
+	enum blk_crypt_mode_num crypt_mode;
+	u8 key[BLK_CRYPTO_MAX_KEY_SIZE];
+} *blk_crypto_keyslots;
+
+struct work_mem {
+	struct work_struct crypto_work;
+	struct bio *bio;
+};
+
+static struct keyslot_manager *blk_crypto_ksm;
+static struct workqueue_struct *blk_crypto_wq;
+static mempool_t *blk_crypto_page_pool;
+static struct kmem_cache *blk_crypto_work_mem_cache;
+
+static unsigned int num_prealloc_bounce_pg = 32;
+
+bool bio_crypt_swhandled(struct bio *bio)
+{
+	return bio_is_encrypted(bio) &&
+	       bio->bi_crypt_context->processing_ksm == blk_crypto_ksm;
+}
+
+static void evict_keyslot(unsigned int slot)
+{
+	struct blk_crypto_keyslot *slotp = &blk_crypto_keyslots[slot];
+
+	crypto_free_skcipher(slotp->tfm);
+	slotp->tfm = NULL;
+	memzero_explicit(slotp->key, BLK_CRYPTO_MAX_KEY_SIZE);
+}
+
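+/*
+ * Program an encryption context into a software-fallback keyslot: reallocate
+ * the skcipher tfm if the crypt mode changed (or no tfm exists yet), set the
+ * key on it, and keep a raw copy of the key so blk_crypto_keyslot_find() can
+ * match this slot later.
+ */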
+/* TODO: handle modes that need essiv */
+static int blk_crypto_keyslot_program(void *priv, const u8 *key,
+				      enum blk_crypt_mode_num crypt_mode,
+				      unsigned int data_unit_size,
+				      unsigned int slot)
+{
+	struct blk_crypto_keyslot *slotp = &blk_crypto_keyslots[slot];
+	struct crypto_skcipher *tfm = slotp->tfm;
+	const struct blk_crypt_mode *mode = &blk_crypt_modes[crypt_mode];
+	size_t keysize = mode->keysize;
+	int err;
+
+	if (crypt_mode != slotp->crypt_mode || !tfm) {
+		evict_keyslot(slot);
+		tfm = crypto_alloc_skcipher(mode->cipher_str, 0, 0);
+		if (IS_ERR(tfm))
+			return PTR_ERR(tfm);
+
+		crypto_skcipher_set_flags(tfm,
+					  CRYPTO_TFM_REQ_FORBID_WEAK_KEYS);
+		slotp->crypt_mode = crypt_mode;
+		slotp->tfm = tfm;
+	}
+
+	err = crypto_skcipher_setkey(tfm, key, keysize);
+
+	if (err) {
+		evict_keyslot(slot);
+		return err;
+	}
+
+	memcpy(slotp->key, key, keysize);
+
+	return 0;
+}
+
+static int blk_crypto_keyslot_evict(void *priv, const u8 *key,
+				    enum blk_crypt_mode_num crypt_mode,
+				    unsigned int data_unit_size,
+				    unsigned int slot)
+{
+	evict_keyslot(slot);
+	return 0;
+}
+
+static int blk_crypto_keyslot_find(void *priv,
+				   const u8 *key,
+				   enum blk_crypt_mode_num crypt_mode,
+				   unsigned int data_unit_size_bytes)
+{
+	int slot;
+	const size_t keysize = blk_crypt_modes[crypt_mode].keysize;
+
+	/* TODO: hashmap? */
+	for (slot = 0; slot < blk_crypto_num_keyslots; slot++) {
+		if (blk_crypto_keyslots[slot].crypt_mode == crypt_mode &&
+		    !crypto_memneq(blk_crypto_keyslots[slot].key, key, keysize))
+			return slot;
+	}
+
+	return -ENOKEY;
+}
+
+static bool blk_crypt_mode_supported(void *priv,
+				     enum blk_crypt_mode_num crypt_mode,
+				     unsigned int data_unit_size)
+{
+	/* All blk_crypt_modes are required to have a software fallback. */
+	return true;
+}
+
+static const struct keyslot_mgmt_ll_ops blk_crypto_ksm_ll_ops = {
+	.keyslot_program	= blk_crypto_keyslot_program,
+	.keyslot_evict		= blk_crypto_keyslot_evict,
+	.keyslot_find		= blk_crypto_keyslot_find,
+	.crypt_mode_supported	= blk_crypt_mode_supported,
+};
+
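+/*
+ * Completion handler for the bounce bio used by software-fallback encryption:
+ * return the bounce pages to the mempool, propagate the I/O status to the
+ * original bio, and complete it.
+ */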
+static void blk_crypto_encrypt_endio(struct bio *enc_bio)
+{
+	struct bio *src_bio = enc_bio->bi_private;
+	struct bio_vec *enc_bvec, *enc_bvec_end;
+
+	enc_bvec = enc_bio->bi_io_vec;
+	enc_bvec_end = enc_bvec + enc_bio->bi_vcnt;
+	for (; enc_bvec != enc_bvec_end; enc_bvec++)
+		mempool_free(enc_bvec->bv_page, blk_crypto_page_pool);
+
+	src_bio->bi_status = enc_bio->bi_status;
+
+	bio_put(enc_bio);
+	bio_endio(src_bio);
+}
+
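+/*
+ * Make a shallow clone of @bio_src to serve as the bounce bio for software
+ * encryption; the clone's bvec pages are later replaced with bounce pages.
+ */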
+static struct bio *blk_crypto_clone_bio(struct bio *bio_src)
+{
+	struct bvec_iter iter;
+	struct bio_vec bv;
+	struct bio *bio;
+
+	bio = bio_alloc_bioset(GFP_NOIO, bio_segments(bio_src), NULL);
+	if (!bio)
+		return NULL;
+	bio->bi_disk		= bio_src->bi_disk;
+	bio->bi_opf		= bio_src->bi_opf;
+	bio->bi_ioprio		= bio_src->bi_ioprio;
+	bio->bi_write_hint	= bio_src->bi_write_hint;
+	bio->bi_iter.bi_sector	= bio_src->bi_iter.bi_sector;
+	bio->bi_iter.bi_size	= bio_src->bi_iter.bi_size;
+
+	bio_for_each_segment(bv, bio_src, iter)
+		bio->bi_io_vec[bio->bi_vcnt++] = bv;
+
+	if (bio_integrity(bio_src)) {
+		int ret;
+
+		ret = bio_integrity_clone(bio, bio_src, GFP_NOIO);
+		if (ret < 0) {
+			bio_put(bio);
+			return NULL;
+		}
+	}
+
+	bio_clone_blkg_association(bio, bio_src);
+	blkcg_bio_issue_init(bio);
+
+	return bio;
+}
+
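+/*
+ * Software-fallback encryption for a WRITE bio: split the bio if necessary,
+ * clone it, encrypt each data unit into freshly allocated bounce pages using
+ * the keyslot's skcipher, and make *bio_ptr point at the encrypted clone.
+ */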
+static int blk_crypto_encrypt_bio(struct bio **bio_ptr)
+{
+	struct bio *src_bio = *bio_ptr;
+	int slot;
+	struct skcipher_request *ciph_req = NULL;
+	DECLARE_CRYPTO_WAIT(wait);
+	struct bio_vec bv;
+	struct bvec_iter iter;
+	int err = 0;
+	u64 curr_dun;
+	union {
+		__le64 dun;
+		u8 bytes[16];
+	} iv;
+	struct scatterlist src, dst;
+	struct bio *enc_bio;
+	struct bio_vec *enc_bvec;
+	int i, j;
+	unsigned int num_sectors;
+	int data_unit_size;
+
+	/* Split the bio if it won't fit in a single bounce bio (BIO_MAX_PAGES bvecs) */
+	i = 0;
+	num_sectors = 0;
+	data_unit_size = 1 << src_bio->bi_crypt_context->data_unit_size_bits;
+	bio_for_each_segment(bv, src_bio, iter) {
+		num_sectors += bv.bv_len >> 9;
+		if (bv.bv_len % data_unit_size != 0) {
+			src_bio->bi_status = BLK_STS_IOERR;
+			return -EIO;
+		}
+		if (++i == BIO_MAX_PAGES)
+			break;
+	}
+	if (num_sectors < bio_sectors(src_bio)) {
+		struct bio *split_bio;
+
+		split_bio = bio_split(src_bio, num_sectors, GFP_NOIO, NULL);
+		if (!split_bio) {
+			src_bio->bi_status = BLK_STS_RESOURCE;
+			return -ENOMEM;
+		}
+		bio_chain(split_bio, src_bio);
+		generic_make_request(src_bio);
+		*bio_ptr = split_bio;
+		src_bio = *bio_ptr;
+	}
+
+	enc_bio = blk_crypto_clone_bio(src_bio);
+	if (!enc_bio) {
+		src_bio->bi_status = BLK_STS_RESOURCE;
+		return -ENOMEM;
+	}
+
+	err = bio_crypt_ctx_acquire_keyslot(src_bio, blk_crypto_ksm);
+	if (err) {
+		src_bio->bi_status = BLK_STS_IOERR;
+		bio_put(enc_bio);
+		return err;
+	}
+	slot = bio_crypt_get_keyslot(src_bio);
+
+	ciph_req = skcipher_request_alloc(blk_crypto_keyslots[slot].tfm,
+					  GFP_NOIO);
+	if (!ciph_req) {
+		src_bio->bi_status = BLK_STS_RESOURCE;
+		err = -ENOMEM;
+		bio_put(enc_bio);
+		goto out_release_keyslot;
+	}
+
+	skcipher_request_set_callback(ciph_req,
+				      CRYPTO_TFM_REQ_MAY_BACKLOG |
+				      CRYPTO_TFM_REQ_MAY_SLEEP,
+				      crypto_req_done, &wait);
+
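+	/*
+	 * Encrypt each data_unit_size chunk of the source bio into a bounce
+	 * page, using the little-endian data unit number (zero-padded to 16
+	 * bytes) as the IV.
+	 */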
+	curr_dun = bio_crypt_data_unit_num(src_bio);
+	sg_init_table(&src, 1);
+	sg_init_table(&dst, 1);
+	for (i = 0, enc_bvec = enc_bio->bi_io_vec; i < enc_bio->bi_vcnt;
+	     enc_bvec++, i++) {
+		struct page *page = enc_bvec->bv_page;
+		struct page *ciphertext_page =
+			mempool_alloc(blk_crypto_page_pool, GFP_NOIO);
+
+		enc_bvec->bv_page = ciphertext_page;
+
+		if (!ciphertext_page)
+			goto no_mem_for_ciph_page;
+
+		for (j = 0; j < enc_bvec->bv_len / data_unit_size; j++) {
+			memset(&iv, 0, sizeof(iv));
+			iv.dun = cpu_to_le64(curr_dun);
+
+			sg_set_page(&src, page, data_unit_size,
+				    enc_bvec->bv_offset + j * data_unit_size);
+			sg_set_page(&dst, ciphertext_page, data_unit_size,
+				    enc_bvec->bv_offset + j * data_unit_size);
+
+			skcipher_request_set_crypt(ciph_req, &src, &dst,
+						   data_unit_size, iv.bytes);
+			err = crypto_wait_req(crypto_skcipher_encrypt(ciph_req),
+					      &wait);
+			if (err)
+				goto no_mem_for_ciph_page;
+			curr_dun++;
+		}
+		continue;
+no_mem_for_ciph_page:
+		err = -ENOMEM;
+		/* Free all bounce pages allocated so far (mempool_free ignores NULL) */
+		for (j = i; j >= 0; j--) {
+			mempool_free(enc_bio->bi_io_vec[j].bv_page,
+				     blk_crypto_page_pool);
+		}
+		src_bio->bi_status = BLK_STS_RESOURCE;
+		bio_put(enc_bio);
+		goto out_release_cipher;
+	}
+
+	enc_bio->bi_private = src_bio;
+	enc_bio->bi_end_io = blk_crypto_encrypt_endio;
+
+	*bio_ptr = enc_bio;
+out_release_cipher:
+	skcipher_request_free(ciph_req);
+out_release_keyslot:
+	bio_crypt_ctx_release_keyslot(src_bio);
+	return err;
+}
+
+/*
+ * TODO: the current assumption is that each segment in the bio has a length
+ * divisible by the data_unit_size.
+ */
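+/*
+ * Software-fallback decryption for a READ bio, run from blk_crypto_wq once
+ * the read has completed: decrypt each data unit in place, walking the iter
+ * and data unit number that were saved at submission time, then complete the
+ * bio with bio_endio().
+ */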
+static void blk_crypto_decrypt_bio(struct work_struct *w)
+{
+	struct work_mem *work_mem =
+		container_of(w, struct work_mem, crypto_work);
+	struct bio *bio = work_mem->bio;
+	int slot;
+	struct skcipher_request *ciph_req;
+	DECLARE_CRYPTO_WAIT(wait);
+	struct bio_vec bv;
+	struct bvec_iter iter;
+	u64 curr_dun;
+	union {
+		__le64 dun;
+		u8 bytes[16];
+	} iv;
+	struct scatterlist sg;
+	int err;
+	int data_unit_size = 1 << bio->bi_crypt_context->data_unit_size_bits;
+	int i;
+
+	kmem_cache_free(blk_crypto_work_mem_cache, work_mem);
+
+	err = bio_crypt_ctx_acquire_keyslot(bio, blk_crypto_ksm);
+	if (err) {
+		bio->bi_status = BLK_STS_RESOURCE;
+		goto out_no_keyslot;
+	}
+
+	slot = bio_crypt_get_keyslot(bio);
+	ciph_req = skcipher_request_alloc(blk_crypto_keyslots[slot].tfm,
+					  GFP_NOIO);
+	if (!ciph_req) {
+		bio->bi_status = BLK_STS_RESOURCE;
+		goto out;
+	}
+
+	skcipher_request_set_callback(ciph_req,
+				      CRYPTO_TFM_REQ_MAY_BACKLOG |
+				      CRYPTO_TFM_REQ_MAY_SLEEP,
+				      crypto_req_done, &wait);
+
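+	/*
+	 * Decrypt each data_unit_size chunk in place, using the saved
+	 * crypt_iter and the little-endian data unit number as the IV.
+	 */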
+	curr_dun = bio_crypt_sw_data_unit_num(bio);
+	sg_init_table(&sg, 1);
+	__bio_for_each_segment(bv, bio, iter,
+			       bio->bi_crypt_context->crypt_iter) {
+		if (bv.bv_len % data_unit_size != 0) {
+			bio->bi_status = BLK_STS_IOERR;
+			err = -EIO;
+			goto out;
+		}
+		for (i = 0; i < bv.bv_len / data_unit_size; i++) {
+			struct page *page = bv.bv_page;
+
+			memset(&iv, 0, sizeof(iv));
+			iv.dun = cpu_to_le64(curr_dun);
+
+			sg_set_page(&sg, page, data_unit_size,
+				    bv.bv_offset + i * data_unit_size);
+			skcipher_request_set_crypt(ciph_req, &sg, &sg,
+						   data_unit_size, iv.bytes);
+			err = crypto_wait_req(crypto_skcipher_decrypt(ciph_req),
+					      &wait);
+			if (err) {
+				bio->bi_status = BLK_STS_IOERR;
+				goto out;
+			}
+			curr_dun++;
+		}
+	}
+
+out:
+	skcipher_request_free(ciph_req);
+	bio_crypt_ctx_release_keyslot(bio);
+out_no_keyslot:
+	bio_endio(bio);
+}
+
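+/*
+ * The crypto API calls in blk_crypto_decrypt_bio may sleep, but bio_endio()
+ * may be called in atomic context, so the decryption work is deferred to
+ * blk_crypto_wq.
+ */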
+static void blk_crypto_queue_decrypt_bio(struct bio *bio)
+{
+	struct work_mem *work_mem =
+		kmem_cache_zalloc(blk_crypto_work_mem_cache, GFP_ATOMIC);
+
+	if (!work_mem) {
+		bio->bi_status = BLK_STS_RESOURCE;
+		bio_endio(bio);
+		return;
+	}
+
+	INIT_WORK(&work_mem->crypto_work, blk_crypto_decrypt_bio);
+	work_mem->bio = bio;
+	queue_work(blk_crypto_wq, &work_mem->crypto_work);
+}
+
+/**
+ * blk_crypto_submit_bio - handle submitting bio for inline encryption
+ *
+ * @bio_ptr: pointer to original bio pointer
+ *
+ * If the bio doesn't have inline encryption enabled or the submitter already
+ * specified a keyslot for the target device, do nothing.  Otherwise, a raw
+ * key must have been provided, so acquire a device keyslot for it if the
+ * device supports inline encryption; failing that, fall back to software
+ * crypto.
+ *
+ * When the software fallback is used for encryption, blk-crypto may split the
+ * bio in two: one bio that will continue to be processed, and another that
+ * will be resubmitted via generic_make_request.  *bio_ptr is updated to point
+ * to the former (the one that continues to be processed).
+ *
+ * Return: 0 if bio submission should continue; nonzero if bio_endio() was
+ *         already called, so bio submission should abort.
+ */
+int blk_crypto_submit_bio(struct bio **bio_ptr)
+{
+	struct bio *bio = *bio_ptr;
+	struct request_queue *q;
+	int err;
+	struct bio_crypt_ctx *crypt_ctx;
+
+	if (!bio_is_encrypted(bio) || !bio_has_data(bio))
+		return 0;
+
+	/*
+	 * When a read bio is marked for software decryption, its bi_iter is
+	 * saved so that when we decrypt the bio later, we know which part of
+	 * it was marked for software decryption.  (After blk_crypto_submit_bio
+	 * the bio may be split or advanced, so we cannot rely on bi_iter when
+	 * decrypting in blk_crypto_endio.)
+	 */
+	if (bio_crypt_swhandled(bio))
+		return 0;
+
+	crypt_ctx = bio->bi_crypt_context;
+	q = bio->bi_disk->queue;
+
+	if (bio_crypt_has_keyslot(bio)) {
+		/* Key already programmed into device? */
+		if (q->ksm == crypt_ctx->processing_ksm)
+			return 0;
+
+		/* Nope, release the existing keyslot. */
+		bio_crypt_ctx_release_keyslot(bio);
+	}
+
+	/* Try to get a device keyslot if the device supports inline encryption */
+	if (q->ksm) {
+		err = bio_crypt_ctx_acquire_keyslot(bio, q->ksm);
+		if (!err)
+			return 0;
+
+		pr_warn_once("blk-crypto: failed to acquire keyslot for %s (err=%d). Falling back to software crypto.\n",
+			     bio->bi_disk->disk_name, err);
+	}
+
+	/* Fall back to software crypto */
+	if (bio_data_dir(bio) == WRITE) {
+		/* Encrypt the data now */
+		err = blk_crypto_encrypt_bio(bio_ptr);
+		if (err)
+			goto out_encrypt_err;
+	} else {
+		/* Mark bio as swhandled */
+		bio->bi_crypt_context->processing_ksm = blk_crypto_ksm;
+		bio->bi_crypt_context->crypt_iter = bio->bi_iter;
+		bio->bi_crypt_context->sw_data_unit_num =
+				bio->bi_crypt_context->data_unit_num;
+	}
+	return 0;
+out_encrypt_err:
+	bio_endio(bio);
+	return err;
+}
+
+/**
+ * blk_crypto_endio - clean up a bio w.r.t. inline encryption during bio_endio
+ *
+ * @bio: the bio to clean up
+ *
+ * If blk_crypto_submit_bio decided to fall back to software crypto for this
+ * bio, queue the bio for decryption on a workqueue and return false;
+ * bio_endio(bio) will be called again once the bio has been decrypted.
+ *
+ * If the bio is not to be decrypted in software, this function releases the
+ * reference to the keyslot that blk_crypto_submit_bio got.
+ *
+ * Return: true if bio_endio should continue; false otherwise (bio_endio will
+ * be called again when bio has been decrypted).
+ */
+bool blk_crypto_endio(struct bio *bio)
+{
+	if (bio_crypt_swhandled(bio)) {
+		/*
+		 * The only bios that are swhandled when they reach here
+		 * are those with bio_data_dir(bio) == READ, since WRITE
+		 * bios that are encrypted by software fallback are handled
+		 * by blk_crypto_encrypt_endio.
+		 */
+		blk_crypto_queue_decrypt_bio(bio);
+		return false;
+	}
+
+	if (bio_is_encrypted(bio) && bio_crypt_has_keyslot(bio))
+		bio_crypt_ctx_release_keyslot(bio);
+
+	return true;
+}
+
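+/*
+ * Set up the global resources needed by the software fallback: the keyslot
+ * manager, the decryption workqueue, the keyslot table, the bounce page
+ * mempool, and the work item cache.
+ */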
+int __init blk_crypto_init(void)
+{
+	blk_crypto_ksm = keyslot_manager_create(blk_crypto_num_keyslots,
+						&blk_crypto_ksm_ll_ops,
+						NULL);
+	if (!blk_crypto_ksm)
+		goto out_ksm;
+
+	blk_crypto_wq = alloc_workqueue("blk_crypto_wq",
+					WQ_UNBOUND | WQ_HIGHPRI |
+					WQ_MEM_RECLAIM,
+					num_online_cpus());
+	if (!blk_crypto_wq)
+		goto out_wq;
+
+	blk_crypto_keyslots = kcalloc(blk_crypto_num_keyslots,
+				      sizeof(*blk_crypto_keyslots),
+				      GFP_KERNEL);
+	if (!blk_crypto_keyslots)
+		goto out_blk_crypto_keyslots;
+
+	blk_crypto_page_pool =
+		mempool_create_page_pool(num_prealloc_bounce_pg, 0);
+	if (!blk_crypto_page_pool)
+		goto out_bounce_pool;
+
+	blk_crypto_work_mem_cache = KMEM_CACHE(work_mem, SLAB_RECLAIM_ACCOUNT);
+	if (!blk_crypto_work_mem_cache)
+		goto out_work_mem_cache;
+
+	return 0;
+
+out_work_mem_cache:
+	mempool_destroy(blk_crypto_page_pool);
+	blk_crypto_page_pool = NULL;
+out_bounce_pool:
+	kzfree(blk_crypto_keyslots);
+	blk_crypto_keyslots = NULL;
+out_blk_crypto_keyslots:
+	destroy_workqueue(blk_crypto_wq);
+	blk_crypto_wq = NULL;
+out_wq:
+	keyslot_manager_destroy(blk_crypto_ksm);
+	blk_crypto_ksm = NULL;
+out_ksm:
+	pr_warn("No memory for blk-crypto software fallback\n");
+	return -ENOMEM;
+}
+
+module_param(blk_crypto_num_keyslots, int, 0);
+MODULE_PARM_DESC(blk_crypto_num_keyslots,
+		 "Number of keyslots for software fallback in blk-crypto.");
+module_param(num_prealloc_bounce_pg, uint, 0);
+MODULE_PARM_DESC(num_prealloc_bounce_pg,
+	"Number of preallocated bounce pages for blk-crypto to use during software fallback encryption");
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 4e664d6441d5..66ad1508911f 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -612,6 +612,8 @@  static inline void bio_crypt_advance(struct bio *bio, unsigned int bytes)
 	}
 }
 
+extern bool bio_crypt_swhandled(struct bio *bio);
+
 static inline bool bio_crypt_has_keyslot(struct bio *bio)
 {
 	return bio->bi_crypt_context->keyslot >= 0;
@@ -717,6 +719,11 @@  static inline void bio_crypt_set_ctx(struct bio *bio,
 				     u64 dun,
 				     unsigned int dun_bits) { }
 
+static inline bool bio_crypt_swhandled(struct bio *bio)
+{
+	return false;
+}
+
 static inline bool bio_crypt_has_keyslot(struct bio *bio)
 {
 	return false;
diff --git a/include/linux/blk-crypto.h b/include/linux/blk-crypto.h
new file mode 100644
index 000000000000..cbb5bea6dcdb
--- /dev/null
+++ b/include/linux/blk-crypto.h
@@ -0,0 +1,40 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2019 Google LLC
+ */
+
+#ifndef __LINUX_BLK_CRYPTO_H
+#define __LINUX_BLK_CRYPTO_H
+
+#include <linux/types.h>
+
+#ifdef CONFIG_BLK_INLINE_ENCRYPTION
+
+struct bio;
+
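+/*
+ * These hooks are called from the block layer itself: blk_crypto_init() from
+ * blk_dev_init(), blk_crypto_submit_bio() from generic_make_request() and
+ * direct_make_request(), and blk_crypto_endio() from bio_endio().
+ */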
+int blk_crypto_init(void);
+
+int blk_crypto_submit_bio(struct bio **bio_ptr);
+
+bool blk_crypto_endio(struct bio *bio);
+
+#else /* CONFIG_BLK_INLINE_ENCRYPTION */
+
+static inline int blk_crypto_init(void)
+{
+	return 0;
+}
+
+static inline int blk_crypto_submit_bio(struct bio **bio)
+{
+	return 0;
+}
+
+static inline bool blk_crypto_endio(struct bio *bio)
+{
+	return true;
+}
+
+#endif /* CONFIG_BLK_INLINE_ENCRYPTION */
+
+#endif /* __LINUX_BLK_CRYPTO_H */