[RFC] target/file: add support of direct and async I/O

Message ID 20180309004259.16052-1-avagin@openvz.org
State New, archived

Commit Message

Andrey Vagin March 9, 2018, 12:42 a.m. UTC
Direct I/O bypasses the write-back cache, which is what is expected
when a non-buffered mode is used.

Async I/O allows a target to handle several commands concurrently, so it
shows better performance:

Mode: O_DSYNC Async: 1
$ ./fio --bs=4K --direct=1 --rw=randwrite --ioengine=libaio --iodepth=64 --name=/dev/sda --runtime=20 --numjobs=2
  WRITE: bw=45.9MiB/s (48.1MB/s), 21.9MiB/s-23.0MiB/s (22.0MB/s-25.2MB/s), io=919MiB (963MB), run=20002-20020msec

Mode: O_DSYNC Async: 0
$ ./fio --bs=4K --direct=1 --rw=randwrite --ioengine=libaio --iodepth=64 --name=/dev/sdb --runtime=20 --numjobs=2
  WRITE: bw=1607KiB/s (1645kB/s), 802KiB/s-805KiB/s (821kB/s-824kB/s), io=31.8MiB (33.4MB), run=20280-20295msec

Known issue:

DIF (PI) emulation doesn't work when a target uses async I/O, because
DIF metadata is saved in a separate file, and synchronizing writes to
two files, so that a following read operation always returns consistent
metadata for a specified block, is a separate non-trivial task.

Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
---
 drivers/target/target_core_file.c | 124 ++++++++++++++++++++++++++++++++++++--
 drivers/target/target_core_file.h |   1 +
 2 files changed, 120 insertions(+), 5 deletions(-)

Comments

Bryant G. Ly March 15, 2018, 2:26 p.m. UTC | #1
On 3/8/18 6:42 PM, Andrei Vagin wrote:

> Direct I/O allows to not affect the write-back cache, this is
> expected when a non-buffered mode is used.
>
> Async I/O allows to handle a few commands concurrently, so a target shows a
> better performance:
>
> Mode: O_DSYNC Async: 1
> $ ./fio --bs=4K --direct=1 --rw=randwrite --ioengine=libaio --iodepth=64 --name=/dev/sda --runtime=20 --numjobs=2
>   WRITE: bw=45.9MiB/s (48.1MB/s), 21.9MiB/s-23.0MiB/s (22.0MB/s-25.2MB/s), io=919MiB (963MB), run=20002-20020msec
>
> Mode: O_DSYNC Async: 0
> $ ./fio --bs=4K --direct=1 --rw=randwrite --ioengine=libaio --iodepth=64 --name=/dev/sdb --runtime=20 --numjobs=2
>   WRITE: bw=1607KiB/s (1645kB/s), 802KiB/s-805KiB/s (821kB/s-824kB/s), io=31.8MiB (33.4MB), run=20280-20295msec
>
> Known issue:
>
> DIF (PI) emulation doesn't work when a target uses async I/O, because
> DIF metadata is saved in a separate file, and it is another non-trivial
> task how to synchronize writing in two files, so that a following read
> operation always returns consistent metadata for a specified block.
>
> Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
> Signed-off-by: Andrei Vagin <avagin@openvz.org>
> ---
>  drivers/target/target_core_file.c | 124 ++++++++++++++++++++++++++++++++++++--
>  drivers/target/target_core_file.h |   1 +
>  2 files changed, 120 insertions(+), 5 deletions(-)
>
>
Tested-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>

Patch looks good to me - Thanks for the performance enhancement! 

Btw I have been running I/O tests with HTX against this patch for 24 hrs and have no problems.

-Bryant

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrey Vagin March 15, 2018, 5:04 p.m. UTC | #2
On Thu, Mar 15, 2018 at 09:26:57AM -0500, Bryant G. Ly wrote:
> On 3/8/18 6:42 PM, Andrei Vagin wrote:
> 
> > Direct I/O allows to not affect the write-back cache, this is
> > expected when a non-buffered mode is used.
> >
> > Async I/O allows to handle a few commands concurrently, so a target shows a
> > better performance:
> >
> > Mode: O_DSYNC Async: 1
> > $ ./fio --bs=4K --direct=1 --rw=randwrite --ioengine=libaio --iodepth=64 --name=/dev/sda --runtime=20 --numjobs=2
> >   WRITE: bw=45.9MiB/s (48.1MB/s), 21.9MiB/s-23.0MiB/s (22.0MB/s-25.2MB/s), io=919MiB (963MB), run=20002-20020msec
> >
> > Mode: O_DSYNC Async: 0
> > $ ./fio --bs=4K --direct=1 --rw=randwrite --ioengine=libaio --iodepth=64 --name=/dev/sdb --runtime=20 --numjobs=2
> >   WRITE: bw=1607KiB/s (1645kB/s), 802KiB/s-805KiB/s (821kB/s-824kB/s), io=31.8MiB (33.4MB), run=20280-20295msec
> >
> > Known issue:
> >
> > DIF (PI) emulation doesn't work when a target uses async I/O, because
> > DIF metadata is saved in a separate file, and it is another non-trivial
> > task how to synchronize writing in two files, so that a following read
> > operation always returns consistent metadata for a specified block.
> >
> > Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
> > Signed-off-by: Andrei Vagin <avagin@openvz.org>
> > ---
> >  drivers/target/target_core_file.c | 124 ++++++++++++++++++++++++++++++++++++--
> >  drivers/target/target_core_file.h |   1 +
> >  2 files changed, 120 insertions(+), 5 deletions(-)
> >
> >
> Tested-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
> 
> Patch looks good to me - Thanks for the performance enhancement! 
> 
> Btw I have been running I/O tests with HTX against this patch for 24 hrs and have no problems.

Bryant, thank you for the feedback.

> 
> -Bryant
> 
Christoph Hellwig March 16, 2018, 7:50 a.m. UTC | #3
> DIF (PI) emulation doesn't work when a target uses async I/O, because
> DIF metadata is saved in a separate file, and it is another non-trivial
> task how to synchronize writing in two files, so that a following read
> operation always returns consistent metadata for a specified block.

There literally is no way to do that, even without aio.  The file
DIF implementation should probably be regarded as an early bringup /
prototype tool, not something really usable.

> +static void cmd_rw_aio_do_completion(struct target_core_file_cmd *cmd)
> +{
> +	if (!atomic_dec_and_test(&cmd->ref))
> +		return;

There is no need for reference counting.  If the read_iter/write_iter
method returns -EIOCBQUEUED, the completion callback needs to complete
the I/O and free the structure; otherwise the method caller does.
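
A minimal userspace sketch of the ownership rule described above, not
kernel code. The names model_cmd, model_submit and model_complete, the
MODEL_* constants and the rw_result parameter (which stands in for
whatever ->read_iter/->write_iter returned) are all hypothetical; 529
merely mirrors the kernel's EIOCBQUEUED value. Exactly one party
completes and frees the command, so this model needs no counter:

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel's EIOCBQUEUED; only used by this model. */
enum { MODEL_EIOCBQUEUED = 529 };

/* Hypothetical status values reported through status_out. */
enum { MODEL_PENDING = -1, MODEL_GOOD = 0, MODEL_CHECK_CONDITION = 1 };

struct model_cmd {
	long len;		/* bytes requested */
	int *status_out;	/* where the final status is reported */
};

/*
 * Complete the command and free it. In this model it is called exactly
 * once: by the submitter for inline results, by the callback otherwise.
 */
void model_complete(struct model_cmd *cmd, long ret)
{
	*cmd->status_out = (ret == cmd->len) ? MODEL_GOOD
					     : MODEL_CHECK_CONDITION;
	free(cmd);
}

/*
 * Submit a command. rw_result simulates the return value of the
 * read/write method. Returns the still-pending command when the I/O was
 * queued (the callback now owns it), or NULL when it finished inline.
 */
struct model_cmd *model_submit(long len, long rw_result, int *status_out)
{
	struct model_cmd *cmd = malloc(sizeof(*cmd));

	if (!cmd) {
		*status_out = MODEL_CHECK_CONDITION;
		return NULL;
	}
	cmd->len = len;
	cmd->status_out = status_out;
	*status_out = MODEL_PENDING;

	if (rw_result == -MODEL_EIOCBQUEUED)
		return cmd;	/* ownership passed to the callback */

	model_complete(cmd, rw_result);	/* caller completes and frees */
	return NULL;
}
```

model_submit() hands back the pending command only so that a test can
later play the role of the completion callback by calling
model_complete() on it.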

> +	if (!(fd_dev->fbd_flags & FDBD_HAS_BUFFERED_IO_WCE))
> +		aio_cmd->iocb.ki_flags |= IOCB_DIRECT;

aio without IOCB_DIRECT doesn't make any sense. But the WCE flag
really has nothing to do with buffered vs direct I/O anyway.
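
One way to read this comment, sketched in plain C: the async path sets
the direct flag unconditionally and leaves the WCE setting out of the
decision, while the FUA handling from the patch is kept. The flag values
and the helper name model_aio_flags are placeholders invented for this
sketch, not the kernel's definitions:

```c
/* Placeholder flag bits for this sketch; not the kernel's values. */
enum {
	MODEL_IOCB_DIRECT = 1 << 0,
	MODEL_IOCB_DSYNC  = 1 << 1,
};

/*
 * Compute the flags for an async request.
 * is_write: non-zero for a write command.
 * fua:      non-zero when the command asks for Force Unit Access.
 * The buffered/WCE device setting is deliberately not an input.
 */
int model_aio_flags(int is_write, int fua)
{
	int flags = MODEL_IOCB_DIRECT;	/* async I/O is always direct */

	if (is_write && fua)
		flags |= MODEL_IOCB_DSYNC;	/* FUA write must reach media */

	return flags;
}
```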

> +	if (is_write)
> +		ret = call_write_iter(file, &aio_cmd->iocb, &iter);
> +	else
> +		ret = call_read_iter(file, &aio_cmd->iocb, &iter);

Please call the methods directly instead of through the wrappers.

> +
>  static int fd_do_rw(struct se_cmd *cmd, struct file *fd,
>  		    u32 block_size, struct scatterlist *sgl,
>  		    u32 sgl_nents, u32 data_length, int is_write)
> @@ -536,6 +626,7 @@ fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
>  	struct file *pfile = fd_dev->fd_prot_file;
>  	sense_reason_t rc;
>  	int ret = 0;
> +	int aio = fd_dev->fbd_flags & FDBD_HAS_ASYNC_IO;
>  	/*
>  	 * We are currently limited by the number of iovecs (2048) per
>  	 * single vfs_[writev,readv] call.
> @@ -550,7 +641,11 @@ fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
>  	 * Call vectorized fileio functions to map struct scatterlist
>  	 * physical memory addresses to struct iovec virtual memory.
>  	 */
> -	if (data_direction == DMA_FROM_DEVICE) {
> +	if (aio) {

fd_execute_rw shares basically no code with the aio case.  I'd rather
have a very high level wrapper here:

static sense_reason_t
fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
		enum dma_data_direction data_direction)
{
	if (FD_DEV(cmd->se_dev)->fbd_flags & FDBD_HAS_ASYNC_IO)
		return fd_execute_rw_aio(cmd, sgl, sgl_nents, data_direction);
	return fd_execute_rw_buffered(cmd, sgl, sgl_nents, data_direction);
}

and keep the code separate.

Andrey Vagin March 17, 2018, 12:13 a.m. UTC | #4
Hi Christoph,

Thank you for the review. All comments look reasonable. I will fix them
and send a final version soon.

Please answer one inline question.

On Fri, Mar 16, 2018 at 12:50:27AM -0700, Christoph Hellwig wrote:
> > DIF (PI) emulation doesn't work when a target uses async I/O, because
> > DIF metadata is saved in a separate file, and it is another non-trivial
> > task how to synchronize writing in two files, so that a following read
> > operation always returns consistent metadata for a specified block.
> 
> There literally is no way to do that, even without aio.  The file
> DIF implementation should probably be regarded as an early bringup /
> prototype tool, not something really usable.
> 
> > +static void cmd_rw_aio_do_completion(struct target_core_file_cmd *cmd)
> > +{
> > +	if (!atomic_dec_and_test(&cmd->ref))
> > +		return;
> 
> There is no need for reference counting.  If the read_iter/write_iter
> method returns -EIOCBQUEUED the completion callback needs to complete
> the I/O and free the structure, else the method caller.
> 
> > +	if (!(fd_dev->fbd_flags & FDBD_HAS_BUFFERED_IO_WCE))
> > +		aio_cmd->iocb.ki_flags |= IOCB_DIRECT;
> 
> aio without IOCB_DIRECT doesn't make any sense. But the WCE flag
> really has nothing to do with buffers vs direct I/O anyway.
> 
> > +	if (is_write)
> > +		ret = call_write_iter(file, &aio_cmd->iocb, &iter);
> > +	else
> > +		ret = call_read_iter(file, &aio_cmd->iocb, &iter);
> 
> Please call the methods directly instead of through the wrappers.

Do you mean to call file->f_op->write_iter(kio, iter) instead of
call_write_iter()? What is wrong with these wrappers?

Thanks,
Andrei

> 
> > +
> >  static int fd_do_rw(struct se_cmd *cmd, struct file *fd,
> >  		    u32 block_size, struct scatterlist *sgl,
> >  		    u32 sgl_nents, u32 data_length, int is_write)
> > @@ -536,6 +626,7 @@ fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
> >  	struct file *pfile = fd_dev->fd_prot_file;
> >  	sense_reason_t rc;
> >  	int ret = 0;
> > +	int aio = fd_dev->fbd_flags & FDBD_HAS_ASYNC_IO;
> >  	/*
> >  	 * We are currently limited by the number of iovecs (2048) per
> >  	 * single vfs_[writev,readv] call.
> > @@ -550,7 +641,11 @@ fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
> >  	 * Call vectorized fileio functions to map struct scatterlist
> >  	 * physical memory addresses to struct iovec virtual memory.
> >  	 */
> > -	if (data_direction == DMA_FROM_DEVICE) {
> > +	if (aio) {
> 
> fd_execute_rw shares basically no code with the aio case.  I'd rather
> have a very high level wrapper here:
> 
> static sense_reason_t
> fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
> 		enum dma_data_direction data_direction)
> {
> 	if (FD_DEV(cmd->se_dev)->fbd_flags & FDBD_HAS_ASYNC_IO)
> 		return fd_execute_rw_aio(cmd, sgl, sgl_nents, data_direction);
> 	return fd_execute_rw_buffered(cmd, sgl, sgl_nents, data_direction);
> }
> 
> and keep the code separate.
> 
Christoph Hellwig March 17, 2018, 8:25 a.m. UTC | #5
On Fri, Mar 16, 2018 at 05:13:25PM -0700, Andrei Vagin wrote:
> > Please call the methods directly instead of through the wrappers.
> 
> Do you mean to call file->f_op->write_iter(kio, iter) instead of
> call_write_iter()? What is wrong with these wrappers?

Yes. They are completely pointless and just obfuscate the code.
I plan to remove them eventually.
Andrey Vagin March 20, 2018, 7:54 a.m. UTC | #6
On Fri, Mar 16, 2018 at 12:50:27AM -0700, Christoph Hellwig wrote:
> > DIF (PI) emulation doesn't work when a target uses async I/O, because
> > DIF metadata is saved in a separate file, and it is another non-trivial
> > task how to synchronize writing in two files, so that a following read
> > operation always returns consistent metadata for a specified block.
> 
> There literally is no way to do that, even without aio.  The file
> DIF implementation should probably be regarded as an early bringup /
> prototype tool, not something really usable.
> 
> > +static void cmd_rw_aio_do_completion(struct target_core_file_cmd *cmd)
> > +{
> > +	if (!atomic_dec_and_test(&cmd->ref))
> > +		return;
> 
> There is no need for reference counting.  If the read_iter/write_iter
> method returns -EIOCBQUEUED the completion callback needs to complete
> the I/O and free the structure, else the method caller.

I was about to send a final version, but I decided to investigate how the
reference counter appeared in drivers/block/loop.c:

commit 92d773324b7edbd36bf0c28c1e0157763aeccc92
Author: Shaohua Li <shli@fb.com>
Date:   Fri Sep 1 11:15:17 2017 -0700

    block/loop: fix use after free
    
    lo_rw_aio->call_read_iter->
    1       aops->direct_IO
    2       iov_iter_revert
    lo_rw_aio_complete could happen between 1 and 2, the bio and bvec could
    be freed before 2, which accesses bvec.
    
    Signed-off-by: Shaohua Li <shli@fb.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

This commit looks reasonable, doesn't it? In our case, bvec-s are
freed from the callback too.
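
The two-reference scheme from the loop commit, which this patch copies,
can be modeled in userspace with C11 atomics. This is only a sketch:
ref_cmd, ref_cmd_alloc and ref_cmd_put are hypothetical names, and
atomic_fetch_sub() returning 1 plays the role of the kernel's
atomic_dec_and_test(). The submitter and the completion callback each
hold one reference, so the memory stays valid until both are done,
whichever of them finishes first:

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct ref_cmd {
	atomic_int ref;		/* one ref for the submitter, one for the callback */
	long len;		/* bytes requested */
	int *completed;		/* incremented once, by the final put */
};

/* Allocate a command that starts with two references. */
struct ref_cmd *ref_cmd_alloc(long len, int *completed)
{
	struct ref_cmd *cmd = calloc(1, sizeof(*cmd));

	if (!cmd)
		return NULL;
	atomic_init(&cmd->ref, 2);
	cmd->len = len;
	cmd->completed = completed;
	return cmd;
}

/*
 * Drop one reference. Only the final put reports completion and frees
 * the command. Returns 1 if this call was the final put, 0 otherwise.
 */
int ref_cmd_put(struct ref_cmd *cmd)
{
	/* atomic_fetch_sub returns the old value; 1 means we hit zero. */
	if (atomic_fetch_sub(&cmd->ref, 1) != 1)
		return 0;

	(*cmd->completed)++;
	free(cmd);
	return 1;
}
```

If the callback's put runs first, the command is still alive while the
submitter finishes touching it, which is the window the loop fix is
about; the submitter's own put then performs the completion.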

> 
> > +	if (!(fd_dev->fbd_flags & FDBD_HAS_BUFFERED_IO_WCE))
> > +		aio_cmd->iocb.ki_flags |= IOCB_DIRECT;
> 
> aio without IOCB_DIRECT doesn't make any sense. But the WCE flag
> really has nothing to do with buffers vs direct I/O anyway.
> 
> > +	if (is_write)
> > +		ret = call_write_iter(file, &aio_cmd->iocb, &iter);
> > +	else
> > +		ret = call_read_iter(file, &aio_cmd->iocb, &iter);
> 
> Please call the methods directly instead of through the wrappers.
> 
> > +
> >  static int fd_do_rw(struct se_cmd *cmd, struct file *fd,
> >  		    u32 block_size, struct scatterlist *sgl,
> >  		    u32 sgl_nents, u32 data_length, int is_write)
> > @@ -536,6 +626,7 @@ fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
> >  	struct file *pfile = fd_dev->fd_prot_file;
> >  	sense_reason_t rc;
> >  	int ret = 0;
> > +	int aio = fd_dev->fbd_flags & FDBD_HAS_ASYNC_IO;
> >  	/*
> >  	 * We are currently limited by the number of iovecs (2048) per
> >  	 * single vfs_[writev,readv] call.
> > @@ -550,7 +641,11 @@ fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
> >  	 * Call vectorized fileio functions to map struct scatterlist
> >  	 * physical memory addresses to struct iovec virtual memory.
> >  	 */
> > -	if (data_direction == DMA_FROM_DEVICE) {
> > +	if (aio) {
> 
> fd_execute_rw shares basically no code with the aio case.  I'd rather
> have a very high level wrapper here:
> 
> static sense_reason_t
> fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
> 		enum dma_data_direction data_direction)
> {
> 	if (FD_DEV(cmd->se_dev)->fbd_flags & FDBD_HAS_ASYNC_IO)
> 		return fd_execute_rw_aio(cmd, sgl, sgl_nents, data_direction);
> 	return fd_execute_rw_buffered(cmd, sgl, sgl_nents, data_direction);
> }
> 
> and keep the code separate.
> 

Patch

diff --git a/drivers/target/target_core_file.c b/drivers/target/target_core_file.c
index 9b2c0c773022..e8c07a0f7084 100644
--- a/drivers/target/target_core_file.c
+++ b/drivers/target/target_core_file.c
@@ -250,6 +250,96 @@  static void fd_destroy_device(struct se_device *dev)
 	}
 }
 
+struct target_core_file_cmd {
+	atomic_t	ref;
+	unsigned long	len;
+	long		ret;
+	struct se_cmd	*cmd;
+	struct kiocb	iocb;
+	struct bio_vec	bvec[0];
+};
+
+static void cmd_rw_aio_do_completion(struct target_core_file_cmd *cmd)
+{
+	if (!atomic_dec_and_test(&cmd->ref))
+		return;
+
+	if (cmd->ret != cmd->len)
+		target_complete_cmd(cmd->cmd, SAM_STAT_CHECK_CONDITION);
+	else
+		target_complete_cmd(cmd->cmd, SAM_STAT_GOOD);
+
+	kfree(cmd);
+}
+
+static void cmd_rw_aio_complete(struct kiocb *iocb, long ret, long ret2)
+{
+	struct target_core_file_cmd *cmd;
+
+	cmd = container_of(iocb, struct target_core_file_cmd, iocb);
+
+	cmd->ret = ret;
+	cmd_rw_aio_do_completion(cmd);
+}
+
+static int fd_do_aio_rw(struct se_cmd *cmd, struct fd_dev *fd_dev,
+		    u32 block_size, struct scatterlist *sgl,
+		    u32 sgl_nents, u32 data_length, int is_write)
+{
+	struct file *file = fd_dev->fd_file;
+	struct target_core_file_cmd *aio_cmd;
+	struct scatterlist *sg;
+	struct iov_iter iter = {};
+	struct bio_vec *bvec;
+	ssize_t len = 0;
+	loff_t pos = (cmd->t_task_lba * block_size);
+	int ret = 0, i;
+
+	aio_cmd = kmalloc(sizeof(struct target_core_file_cmd) +
+			  sgl_nents * sizeof(struct bio_vec),
+			  GFP_KERNEL | __GFP_ZERO);
+	if (!aio_cmd)
+		return -ENOMEM;
+
+	bvec = aio_cmd->bvec;
+
+	for_each_sg(sgl, sg, sgl_nents, i) {
+		bvec[i].bv_page = sg_page(sg);
+		bvec[i].bv_len = sg->length;
+		bvec[i].bv_offset = sg->offset;
+
+		len += sg->length;
+	}
+
+	iov_iter_bvec(&iter, ITER_BVEC | is_write, bvec, sgl_nents, len);
+
+	atomic_set(&aio_cmd->ref, 2);
+
+	aio_cmd->cmd = cmd;
+	aio_cmd->len = len;
+	aio_cmd->iocb.ki_pos = pos;
+	aio_cmd->iocb.ki_filp = file;
+	aio_cmd->iocb.ki_complete = cmd_rw_aio_complete;
+	aio_cmd->iocb.ki_flags = 0;
+
+	if (!(fd_dev->fbd_flags & FDBD_HAS_BUFFERED_IO_WCE))
+		aio_cmd->iocb.ki_flags |= IOCB_DIRECT;
+	if (is_write && (cmd->se_cmd_flags & SCF_FUA))
+		aio_cmd->iocb.ki_flags |= IOCB_DSYNC;
+
+	if (is_write)
+		ret = call_write_iter(file, &aio_cmd->iocb, &iter);
+	else
+		ret = call_read_iter(file, &aio_cmd->iocb, &iter);
+
+	cmd_rw_aio_do_completion(aio_cmd);
+
+	if (ret != -EIOCBQUEUED)
+		aio_cmd->iocb.ki_complete(&aio_cmd->iocb, ret, 0);
+
+	return 0;
+}
+
 static int fd_do_rw(struct se_cmd *cmd, struct file *fd,
 		    u32 block_size, struct scatterlist *sgl,
 		    u32 sgl_nents, u32 data_length, int is_write)
@@ -536,6 +626,7 @@  fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
 	struct file *pfile = fd_dev->fd_prot_file;
 	sense_reason_t rc;
 	int ret = 0;
+	int aio = fd_dev->fbd_flags & FDBD_HAS_ASYNC_IO;
 	/*
 	 * We are currently limited by the number of iovecs (2048) per
 	 * single vfs_[writev,readv] call.
@@ -550,7 +641,11 @@  fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
 	 * Call vectorized fileio functions to map struct scatterlist
 	 * physical memory addresses to struct iovec virtual memory.
 	 */
-	if (data_direction == DMA_FROM_DEVICE) {
+	if (aio) {
+		ret = fd_do_aio_rw(cmd, fd_dev, dev->dev_attrib.block_size,
+			       sgl, sgl_nents, cmd->data_length,
+				!(data_direction == DMA_FROM_DEVICE));
+	} else if (data_direction == DMA_FROM_DEVICE) {
 		if (cmd->prot_type && dev->dev_attrib.pi_prot_type) {
 			ret = fd_do_rw(cmd, pfile, dev->prot_length,
 				       cmd->t_prot_sg, cmd->t_prot_nents,
@@ -616,18 +711,21 @@  fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
 	if (ret < 0)
 		return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
 
-	target_complete_cmd(cmd, SAM_STAT_GOOD);
+	if (!aio)
+		target_complete_cmd(cmd, SAM_STAT_GOOD);
 	return 0;
 }
 
 enum {
-	Opt_fd_dev_name, Opt_fd_dev_size, Opt_fd_buffered_io, Opt_err
+	Opt_fd_dev_name, Opt_fd_dev_size, Opt_fd_buffered_io,
+	Opt_fd_async_io, Opt_err
 };
 
 static match_table_t tokens = {
 	{Opt_fd_dev_name, "fd_dev_name=%s"},
 	{Opt_fd_dev_size, "fd_dev_size=%s"},
 	{Opt_fd_buffered_io, "fd_buffered_io=%d"},
+	{Opt_fd_async_io, "fd_async_io=%d"},
 	{Opt_err, NULL}
 };
 
@@ -693,6 +791,21 @@  static ssize_t fd_set_configfs_dev_params(struct se_device *dev,
 
 			fd_dev->fbd_flags |= FDBD_HAS_BUFFERED_IO_WCE;
 			break;
+		case Opt_fd_async_io:
+			ret = match_int(args, &arg);
+			if (ret)
+				goto out;
+			if (arg != 1) {
+				pr_err("bogus fd_async_io=%d value\n", arg);
+				ret = -EINVAL;
+				goto out;
+			}
+
+			pr_debug("FILEIO: Using async I/O"
+				" operations for struct fd_dev\n");
+
+			fd_dev->fbd_flags |= FDBD_HAS_ASYNC_IO;
+			break;
 		default:
 			break;
 		}
@@ -709,10 +822,11 @@  static ssize_t fd_show_configfs_dev_params(struct se_device *dev, char *b)
 	ssize_t bl = 0;
 
 	bl = sprintf(b + bl, "TCM FILEIO ID: %u", fd_dev->fd_dev_id);
-	bl += sprintf(b + bl, "        File: %s  Size: %llu  Mode: %s\n",
+	bl += sprintf(b + bl, "        File: %s  Size: %llu  Mode: %s Async: %d\n",
 		fd_dev->fd_dev_name, fd_dev->fd_dev_size,
 		(fd_dev->fbd_flags & FDBD_HAS_BUFFERED_IO_WCE) ?
-		"Buffered-WCE" : "O_DSYNC");
+		"Buffered-WCE" : "O_DSYNC",
+		!!(fd_dev->fbd_flags & FDBD_HAS_ASYNC_IO));
 	return bl;
 }
 
diff --git a/drivers/target/target_core_file.h b/drivers/target/target_core_file.h
index 53be5ffd3261..929b1ecd544e 100644
--- a/drivers/target/target_core_file.h
+++ b/drivers/target/target_core_file.h
@@ -22,6 +22,7 @@ 
 #define FBDF_HAS_PATH		0x01
 #define FBDF_HAS_SIZE		0x02
 #define FDBD_HAS_BUFFERED_IO_WCE 0x04
+#define FDBD_HAS_ASYNC_IO	 0x08
 #define FDBD_FORMAT_UNIT_SIZE	2048
 
 struct fd_dev {