Message ID | 20180309004259.16052-1-avagin@openvz.org (mailing list archive) |
---|---|
State | Not Applicable |
On 3/8/18 6:42 PM, Andrei Vagin wrote:

> Direct I/O allows to not affect the write-back cache, this is
> expected when a non-buffered mode is used.
>
> Async I/O allows to handle a few commands concurrently, so a target shows a
> better performance:
>
> Mode: O_DSYNC Async: 1
> $ ./fio --bs=4K --direct=1 --rw=randwrite --ioengine=libaio --iodepth=64 --name=/dev/sda --runtime=20 --numjobs=2
> WRITE: bw=45.9MiB/s (48.1MB/s), 21.9MiB/s-23.0MiB/s (22.0MB/s-25.2MB/s), io=919MiB (963MB), run=20002-20020msec
>
> Mode: O_DSYNC Async: 0
> $ ./fio --bs=4K --direct=1 --rw=randwrite --ioengine=libaio --iodepth=64 --name=/dev/sdb --runtime=20 --numjobs=2
> WRITE: bw=1607KiB/s (1645kB/s), 802KiB/s-805KiB/s (821kB/s-824kB/s), io=31.8MiB (33.4MB), run=20280-20295msec
>
> Known issue:
>
> DIF (PI) emulation doesn't work when a target uses async I/O, because
> DIF metadata is saved in a separate file, and it is another non-trivial
> task how to synchronize writing in two files, so that a following read
> operation always returns a consistent metadata for a specified block.
>
> Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
> Signed-off-by: Andrei Vagin <avagin@openvz.org>
> ---
>  drivers/target/target_core_file.c | 124 ++++++++++++++++++++++++++++++++++++--
>  drivers/target/target_core_file.h | 1 +
>  2 files changed, 120 insertions(+), 5 deletions(-)
>
>

Tested-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>

Patch looks good to me - Thanks for the performance enhancement!

Btw I have been running I/O tests with HTX against this patch for 24 hrs and have no problems.

-Bryant
On Thu, Mar 15, 2018 at 09:26:57AM -0500, Bryant G. Ly wrote:
> On 3/8/18 6:42 PM, Andrei Vagin wrote:
>
> > Direct I/O allows to not affect the write-back cache, this is
> > expected when a non-buffered mode is used.
> >
> > Async I/O allows to handle a few commands concurrently, so a target shows a
> > better performance:
> >
> > Mode: O_DSYNC Async: 1
> > $ ./fio --bs=4K --direct=1 --rw=randwrite --ioengine=libaio --iodepth=64 --name=/dev/sda --runtime=20 --numjobs=2
> > WRITE: bw=45.9MiB/s (48.1MB/s), 21.9MiB/s-23.0MiB/s (22.0MB/s-25.2MB/s), io=919MiB (963MB), run=20002-20020msec
> >
> > Mode: O_DSYNC Async: 0
> > $ ./fio --bs=4K --direct=1 --rw=randwrite --ioengine=libaio --iodepth=64 --name=/dev/sdb --runtime=20 --numjobs=2
> > WRITE: bw=1607KiB/s (1645kB/s), 802KiB/s-805KiB/s (821kB/s-824kB/s), io=31.8MiB (33.4MB), run=20280-20295msec
> >
> > Known issue:
> >
> > DIF (PI) emulation doesn't work when a target uses async I/O, because
> > DIF metadata is saved in a separate file, and it is another non-trivial
> > task how to synchronize writing in two files, so that a following read
> > operation always returns a consistent metadata for a specified block.
> >
> > Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
> > Signed-off-by: Andrei Vagin <avagin@openvz.org>
> > ---
> >  drivers/target/target_core_file.c | 124 ++++++++++++++++++++++++++++++++++++--
> >  drivers/target/target_core_file.h | 1 +
> >  2 files changed, 120 insertions(+), 5 deletions(-)
> >
> >
> Tested-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
>
> Patch looks good to me - Thanks for the performance enhancement!
>
> Btw I have been running I/O tests with HTX against this patch for 24 hrs and have no problems.

Bryant, thank you for the feedback.

>
> -Bryant
>
> DIF (PI) emulation doesn't work when a target uses async I/O, because
> DIF metadata is saved in a separate file, and it is another non-trivial
> task how to synchronize writing in two files, so that a following read
> operation always returns a consistent metadata for a specified block.

There literally is no way to do that, even without aio. The file
DIF implementation should probably be regarded as an early bringup /
prototype tool, not something really usable.

> +static void cmd_rw_aio_do_completion(struct target_core_file_cmd *cmd)
> +{
> +	if (!atomic_dec_and_test(&cmd->ref))
> +		return;

There is no need for reference counting. If the read_iter/write_iter
method returns -EIOCBQUEUED, the completion callback needs to complete
the I/O and free the structure; otherwise the caller of the method
must do so.

> +	if (!(fd_dev->fbd_flags & FDBD_HAS_BUFFERED_IO_WCE))
> +		aio_cmd->iocb.ki_flags |= IOCB_DIRECT;

aio without IOCB_DIRECT doesn't make any sense. But the WCE flag
really has nothing to do with buffered vs direct I/O anyway.

> +	if (is_write)
> +		ret = call_write_iter(file, &aio_cmd->iocb, &iter);
> +	else
> +		ret = call_read_iter(file, &aio_cmd->iocb, &iter);

Please call the methods directly instead of through the wrappers.

> +
>  static int fd_do_rw(struct se_cmd *cmd, struct file *fd,
>  		u32 block_size, struct scatterlist *sgl,
>  		u32 sgl_nents, u32 data_length, int is_write)
> @@ -536,6 +626,7 @@ fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
>  	struct file *pfile = fd_dev->fd_prot_file;
>  	sense_reason_t rc;
>  	int ret = 0;
> +	int aio = fd_dev->fbd_flags & FDBD_HAS_ASYNC_IO;
>  	/*
>  	 * We are currently limited by the number of iovecs (2048) per
>  	 * single vfs_[writev,readv] call.
> @@ -550,7 +641,11 @@ fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
>  	 * Call vectorized fileio functions to map struct scatterlist
>  	 * physical memory addresses to struct iovec virtual memory.
>  	 */
> -	if (data_direction == DMA_FROM_DEVICE) {
> +	if (aio) {

fd_execute_rw shares basically no code with the aio case. I'd rather
have a very high level wrapper here:

static sense_reason_t
fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
		enum dma_data_direction data_direction)
{
	if (FD_DEV(cmd->se_dev)->fbd_flags & FDBD_HAS_ASYNC_IO)
		return fd_execute_rw_aio(cmd, sgl, sgl_nents, data_direction);
	return fd_execute_rw_buffered(cmd, sgl, sgl_nents, data_direction);
}

and keep the code separate.
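Editorial illustration: the ownership rule stated in this review (whoever learns that the I/O is finished completes the command and frees it, with no shared counter) can be modelled in user space. This is only a sketch under stated assumptions: every `model_*` and `MODEL_*` name is hypothetical, the "backend" is a fake that either finishes inline or parks the command in a single slot, and nothing here is the kernel API. It also assumes the submitter never touches the command again after a queued return.

```c
#include <assert.h>
#include <stdlib.h>

/* Model-only stand-ins; the kernel defines EIOCBQUEUED as 529. */
#define MODEL_EIOCBQUEUED          (-529L)
#define MODEL_STAT_GOOD            0
#define MODEL_STAT_CHECK_CONDITION 2

struct model_cmd {
	long len;         /* bytes requested */
	int *status_out;  /* receives the final status */
	int *completions; /* counts how often completion ran */
};

/* Holds the one command the fake backend has "queued". */
static struct model_cmd *model_pending;

/* Complete the command and free it; must run exactly once per command. */
static void model_complete(struct model_cmd *cmd, long ret)
{
	*cmd->status_out = (ret == cmd->len) ? MODEL_STAT_GOOD
					     : MODEL_STAT_CHECK_CONDITION;
	(*cmd->completions)++;
	free(cmd);
}

/* Fake backend that finishes inline and returns the byte count. */
static long model_rw_sync(struct model_cmd *cmd)
{
	return cmd->len;
}

/* Fake backend that queues the command; completion arrives later. */
static long model_rw_queued(struct model_cmd *cmd)
{
	assert(model_pending == NULL);
	model_pending = cmd;
	return MODEL_EIOCBQUEUED;
}

/* Stand-in for the completion callback firing from I/O context. */
static void model_fire_pending(long ret)
{
	struct model_cmd *cmd = model_pending;

	model_pending = NULL;
	model_complete(cmd, ret);
}

/* Submit one command; returns 0 on success, -1 if allocation failed. */
static int model_submit(long (*rw)(struct model_cmd *), long len,
			int *status_out, int *completions)
{
	struct model_cmd *cmd = calloc(1, sizeof(*cmd));
	long ret;

	if (!cmd)
		return -1;
	cmd->len = len;
	cmd->status_out = status_out;
	cmd->completions = completions;

	ret = rw(cmd);
	/* Not queued: the submitter owns completion and the free. */
	if (ret != MODEL_EIOCBQUEUED)
		model_complete(cmd, ret);
	/* Queued: the callback owns cmd now; it must not be touched here. */
	return 0;
}
```

In this model a short transfer reported by the callback ends in the check-condition status, mirroring the `ret != len` test in the posted patch.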
Hi Christoph,

Thank you for the review. All comments look reasonable. I will fix and
send a final version soon.

Please answer one inline question.

On Fri, Mar 16, 2018 at 12:50:27AM -0700, Christoph Hellwig wrote:
> > DIF (PI) emulation doesn't work when a target uses async I/O, because
> > DIF metadata is saved in a separate file, and it is another non-trivial
> > task how to synchronize writing in two files, so that a following read
> > operation always returns a consistent metadata for a specified block.
>
> There literally is no way to do that, even without aio. The file
> DIF implementation should probably be regarded as an early bringup /
> prototype tool, not something really usable.
>
> > +static void cmd_rw_aio_do_completion(struct target_core_file_cmd *cmd)
> > +{
> > +	if (!atomic_dec_and_test(&cmd->ref))
> > +		return;
>
> There is no need for reference counting. If the read_iter/write_iter
> method returns -EIOCBQUEUED, the completion callback needs to complete
> the I/O and free the structure; otherwise the caller of the method
> must do so.
>
> > +	if (!(fd_dev->fbd_flags & FDBD_HAS_BUFFERED_IO_WCE))
> > +		aio_cmd->iocb.ki_flags |= IOCB_DIRECT;
>
> aio without IOCB_DIRECT doesn't make any sense. But the WCE flag
> really has nothing to do with buffered vs direct I/O anyway.
>
> > +	if (is_write)
> > +		ret = call_write_iter(file, &aio_cmd->iocb, &iter);
> > +	else
> > +		ret = call_read_iter(file, &aio_cmd->iocb, &iter);
>
> Please call the methods directly instead of through the wrappers.

Do you mean to call file->f_op->write_iter(kio, iter) instead of
call_write_iter()? What is wrong with these wrappers?

Thanks,
Andrei

>
> > +
> >  static int fd_do_rw(struct se_cmd *cmd, struct file *fd,
> >  		u32 block_size, struct scatterlist *sgl,
> >  		u32 sgl_nents, u32 data_length, int is_write)
> > @@ -536,6 +626,7 @@ fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
> >  	struct file *pfile = fd_dev->fd_prot_file;
> >  	sense_reason_t rc;
> >  	int ret = 0;
> > +	int aio = fd_dev->fbd_flags & FDBD_HAS_ASYNC_IO;
> >  	/*
> >  	 * We are currently limited by the number of iovecs (2048) per
> >  	 * single vfs_[writev,readv] call.
> > @@ -550,7 +641,11 @@ fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
> >  	 * Call vectorized fileio functions to map struct scatterlist
> >  	 * physical memory addresses to struct iovec virtual memory.
> >  	 */
> > -	if (data_direction == DMA_FROM_DEVICE) {
> > +	if (aio) {
>
> fd_execute_rw shares basically no code with the aio case. I'd rather
> have a very high level wrapper here:
>
> static sense_reason_t
> fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
> 		enum dma_data_direction data_direction)
> {
> 	if (FD_DEV(cmd->se_dev)->fbd_flags & FDBD_HAS_ASYNC_IO)
> 		return fd_execute_rw_aio(cmd, sgl, sgl_nents, data_direction);
> 	return fd_execute_rw_buffered(cmd, sgl, sgl_nents, data_direction);
> }
>
> and keep the code separate.
>
On Fri, Mar 16, 2018 at 05:13:25PM -0700, Andrei Vagin wrote:
> > Please call the methods directly instead of through the wrappers.
>
> Do you mean to call file->f_op->write_iter(kio, iter) instead of
> call_write_iter()? What is wrong with these wrappers?

Yes. They are completely pointless and just obfuscate the code.
I plan to remove them eventually.
On Fri, Mar 16, 2018 at 12:50:27AM -0700, Christoph Hellwig wrote:
> > DIF (PI) emulation doesn't work when a target uses async I/O, because
> > DIF metadata is saved in a separate file, and it is another non-trivial
> > task how to synchronize writing in two files, so that a following read
> > operation always returns a consistent metadata for a specified block.
>
> There literally is no way to do that, even without aio. The file
> DIF implementation should probably be regarded as an early bringup /
> prototype tool, not something really usable.
>
> > +static void cmd_rw_aio_do_completion(struct target_core_file_cmd *cmd)
> > +{
> > +	if (!atomic_dec_and_test(&cmd->ref))
> > +		return;
>
> There is no need for reference counting. If the read_iter/write_iter
> method returns -EIOCBQUEUED, the completion callback needs to complete
> the I/O and free the structure; otherwise the caller of the method
> must do so.

I was about to send a final version, but I decided to investigate how
the reference counter appeared in drivers/block/loop.c:

commit 92d773324b7edbd36bf0c28c1e0157763aeccc92
Author: Shaohua Li <shli@fb.com>
Date:   Fri Sep 1 11:15:17 2017 -0700

    block/loop: fix use after free

    lo_rw_aio->call_read_iter->
    1       aops->direct_IO
    2       iov_iter_revert
    lo_rw_aio_complete could happen between 1 and 2, the bio and bvec could
    be freed before 2, which accesses bvec.

    Signed-off-by: Shaohua Li <shli@fb.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

This commit looks reasonable, doesn't it? In our case, bvec-s are freed
from the callback too.

>
> > +	if (!(fd_dev->fbd_flags & FDBD_HAS_BUFFERED_IO_WCE))
> > +		aio_cmd->iocb.ki_flags |= IOCB_DIRECT;
>
> aio without IOCB_DIRECT doesn't make any sense. But the WCE flag
> really has nothing to do with buffered vs direct I/O anyway.
>
> > +	if (is_write)
> > +		ret = call_write_iter(file, &aio_cmd->iocb, &iter);
> > +	else
> > +		ret = call_read_iter(file, &aio_cmd->iocb, &iter);
>
> Please call the methods directly instead of through the wrappers.
>
> > +
> >  static int fd_do_rw(struct se_cmd *cmd, struct file *fd,
> >  		u32 block_size, struct scatterlist *sgl,
> >  		u32 sgl_nents, u32 data_length, int is_write)
> > @@ -536,6 +626,7 @@ fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
> >  	struct file *pfile = fd_dev->fd_prot_file;
> >  	sense_reason_t rc;
> >  	int ret = 0;
> > +	int aio = fd_dev->fbd_flags & FDBD_HAS_ASYNC_IO;
> >  	/*
> >  	 * We are currently limited by the number of iovecs (2048) per
> >  	 * single vfs_[writev,readv] call.
> > @@ -550,7 +641,11 @@ fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
> >  	 * Call vectorized fileio functions to map struct scatterlist
> >  	 * physical memory addresses to struct iovec virtual memory.
> >  	 */
> > -	if (data_direction == DMA_FROM_DEVICE) {
> > +	if (aio) {
>
> fd_execute_rw shares basically no code with the aio case. I'd rather
> have a very high level wrapper here:
>
> static sense_reason_t
> fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
> 		enum dma_data_direction data_direction)
> {
> 	if (FD_DEV(cmd->se_dev)->fbd_flags & FDBD_HAS_ASYNC_IO)
> 		return fd_execute_rw_aio(cmd, sgl, sgl_nents, data_direction);
> 	return fd_execute_rw_buffered(cmd, sgl, sgl_nents, data_direction);
> }
>
> and keep the code separate.
>
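Editorial illustration: the two-reference scheme used in the posted patch, and motivated by the loop driver commit quoted in this message, can be modelled in user space with C11 atomics. It is a sketch under assumptions: the `model_*` names are hypothetical, `atomic_fetch_sub` stands in for the kernel's `atomic_dec_and_test`, and the status values 0 and 2 stand in for the GOOD and CHECK CONDITION statuses. The point it shows is that the structure is completed and freed only by whichever side drops the last reference, so the submitter may still use it after the completion callback has already run.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct model_aio_cmd {
	atomic_int ref;   /* one reference for the submitter, one for the callback */
	long len;         /* bytes requested */
	long ret;         /* bytes transferred, set by the callback */
	int *status_out;  /* receives 0 (good) or 2 (check condition) */
	int *completions; /* counts how often completion ran */
};

/* Allocate a command holding two references, as the posted patch does. */
static struct model_aio_cmd *model_aio_alloc(long len, int *status_out,
					     int *completions)
{
	struct model_aio_cmd *cmd = calloc(1, sizeof(*cmd));

	if (!cmd)
		return NULL;
	atomic_init(&cmd->ref, 2);
	cmd->len = len;
	cmd->status_out = status_out;
	cmd->completions = completions;
	return cmd;
}

/*
 * Drop one reference. Returns 1 if this call dropped the last one and
 * therefore completed and freed the command, 0 otherwise.
 */
static int model_aio_put(struct model_aio_cmd *cmd)
{
	/* atomic_fetch_sub returns the previous value. */
	if (atomic_fetch_sub(&cmd->ref, 1) != 1)
		return 0;

	*cmd->status_out = (cmd->ret == cmd->len) ? 0 : 2;
	(*cmd->completions)++;
	free(cmd);
	return 1;
}

/* Stand-in for the completion callback: record the result, drop a ref. */
static void model_aio_done(struct model_aio_cmd *cmd, long ret)
{
	cmd->ret = ret;
	model_aio_put(cmd);
}
```

Either ordering (callback first or submitter first) ends in exactly one completion in this model; the cost compared with the single-owner rule is one atomic operation per side.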
diff --git a/drivers/target/target_core_file.c b/drivers/target/target_core_file.c
index 9b2c0c773022..e8c07a0f7084 100644
--- a/drivers/target/target_core_file.c
+++ b/drivers/target/target_core_file.c
@@ -250,6 +250,96 @@ static void fd_destroy_device(struct se_device *dev)
 	}
 }
 
+struct target_core_file_cmd {
+	atomic_t	ref;
+	unsigned long	len;
+	long		ret;
+	struct se_cmd	*cmd;
+	struct kiocb	iocb;
+	struct bio_vec	bvec[0];
+};
+
+static void cmd_rw_aio_do_completion(struct target_core_file_cmd *cmd)
+{
+	if (!atomic_dec_and_test(&cmd->ref))
+		return;
+
+	if (cmd->ret != cmd->len)
+		target_complete_cmd(cmd->cmd, SAM_STAT_CHECK_CONDITION);
+	else
+		target_complete_cmd(cmd->cmd, SAM_STAT_GOOD);
+
+	kfree(cmd);
+}
+
+static void cmd_rw_aio_complete(struct kiocb *iocb, long ret, long ret2)
+{
+	struct target_core_file_cmd *cmd;
+
+	cmd = container_of(iocb, struct target_core_file_cmd, iocb);
+
+	cmd->ret = ret;
+	cmd_rw_aio_do_completion(cmd);
+}
+
+static int fd_do_aio_rw(struct se_cmd *cmd, struct fd_dev *fd_dev,
+	      u32 block_size, struct scatterlist *sgl,
+	      u32 sgl_nents, u32 data_length, int is_write)
+{
+	struct file *file = fd_dev->fd_file;
+	struct target_core_file_cmd *aio_cmd;
+	struct scatterlist *sg;
+	struct iov_iter iter = {};
+	struct bio_vec *bvec;
+	ssize_t len = 0;
+	loff_t pos = (cmd->t_task_lba * block_size);
+	int ret = 0, i;
+
+	aio_cmd = kmalloc(sizeof(struct target_core_file_cmd) +
+			  sgl_nents * sizeof(struct bio_vec),
+			  GFP_KERNEL | __GFP_ZERO);
+	if (!aio_cmd)
+		return -ENOMEM;
+
+	bvec = aio_cmd->bvec;
+
+	for_each_sg(sgl, sg, sgl_nents, i) {
+		bvec[i].bv_page = sg_page(sg);
+		bvec[i].bv_len = sg->length;
+		bvec[i].bv_offset = sg->offset;
+
+		len += sg->length;
+	}
+
+	iov_iter_bvec(&iter, ITER_BVEC | is_write, bvec, sgl_nents, len);
+
+	atomic_set(&aio_cmd->ref, 2);
+
+	aio_cmd->cmd = cmd;
+	aio_cmd->len = len;
+	aio_cmd->iocb.ki_pos = pos;
+	aio_cmd->iocb.ki_filp = file;
+	aio_cmd->iocb.ki_complete = cmd_rw_aio_complete;
+	aio_cmd->iocb.ki_flags = 0;
+
+	if (!(fd_dev->fbd_flags & FDBD_HAS_BUFFERED_IO_WCE))
+		aio_cmd->iocb.ki_flags |= IOCB_DIRECT;
+	if (is_write && (cmd->se_cmd_flags & SCF_FUA))
+		aio_cmd->iocb.ki_flags |= IOCB_DSYNC;
+
+	if (is_write)
+		ret = call_write_iter(file, &aio_cmd->iocb, &iter);
+	else
+		ret = call_read_iter(file, &aio_cmd->iocb, &iter);
+
+	cmd_rw_aio_do_completion(aio_cmd);
+
+	if (ret != -EIOCBQUEUED)
+		aio_cmd->iocb.ki_complete(&aio_cmd->iocb, ret, 0);
+
+	return 0;
+}
+
 static int fd_do_rw(struct se_cmd *cmd, struct file *fd,
 		u32 block_size, struct scatterlist *sgl,
 		u32 sgl_nents, u32 data_length, int is_write)
@@ -536,6 +626,7 @@ fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
 	struct file *pfile = fd_dev->fd_prot_file;
 	sense_reason_t rc;
 	int ret = 0;
+	int aio = fd_dev->fbd_flags & FDBD_HAS_ASYNC_IO;
 	/*
 	 * We are currently limited by the number of iovecs (2048) per
 	 * single vfs_[writev,readv] call.
@@ -550,7 +641,11 @@ fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
 	 * Call vectorized fileio functions to map struct scatterlist
 	 * physical memory addresses to struct iovec virtual memory.
 	 */
-	if (data_direction == DMA_FROM_DEVICE) {
+	if (aio) {
+		ret = fd_do_aio_rw(cmd, fd_dev, dev->dev_attrib.block_size,
+					sgl, sgl_nents, cmd->data_length,
+					!(data_direction == DMA_FROM_DEVICE));
+	} else if (data_direction == DMA_FROM_DEVICE) {
 		if (cmd->prot_type && dev->dev_attrib.pi_prot_type) {
 			ret = fd_do_rw(cmd, pfile, dev->prot_length,
 				       cmd->t_prot_sg, cmd->t_prot_nents,
@@ -616,18 +711,21 @@ fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
 	if (ret < 0)
 		return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
 
-	target_complete_cmd(cmd, SAM_STAT_GOOD);
+	if (!aio)
+		target_complete_cmd(cmd, SAM_STAT_GOOD);
 	return 0;
 }
 
 enum {
-	Opt_fd_dev_name, Opt_fd_dev_size, Opt_fd_buffered_io, Opt_err
+	Opt_fd_dev_name, Opt_fd_dev_size, Opt_fd_buffered_io,
+	Opt_fd_async_io, Opt_err
 };
 
 static match_table_t tokens = {
 	{Opt_fd_dev_name, "fd_dev_name=%s"},
 	{Opt_fd_dev_size, "fd_dev_size=%s"},
 	{Opt_fd_buffered_io, "fd_buffered_io=%d"},
+	{Opt_fd_async_io, "fd_async_io=%d"},
 	{Opt_err, NULL}
 };
 
@@ -693,6 +791,21 @@ static ssize_t fd_set_configfs_dev_params(struct se_device *dev,
 
 			fd_dev->fbd_flags |= FDBD_HAS_BUFFERED_IO_WCE;
 			break;
+		case Opt_fd_async_io:
+			ret = match_int(args, &arg);
+			if (ret)
+				goto out;
+			if (arg != 1) {
+				pr_err("bogus fd_async_io=%d value\n", arg);
+				ret = -EINVAL;
+				goto out;
+			}
+
+			pr_debug("FILEIO: Using async I/O"
+				" operations for struct fd_dev\n");
+
+			fd_dev->fbd_flags |= FDBD_HAS_ASYNC_IO;
+			break;
 		default:
 			break;
 		}
@@ -709,10 +822,11 @@ static ssize_t fd_show_configfs_dev_params(struct se_device *dev, char *b)
 	ssize_t bl = 0;
 
 	bl = sprintf(b + bl, "TCM FILEIO ID: %u", fd_dev->fd_dev_id);
-	bl += sprintf(b + bl, " File: %s Size: %llu Mode: %s\n",
+	bl += sprintf(b + bl, " File: %s Size: %llu Mode: %s Async: %d\n",
 		fd_dev->fd_dev_name, fd_dev->fd_dev_size,
 		(fd_dev->fbd_flags & FDBD_HAS_BUFFERED_IO_WCE) ?
-		"Buffered-WCE" : "O_DSYNC");
+		"Buffered-WCE" : "O_DSYNC",
+		!!(fd_dev->fbd_flags & FDBD_HAS_ASYNC_IO));
 	return bl;
 }
 
diff --git a/drivers/target/target_core_file.h b/drivers/target/target_core_file.h
index 53be5ffd3261..929b1ecd544e 100644
--- a/drivers/target/target_core_file.h
+++ b/drivers/target/target_core_file.h
@@ -22,6 +22,7 @@
 #define FBDF_HAS_PATH		0x01
 #define FBDF_HAS_SIZE		0x02
 #define FDBD_HAS_BUFFERED_IO_WCE 0x04
+#define FDBD_HAS_ASYNC_IO	 0x08
 #define FDBD_FORMAT_UNIT_SIZE	2048
 
 struct fd_dev {
Direct I/O allows to not affect the write-back cache, this is
expected when a non-buffered mode is used.

Async I/O allows to handle a few commands concurrently, so a target shows a
better performance:

Mode: O_DSYNC Async: 1
$ ./fio --bs=4K --direct=1 --rw=randwrite --ioengine=libaio --iodepth=64 --name=/dev/sda --runtime=20 --numjobs=2
WRITE: bw=45.9MiB/s (48.1MB/s), 21.9MiB/s-23.0MiB/s (22.0MB/s-25.2MB/s), io=919MiB (963MB), run=20002-20020msec

Mode: O_DSYNC Async: 0
$ ./fio --bs=4K --direct=1 --rw=randwrite --ioengine=libaio --iodepth=64 --name=/dev/sdb --runtime=20 --numjobs=2
WRITE: bw=1607KiB/s (1645kB/s), 802KiB/s-805KiB/s (821kB/s-824kB/s), io=31.8MiB (33.4MB), run=20280-20295msec

Known issue:

DIF (PI) emulation doesn't work when a target uses async I/O, because
DIF metadata is saved in a separate file, and it is another non-trivial
task how to synchronize writing in two files, so that a following read
operation always returns a consistent metadata for a specified block.

Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
---
 drivers/target/target_core_file.c | 124 ++++++++++++++++++++++++++++++++++++--
 drivers/target/target_core_file.h | 1 +
 2 files changed, 120 insertions(+), 5 deletions(-)
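Editorial illustration: the two fio results above can be put on a common scale with a little arithmetic. The sketch below assumes binary units (1 MiB = 1024 KiB) and a 4 KiB block size as implied by `--bs=4K`; the helper names are hypothetical. With the quoted aggregate bandwidths, 45.9 MiB/s against 1607 KiB/s, the async mode comes out roughly 29 times faster, which is about 11750 versus about 402 writes per second.

```c
#include <assert.h>

/* Convert MiB/s to KiB/s, assuming binary units. */
static double model_mib_to_kib(double mib_per_s)
{
	return mib_per_s * 1024.0;
}

/* Operations per second for a given bandwidth and block size, both in KiB. */
static double model_iops(double kib_per_s, double block_kib)
{
	return kib_per_s / block_kib;
}

/* How many times faster the first bandwidth is than the second. */
static double model_speedup(double fast_kib_per_s, double slow_kib_per_s)
{
	return fast_kib_per_s / slow_kib_per_s;
}
```

These figures are derived only from the aggregate `bw=` values quoted above; they say nothing about latency or about how the gap would change at other queue depths.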