mbox series

[v4,0/6] zone-append support in io-uring and aio

Message ID 1595605762-17010-1-git-send-email-joshi.k@samsung.com (mailing list archive)
Headers show
Series zone-append support in io-uring and aio | expand


Kanchan Joshi July 24, 2020, 3:49 p.m. UTC
Changes since v3:
- Return absolute append offset in bytes, in both io_uring and aio
- Repurpose cqe's res/flags and introduce res64 to send 64bit append-offset
- Change iov_iter_truncate to report whether it actually truncated
- Prevent short write and return failure if zone-append is spanning
beyond end-of-device
- Change ki_complete(...,long ret2) interface to support 64bit ret2

v3: https://lore.kernel.org/lkml/1593974870-18919-1-git-send-email-joshi.k@samsung.com/

Changes since v2:
- Use file append infra (O_APPEND/RWF_APPEND) to trigger zone-append
(Christoph, Wilcox)
- Added Block I/O path changes (Damien). Avoided append split into multi-bio.
- Added patch to extend zone-append in block-layer to support bvec iov_iter.
Append using io-uring fixed-buffer is enabled with this.
- Made io-uring support code more concise, added changes mentioned by Pavel.

v2: https://lore.kernel.org/io-uring/1593105349-19270-1-git-send-email-joshi.k@samsung.com/

Changes since v1:
- No new opcodes in uring or aio. Use RWF_ZONE_APPEND flag instead.
- linux-aio changes vanish because of no new opcode
- Fixed the overflow and other issues mentioned by Damien
- Simplified uring support code, fixed the issues mentioned by Pavel
- Added error checks for io-uring fixed-buffer and sync kiocb

v1: https://lore.kernel.org/io-uring/1592414619-5646-1-git-send-email-joshi.k@samsung.com/

Cover letter (updated):

This patchset enables zone-append using io-uring/linux-aio, on block IO path.
Purpose is to provide zone-append consumption ability to applications which are
using zoned-block-device directly.
Application can send write with existing O/RWF_APPEND;On a zoned-block-device
this will trigger zone-append. On regular block device, existing file-append
behavior is retained. However, infra allows zone-append to be triggered on
any file if FMODE_ZONE_APPEND (new kernel-only fmode) is set during open.

With zone-append, written-location within zone is known only after completion.
So apart from the usual return value of write, additional means are
needed to obtain the actual written-location.

In aio, 64bit append-offset is returned to application using res2
field of io_event -

struct io_event {
        __u64           data;           /* the data field from the iocb */
        __u64           obj;            /* what iocb this event came from */
        __s64           res;            /* result code for this event */
        __s64           res2;           /* secondary result */

In io-uring, [cqe->res, cqq->flags] repurposed into res64 to return
64bit append-offset to user-space.

struct io_uring_cqe {
        __u64   user_data;      /* sqe->data submission passed back */
        union {
                struct {
                        __s32   res;    /* result code for this event */
                        __u32   flags;
                __s64   res64;  /* appending offset for zone append */
Zone-append write is ensured not to be a short-write.

Kanchan Joshi (3):
  block: add zone append handling for direct I/O path
  block: enable zone-append for iov_iter of bvec type

SelvaKumar S (3):
  fs: change ki_complete interface to support 64bit ret2
  uio: return status with iov truncation
  io_uring: add support for zone-append

 block/bio.c                       | 31 ++++++++++++++++++++---
 drivers/block/loop.c              |  2 +-
 drivers/nvme/target/io-cmd-file.c |  2 +-
 drivers/target/target_core_file.c |  2 +-
 fs/aio.c                          |  2 +-
 fs/block_dev.c                    | 51 +++++++++++++++++++++++++++++--------
 fs/io_uring.c                     | 53 +++++++++++++++++++++++++++++++--------
 fs/overlayfs/file.c               |  2 +-
 include/linux/fs.h                | 16 +++++++++---
 include/linux/uio.h               |  7 ++++--
 include/uapi/linux/io_uring.h     |  9 +++++--
 11 files changed, 142 insertions(+), 35 deletions(-)