Message ID | 20200226083719.4389-2-bob.liu@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | userspace PI passthrough via io_uring | expand |
On 2/26/20 1:37 AM, Bob Liu wrote: > diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h > index a3300e1..98fa3f1 100644 > --- a/include/uapi/linux/io_uring.h > +++ b/include/uapi/linux/io_uring.h > @@ -62,6 +62,8 @@ enum { > IORING_OP_NOP, > IORING_OP_READV, > IORING_OP_WRITEV, > + IORING_OP_READV_PI, > + IORING_OP_WRITEV_PI, > IORING_OP_FSYNC, > IORING_OP_READ_FIXED, > IORING_OP_WRITE_FIXED, So this one renumbers everything past IORING_OP_WRITEV, breaking the ABI in a very bad way. I'm guessing that was entirely unintentional? Any new command must go at the end of the list. You're also not updating io_op_defs[] with the two new commands, which means it won't compile at all. I'm guessing you tested this on an older version of the kernel which didn't have io_op_defs[]?
On Wed, Feb 26, 2020 at 07:24:06AM -0700, Jens Axboe wrote: > On 2/26/20 1:37 AM, Bob Liu wrote: > > diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h > > index a3300e1..98fa3f1 100644 > > --- a/include/uapi/linux/io_uring.h > > +++ b/include/uapi/linux/io_uring.h > > @@ -62,6 +62,8 @@ enum { > > IORING_OP_NOP, > > IORING_OP_READV, > > IORING_OP_WRITEV, > > + IORING_OP_READV_PI, > > + IORING_OP_WRITEV_PI, > > IORING_OP_FSYNC, > > IORING_OP_READ_FIXED, > > IORING_OP_WRITE_FIXED, > > So this one renumbers everything past IORING_OP_WRITEV, breaking the > ABI in a very bad way. I'm guessing that was entirely unintentional? > Any new command must go at the end of the list. > > You're also not updating io_op_defs[] with the two new commands, > which means it won't compile at all. I'm guessing you tested this on > an older version of the kernel which didn't have io_op_defs[]? And the real question is why we need ops insted of just a flag and using previously reserved fields for the PI pointer.
On 2/26/20 8:57 AM, Christoph Hellwig wrote: > On Wed, Feb 26, 2020 at 07:24:06AM -0700, Jens Axboe wrote: >> On 2/26/20 1:37 AM, Bob Liu wrote: >>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h >>> index a3300e1..98fa3f1 100644 >>> --- a/include/uapi/linux/io_uring.h >>> +++ b/include/uapi/linux/io_uring.h >>> @@ -62,6 +62,8 @@ enum { >>> IORING_OP_NOP, >>> IORING_OP_READV, >>> IORING_OP_WRITEV, >>> + IORING_OP_READV_PI, >>> + IORING_OP_WRITEV_PI, >>> IORING_OP_FSYNC, >>> IORING_OP_READ_FIXED, >>> IORING_OP_WRITE_FIXED, >> >> So this one renumbers everything past IORING_OP_WRITEV, breaking the >> ABI in a very bad way. I'm guessing that was entirely unintentional? >> Any new command must go at the end of the list. >> >> You're also not updating io_op_defs[] with the two new commands, >> which means it won't compile at all. I'm guessing you tested this on >> an older version of the kernel which didn't have io_op_defs[]? > > And the real question is why we need ops insted of just a flag and > using previously reserved fields for the PI pointer. Yeah, should probably be a RWF_ flag instead, and a 64-bit SQE field for the PI data. The 'last iovec is PI' is kind of icky.
On Wed, Feb 26, 2020 at 08:58:46AM -0700, Jens Axboe wrote: > On 2/26/20 8:57 AM, Christoph Hellwig wrote: > > On Wed, Feb 26, 2020 at 07:24:06AM -0700, Jens Axboe wrote: > >> On 2/26/20 1:37 AM, Bob Liu wrote: > >>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h > >>> index a3300e1..98fa3f1 100644 > >>> --- a/include/uapi/linux/io_uring.h > >>> +++ b/include/uapi/linux/io_uring.h > >>> @@ -62,6 +62,8 @@ enum { > >>> IORING_OP_NOP, > >>> IORING_OP_READV, > >>> IORING_OP_WRITEV, > >>> + IORING_OP_READV_PI, > >>> + IORING_OP_WRITEV_PI, > >>> IORING_OP_FSYNC, > >>> IORING_OP_READ_FIXED, > >>> IORING_OP_WRITE_FIXED, > >> > >> So this one renumbers everything past IORING_OP_WRITEV, breaking the > >> ABI in a very bad way. I'm guessing that was entirely unintentional? > >> Any new command must go at the end of the list. > >> > >> You're also not updating io_op_defs[] with the two new commands, > >> which means it won't compile at all. I'm guessing you tested this on > >> an older version of the kernel which didn't have io_op_defs[]? > > > > And the real question is why we need ops insted of just a flag and > > using previously reserved fields for the PI pointer. > > Yeah, should probably be a RWF_ flag instead, and a 64-bit SQE field > for the PI data. The 'last iovec is PI' is kind of icky. Heh, I was about to send in nearly the same comment. :) --D > -- > Jens Axboe >
On Wed, Feb 26, 2020 at 08:58:46AM -0700, Jens Axboe wrote: > Yeah, should probably be a RWF_ flag instead, and a 64-bit SQE field > for the PI data. The 'last iovec is PI' is kind of icky. Abusing an iovec (although I though of the first once when looking into it) looks really horrible, but has two huge advantages: - it doesn't require passing another argument all the way down the I/O stack - it works with all the vectored interfaces that take a flag argument, so not just io_uring, but also preadv2/pwritev2 and aio. And while I don't care too much about the last I think preadv2 and pwritev2 are valuable to support.
On 2/26/20 10:24 PM, Jens Axboe wrote: > On 2/26/20 1:37 AM, Bob Liu wrote: >> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h >> index a3300e1..98fa3f1 100644 >> --- a/include/uapi/linux/io_uring.h >> +++ b/include/uapi/linux/io_uring.h >> @@ -62,6 +62,8 @@ enum { >> IORING_OP_NOP, >> IORING_OP_READV, >> IORING_OP_WRITEV, >> + IORING_OP_READV_PI, >> + IORING_OP_WRITEV_PI, >> IORING_OP_FSYNC, >> IORING_OP_READ_FIXED, >> IORING_OP_WRITE_FIXED, > > So this one renumbers everything past IORING_OP_WRITEV, breaking the > ABI in a very bad way. I'm guessing that was entirely unintentional? > Any new command must go at the end of the list. > > You're also not updating io_op_defs[] with the two new commands, > which means it won't compile at all. I'm guessing you tested this on > an older version of the kernel which didn't have io_op_defs[]? > Yep, will rebase to the latest version next time.
On 2/27/20 12:53 AM, Christoph Hellwig wrote: > On Wed, Feb 26, 2020 at 08:58:46AM -0700, Jens Axboe wrote: >> Yeah, should probably be a RWF_ flag instead, and a 64-bit SQE field >> for the PI data. The 'last iovec is PI' is kind of icky. > > Abusing an iovec (although I though of the first once when looking > into it) looks really horrible, but has two huge advantages: > > - it doesn't require passing another argument all the way down > the I/O stack > - it works with all the vectored interfaces that take a flag > argument, so not just io_uring, but also preadv2/pwritev2 and aio. > And while I don't care too much about the last I think preadv2 > and pwritev2 are valuable to support. > Indeed, actually the 'last iovec is PI' idea was learned from Darrick's original patch which support PI passthrough via aio. https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg27537.html
diff --git a/fs/io_uring.c b/fs/io_uring.c index 562e3a1..c43d96a 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -630,12 +630,14 @@ static inline bool io_prep_async_work(struct io_kiocb *req, switch (req->opcode) { case IORING_OP_WRITEV: + case IORING_OP_WRITEV_PI: case IORING_OP_WRITE_FIXED: /* only regular files should be hashed for writes */ if (req->flags & REQ_F_ISREG) do_hashed = true; /* fall-through */ case IORING_OP_READV: + case IORING_OP_READV_PI: case IORING_OP_READ_FIXED: case IORING_OP_SENDMSG: case IORING_OP_RECVMSG: @@ -1505,6 +1507,12 @@ static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe, kiocb->ki_pos = READ_ONCE(sqe->off); kiocb->ki_flags = iocb_flags(kiocb->ki_filp); kiocb->ki_hint = ki_hint_validate(file_write_hint(kiocb->ki_filp)); + if (req->opcode == IORING_OP_READV_PI || + req->opcode == IORING_OP_WRITEV_PI) { + if (!(req->file->f_flags & O_DIRECT)) + return -EINVAL; + kiocb->ki_flags |= IOCB_USE_PI; + } ioprio = READ_ONCE(sqe->ioprio); if (ioprio) { @@ -3065,10 +3073,12 @@ static int io_req_defer_prep(struct io_kiocb *req, case IORING_OP_NOP: break; case IORING_OP_READV: + case IORING_OP_READV_PI: case IORING_OP_READ_FIXED: ret = io_read_prep(req, sqe, true); break; case IORING_OP_WRITEV: + case IORING_OP_WRITEV_PI: case IORING_OP_WRITE_FIXED: ret = io_write_prep(req, sqe, true); break; @@ -3156,6 +3166,7 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe, case IORING_OP_NOP: ret = io_nop(req); break; + case IORING_OP_READV_PI: case IORING_OP_READV: case IORING_OP_READ_FIXED: if (sqe) { @@ -3166,6 +3177,7 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe, ret = io_read(req, nxt, force_nonblock); break; case IORING_OP_WRITEV: + case IORING_OP_WRITEV_PI: case IORING_OP_WRITE_FIXED: if (sqe) { ret = io_write_prep(req, sqe, force_nonblock); diff --git a/include/linux/fs.h b/include/linux/fs.h index 98e0349..65fda07 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -314,6 +314,7 @@ enum rw_hint { #define IOCB_SYNC (1 << 5) #define IOCB_WRITE (1 << 6) #define IOCB_NOWAIT (1 << 7) +#define IOCB_USE_PI (1 << 8) struct kiocb { struct file *ki_filp; diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index a3300e1..98fa3f1 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -62,6 +62,8 @@ enum { IORING_OP_NOP, IORING_OP_READV, IORING_OP_WRITEV, + IORING_OP_READV_PI, + IORING_OP_WRITEV_PI, IORING_OP_FSYNC, IORING_OP_READ_FIXED, IORING_OP_WRITE_FIXED,
Add read and write with protect information command. Signed-off-by: Bob Liu <bob.liu@oracle.com> --- fs/io_uring.c | 12 ++++++++++++ include/linux/fs.h | 1 + include/uapi/linux/io_uring.h | 2 ++ 3 files changed, 15 insertions(+)