Message ID | 20241106122659.730712-5-ming.lei@redhat.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | io_uring: support group buffer & ublk zc | expand |
On 11/6/24 5:26 AM, Ming Lei wrote: > Prepare for supporting kernel buffer in case of io group, in which group > leader leases kernel buffer to io_uring, and consumed by io_uring OPs. > > So reuse io_mapped_buf for group kernel buffer, and unfortunately > io_import_fixed() can't be reused since userspace fixed buffer is > virt-contiguous, but it isn't true for kernel buffer. > > Also kernel buffer lifetime is bound with group leader request, it isn't > necessary to use rsrc_node for tracking its lifetime, especially it needs > extra allocation of rsrc_node for each IO. While it isn't strictly necessary, I do think it'd clean up the io_kiocb parts and hopefully unify the assign and put path more. So I'd strongly suggest you do use an io_rsrc_node, even if it does just map the io_mapped_buf for this. > +struct io_mapped_buf { > + u64 start; > + unsigned int len; > + unsigned int nr_bvecs; > + > + /* kbuf hasn't refs and accounting, its lifetime is bound with req */ > + union { > + struct { > + refcount_t refs; > + unsigned int acct_pages; > + }; > + /* pbvec is only for kbuf */ > + const struct bio_vec *pbvec; > + }; > + unsigned int folio_shift:6; > + unsigned int dir:1; /* ITER_DEST or ITER_SOURCE */ > + unsigned int kbuf:1; /* kernel buffer or not */ > + /* offset in the 1st bvec, for kbuf only */ > + unsigned int offset; > + struct bio_vec bvec[] __counted_by(nr_bvecs); > +}; And then I'd get rid of this union, and have it follow the normal rules for an io_mapped_buf in that the refs are valid. Yes it'll take 8b more, but honestly I think unifying these bits and keeping it consistent is a LOT more important than saving a bit of space. This is imho the last piece missing to make this conform more nicely with how resource nodes are generally handled and used.
On Wed, Nov 06, 2024 at 08:15:13AM -0700, Jens Axboe wrote: > On 11/6/24 5:26 AM, Ming Lei wrote: > > Prepare for supporting kernel buffer in case of io group, in which group > > leader leases kernel buffer to io_uring, and consumed by io_uring OPs. > > > > So reuse io_mapped_buf for group kernel buffer, and unfortunately > > io_import_fixed() can't be reused since userspace fixed buffer is > > virt-contiguous, but it isn't true for kernel buffer. > > > > Also kernel buffer lifetime is bound with group leader request, it isn't > > necessary to use rsrc_node for tracking its lifetime, especially it needs > > extra allocation of rsrc_node for each IO. > > While it isn't strictly necessary, I do think it'd clean up the io_kiocb > parts and hopefully unify the assign and put path more. So I'd strongly > suggest you do use an io_rsrc_node, even if it does just map the > io_mapped_buf for this. Can you share your idea about how to unify buffer? I am also interested in this area, so I may take it into account in this patch. Will you plan to use io_rsrc_node for all buffer type(include buffer select)? > > > +struct io_mapped_buf { > > + u64 start; > > + unsigned int len; > > + unsigned int nr_bvecs; > > + > > + /* kbuf hasn't refs and accounting, its lifetime is bound with req */ > > + union { > > + struct { > > + refcount_t refs; > > + unsigned int acct_pages; > > + }; > > + /* pbvec is only for kbuf */ > > + const struct bio_vec *pbvec; > > + }; > > + unsigned int folio_shift:6; > > + unsigned int dir:1; /* ITER_DEST or ITER_SOURCE */ > > + unsigned int kbuf:1; /* kernel buffer or not */ > > + /* offset in the 1st bvec, for kbuf only */ > > + unsigned int offset; > > + struct bio_vec bvec[] __counted_by(nr_bvecs); > > +}; > > And then I'd get rid of this union, and have it follow the normal rules > for an io_mapped_buf in that the refs are valid. Yes it'll take 8b more, > but honestly I think unifying these bits and keeping it consistent is a > LOT more important than saving a bit of space. > > This is imho the last piece missing to make this conform more nicely > with how resource nodes are generally handled and used. OK. thanks, Ming
On 11/6/24 6:22 PM, Ming Lei wrote: > On Wed, Nov 06, 2024 at 08:15:13AM -0700, Jens Axboe wrote: >> On 11/6/24 5:26 AM, Ming Lei wrote: >>> Prepare for supporting kernel buffer in case of io group, in which group >>> leader leases kernel buffer to io_uring, and consumed by io_uring OPs. >>> >>> So reuse io_mapped_buf for group kernel buffer, and unfortunately >>> io_import_fixed() can't be reused since userspace fixed buffer is >>> virt-contiguous, but it isn't true for kernel buffer. >>> >>> Also kernel buffer lifetime is bound with group leader request, it isn't >>> necessary to use rsrc_node for tracking its lifetime, especially it needs >>> extra allocation of rsrc_node for each IO. >> >> While it isn't strictly necessary, I do think it'd clean up the io_kiocb >> parts and hopefully unify the assign and put path more. So I'd strongly >> suggest you do use an io_rsrc_node, even if it does just map the >> io_mapped_buf for this. > > Can you share your idea about how to unify buffer? I am also interested > in this area, so I may take it into account in this patch. I just mean use an io_rsrc_node rather than an io_mapped_buf. The node holds the buf, and then it should not need extra checking. Particularly with the callback, which I think needs to go in the io_rsrc_node. Hence I don't think you need to change much. Yes your mapping side will need to allocate an io_rsrc_node for this, but that's really not too bad. The benefits would be that the node assignment to an io_kiocb and putting of the node would follow normal registered buffers. > Will you plan to use io_rsrc_node for all buffer type(include buffer > select)? Probably not - the provided ring buffers are very low overhead and shared between userspace and the kernel, so it would not make much sense to shove an io_rsrc_node in between. But for anything else, yeah I'd love to see it just use the resource node.
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index bc883078c1ed..9af83cf214c2 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -2,6 +2,7 @@ #define IO_URING_TYPES_H #include <linux/blkdev.h> +#include <linux/bvec.h> #include <linux/hashtable.h> #include <linux/task_work.h> #include <linux/bitmap.h> @@ -39,6 +40,28 @@ enum io_uring_cmd_flags { IO_URING_F_COMPAT = (1 << 12), }; +struct io_mapped_buf { + u64 start; + unsigned int len; + unsigned int nr_bvecs; + + /* kbuf hasn't refs and accounting, its lifetime is bound with req */ + union { + struct { + refcount_t refs; + unsigned int acct_pages; + }; + /* pbvec is only for kbuf */ + const struct bio_vec *pbvec; + }; + unsigned int folio_shift:6; + unsigned int dir:1; /* ITER_DEST or ITER_SOURCE */ + unsigned int kbuf:1; /* kernel buffer or not */ + /* offset in the 1st bvec, for kbuf only */ + unsigned int offset; + struct bio_vec bvec[] __counted_by(nr_bvecs); +}; + struct io_wq_work_node { struct io_wq_work_node *next; }; diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c index d407576ddfb7..c4a776860cb4 100644 --- a/io_uring/kbuf.c +++ b/io_uring/kbuf.c @@ -838,3 +838,37 @@ int io_pbuf_mmap(struct file *file, struct vm_area_struct *vma) io_put_bl(ctx, bl); return ret; } + +/* + * kernel buffer is built over generic bvec, and can't be always + * virt-contiguous, which is different with userspace fixed buffer, + * so we can't reuse io_import_fixed() here + * + * Also kernel buffer lifetime is bound with request, and we needn't + * to use rsrc_node to track its lifetime + */ +int io_import_kbuf(int ddir, struct iov_iter *iter, + const struct io_mapped_buf *kbuf, + u64 buf_off, size_t len) +{ + unsigned long offset = kbuf->offset; + + WARN_ON_ONCE(!kbuf->kbuf); + + if (ddir != kbuf->dir) + return -EINVAL; + + if (unlikely(buf_off > kbuf->len)) + return -EFAULT; + + if (unlikely(len > kbuf->len - buf_off)) + return -EFAULT; + + offset += buf_off; + iov_iter_bvec(iter, ddir, kbuf->pbvec, kbuf->nr_bvecs, offset + len); + + if (offset) + iov_iter_advance(iter, offset); + + return 0; +} diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h index 36aadfe5ac00..04ccd52dd0ad 100644 --- a/io_uring/kbuf.h +++ b/io_uring/kbuf.h @@ -88,6 +88,9 @@ void io_put_bl(struct io_ring_ctx *ctx, struct io_buffer_list *bl); struct io_buffer_list *io_pbuf_get_bl(struct io_ring_ctx *ctx, unsigned long bgid); int io_pbuf_mmap(struct file *file, struct vm_area_struct *vma); +int io_import_kbuf(int ddir, struct iov_iter *iter, + const struct io_mapped_buf *kbuf, + u64 buf_off, size_t len); static inline bool io_kbuf_recycle_ring(struct io_kiocb *req) { diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c index 16f5abe03d10..5f35641c55ab 100644 --- a/io_uring/rsrc.c +++ b/io_uring/rsrc.c @@ -771,6 +771,7 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx, imu->len = iov->iov_len; imu->nr_bvecs = nr_pages; imu->folio_shift = PAGE_SHIFT; + imu->kbuf = 0; if (coalesced) imu->folio_shift = data.folio_shift; refcount_set(&imu->refs, 1); diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h index 255ec94ea172..d54a5f84b9ed 100644 --- a/io_uring/rsrc.h +++ b/io_uring/rsrc.h @@ -26,16 +26,6 @@ struct io_rsrc_node { }; }; -struct io_mapped_buf { - u64 start; - unsigned int len; - unsigned int nr_bvecs; - refcount_t refs; - unsigned int acct_pages; - unsigned int folio_shift:6; - struct bio_vec bvec[] __counted_by(nr_bvecs); -}; - struct io_imu_folio_data { /* Head folio can be partially included in the fixed buf */ unsigned int nr_pages_head;
Prepare for supporting kernel buffer in case of io group, in which group leader leases kernel buffer to io_uring, and consumed by io_uring OPs. So reuse io_mapped_buf for group kernel buffer, and unfortunately io_import_fixed() can't be reused since userspace fixed buffer is virt-contiguous, but it isn't true for kernel buffer. Also kernel buffer lifetime is bound with group leader request, it isn't necessary to use rsrc_node for tracking its lifetime, especially it needs extra allocation of rsrc_node for each IO. Signed-off-by: Ming Lei <ming.lei@redhat.com> --- include/linux/io_uring_types.h | 23 +++++++++++++++++++++++ io_uring/kbuf.c | 34 ++++++++++++++++++++++++++++++++++ io_uring/kbuf.h | 3 +++ io_uring/rsrc.c | 1 + io_uring/rsrc.h | 10 ---------- 5 files changed, 61 insertions(+), 10 deletions(-)