Message ID | 5143111391e771dc97237e2a5e6a74223ef8f15f.1653992701.git.asml.silence@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | io_uring zerocopy send | expand |
On Tue, Jun 28, 2022 at 07:56:27PM +0100, Pavel Begunkov wrote: > Add an bvec specialised and optimised path in zerocopy_sg_from_iter. > It'll be used later for {get,put}_page() optimisations. If you need a variant that would not grab page references for ITER_BVEC (and presumably other non-userland ones), the natural thing to do would be to provide just such a primitive, wouldn't it? The fun question here is by which paths ITER_BVEC can be passed to that function and which all of them are currently guaranteed to hold the underlying pages pinned... And AFAICS you quietly assume that only ITER_BVEC ones will ever have that "managed" flag of your set. Or am I misreading the next patch in the series?
On 6/28/22 21:06, Al Viro wrote: > On Tue, Jun 28, 2022 at 07:56:27PM +0100, Pavel Begunkov wrote: >> Add an bvec specialised and optimised path in zerocopy_sg_from_iter. >> It'll be used later for {get,put}_page() optimisations. > > If you need a variant that would not grab page references for ITER_BVEC > (and presumably other non-userland ones), the natural thing to do would I don't see other iter types interesting in this context > be to provide just such a primitive, wouldn't it? A helper returning a page array sounds like overshot and waste of cycles considering that it copies one bvec into another, and especially since iov_iter_get_pages() parses only the first struct bio_vec and so returns only 1 page at a time. I can actually use for_each_bvec(), but still leaves updating the iter from bvec_iter. > The fun question here is by which paths ITER_BVEC can be passed to that > function and which all of them are currently guaranteed to hold the > underlying pages pinned... It's the other way around, not all ITER_BVEC are managed but all users asking to use managed frags (i.e. io_uring) should keep pages pinned and provide ITER_BVEC. It's opt-in, both for users and protocols. --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -66,9 +66,16 @@ struct msghdr { }; bool msg_control_is_user : 1; bool msg_get_inq : 1;/* return INQ after receive */ + /* + * The data pages are pinned and won't be released before ->msg_ubuf + * is released. ->msg_iter should point to a bvec and ->msg_ubuf has + * to be non-NULL. + */ + bool msg_managed_data : 1; unsigned int msg_flags; /* flags on received message */ __kernel_size_t msg_controllen; /* ancillary data buffer length */ struct kiocb *msg_iocb; /* ptr to iocb for async requests */ + struct ubuf_info *msg_ubuf; }; The user sets ->msg_managed_data, then protocols find it and set SKBFL_MANAGED_FRAG_REFS. If either of the steps didn't happen the feature is not used. The ->msg_managed_data part makes io_uring the only user, and io_uring ensures pages are pinned. > And AFAICS you quietly assume that only ITER_BVEC ones will ever have that > "managed" flag of your set. Or am I misreading the next patch in the > series? I hope a comment just above ->msg_managed_data should count as not quiet.
On Tue, Jun 28, 2022 at 07:56:27PM +0100, Pavel Begunkov wrote: > Add an bvec specialised and optimised path in zerocopy_sg_from_iter. > It'll be used later for {get,put}_page() optimisations. > > Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> > --- > net/core/datagram.c | 47 +++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 47 insertions(+) > Rather than propagating iter functions, I have been using the attached patch for a few months now. It leverages your ubuf_info in msghdr to allow in kernel users to pass in their own iter handler.
On 6/28/22 23:52, David Ahern wrote: > On Tue, Jun 28, 2022 at 07:56:27PM +0100, Pavel Begunkov wrote: >> Add an bvec specialised and optimised path in zerocopy_sg_from_iter. >> It'll be used later for {get,put}_page() optimisations. >> >> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> >> --- >> net/core/datagram.c | 47 +++++++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 47 insertions(+) >> > > Rather than propagating iter functions, I have been using the attached > patch for a few months now. It leverages your ubuf_info in msghdr to > allow in kernel users to pass in their own iter handler. If the series is going to be picked up for 5.20, how about we delay this one for 5.21? I'll have time to think about it (maybe moving the skb managed flag setup inside?), and will anyway need to send some omitted patches then. I was also entertaining the idea of having a smaller ubuf_info to fit it into io_kiocb, which is tight on space. struct ubuf_info { void *callback; refcount_t refcnt; u32 flags; }; struct ubuf_info_msgzerocopy { struct ubuf_info ubuf; /* others fields */ }; 48 bytes would be taking too much, but 16 looks nice. It might make sense to move the callback into struct msghdr, I don't know.
On 7/4/22 7:31 AM, Pavel Begunkov wrote: > If the series is going to be picked up for 5.20, how about we delay > this one for 5.21? I'll have time to think about it (maybe moving > the skb managed flag setup inside?), and will anyway need to send > some omitted patches then. > I think it reads better for io_uring and future extensions for io_uring to contain the optimized bvec iter handler and setting the managed flag. Too many disjointed assumptions the way the code is now. By pulling that into io_uring, core code does not make assumptions that "managed" means bvec and no page references - rather that is embedded in the code that cares.
On 7/5/22 03:28, David Ahern wrote: > On 7/4/22 7:31 AM, Pavel Begunkov wrote: >> If the series is going to be picked up for 5.20, how about we delay >> this one for 5.21? I'll have time to think about it (maybe moving >> the skb managed flag setup inside?), and will anyway need to send >> some omitted patches then. >> > > I think it reads better for io_uring and future extensions for io_uring > to contain the optimized bvec iter handler and setting the managed flag. > Too many disjointed assumptions the way the code is now. By pulling that > into io_uring, core code does not make assumptions that "managed" means > bvec and no page references - rather that is embedded in the code that > cares. Core code would still need to know when to remove the skb's managed flag, e.g. in case of mixing. Can be worked out but with assumptions, which doesn't look better that it currently is. I'll post a 5.20 rebased version and will iron it out on the way then.
On 7/5/22 15:03, Pavel Begunkov wrote: > On 7/5/22 03:28, David Ahern wrote: >> On 7/4/22 7:31 AM, Pavel Begunkov wrote: >>> If the series is going to be picked up for 5.20, how about we delay >>> this one for 5.21? I'll have time to think about it (maybe moving >>> the skb managed flag setup inside?), and will anyway need to send >>> some omitted patches then. >>> >> >> I think it reads better for io_uring and future extensions for io_uring >> to contain the optimized bvec iter handler and setting the managed flag. >> Too many disjointed assumptions the way the code is now. By pulling that >> into io_uring, core code does not make assumptions that "managed" means >> bvec and no page references - rather that is embedded in the code that >> cares. > > Core code would still need to know when to remove the skb's managed > flag, e.g. in case of mixing. Can be worked out but with assumptions, > which doesn't look better that it currently is. I'll post a 5.20 > rebased version and will iron it out on the way then. Incremental looks like below. Probably looks better. What is slightly dubious is that for zerocopy paths it leaves downgrading managed bit to the callback unlike in most other places where it's done by core. Also knowing upfront whether the user requests the feature or not sounds less convoluted, but I guess it's not that important for now. I can try to rebase and see how it goes diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 2d5badd4b9ff..2cc5b8850cb4 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1782,12 +1782,13 @@ void msg_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg, bool success); int __zerocopy_sg_from_iter(struct sock *sk, struct sk_buff *skb, - struct iov_iter *from, size_t length); + struct iov_iter *from, struct msghdr *msg, + size_t length); static inline int skb_zerocopy_iter_dgram(struct sk_buff *skb, struct msghdr *msg, int len) { - return __zerocopy_sg_from_iter(skb->sk, skb, &msg->msg_iter, len); + return __zerocopy_sg_from_iter(skb->sk, skb, &msg->msg_iter, msg, len); } int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb, diff --git a/include/linux/socket.h b/include/linux/socket.h index ba84ee614d5a..59b0f47c1f5a 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -14,6 +14,8 @@ struct file; struct pid; struct cred; struct socket; +struct sock; +struct sk_buff; #define __sockaddr_check_size(size) \ BUILD_BUG_ON(((size) > sizeof(struct __kernel_sockaddr_storage))) @@ -66,16 +68,13 @@ struct msghdr { }; bool msg_control_is_user : 1; bool msg_get_inq : 1;/* return INQ after receive */ - /* - * The data pages are pinned and won't be released before ->msg_ubuf - * is released. ->msg_iter should point to a bvec and ->msg_ubuf has - * to be non-NULL. - */ - bool msg_managed_data : 1; unsigned int msg_flags; /* flags on received message */ __kernel_size_t msg_controllen; /* ancillary data buffer length */ struct kiocb *msg_iocb; /* ptr to iocb for async requests */ struct ubuf_info *msg_ubuf; + + int (*sg_from_iter)(struct sock *sk, struct sk_buff *skb, + struct iov_iter *from, size_t length); }; struct user_msghdr { diff --git a/io_uring/net.c b/io_uring/net.c index a142a609790d..b7643f267e20 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -269,7 +269,6 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags) msg.msg_controllen = 0; msg.msg_namelen = 0; msg.msg_ubuf = NULL; - msg.msg_managed_data = false; flags = sr->msg_flags; if (issue_flags & IO_URING_F_NONBLOCK) @@ -617,7 +616,6 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags) msg.msg_controllen = 0; msg.msg_iocb = NULL; msg.msg_ubuf = NULL; - msg.msg_managed_data = false; flags = sr->msg_flags; if (force_nonblock) @@ -706,6 +704,60 @@ int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) return 0; } +static int io_sg_from_iter(struct sock *sk, struct sk_buff *skb, + struct iov_iter *from, size_t length) +{ + struct skb_shared_info *shinfo = skb_shinfo(skb); + int frag = shinfo->nr_frags; + int ret = 0; + struct bvec_iter bi; + ssize_t copied = 0; + unsigned long truesize = 0; + + if (!shinfo->nr_frags) + shinfo->flags |= SKBFL_MANAGED_FRAG_REFS; + + if (!skb_zcopy_managed(skb) || !iov_iter_is_bvec(from)) { + skb_zcopy_downgrade_managed(skb); + return __zerocopy_sg_from_iter(sk, skb, from, NULL, length); + } + + bi.bi_size = min(from->count, length); + bi.bi_bvec_done = from->iov_offset; + bi.bi_idx = 0; + + while (bi.bi_size && frag < MAX_SKB_FRAGS) { + struct bio_vec v = mp_bvec_iter_bvec(from->bvec, bi); + + copied += v.bv_len; + truesize += PAGE_ALIGN(v.bv_len + v.bv_offset); + __skb_fill_page_desc_noacc(shinfo, frag++, v.bv_page, + v.bv_offset, v.bv_len); + bvec_iter_advance_single(from->bvec, &bi, v.bv_len); + } + if (bi.bi_size) + ret = -EMSGSIZE; + + shinfo->nr_frags = frag; + from->bvec += bi.bi_idx; + from->nr_segs -= bi.bi_idx; + from->count = bi.bi_size; + from->iov_offset = bi.bi_bvec_done; + + skb->data_len += copied; + skb->len += copied; + skb->truesize += truesize; + + if (sk && sk->sk_type == SOCK_STREAM) { + sk_wmem_queued_add(sk, truesize); + if (!skb_zcopy_pure(skb)) + sk_mem_charge(sk, truesize); + } else { + refcount_add(truesize, &skb->sk->sk_wmem_alloc); + } + return ret; +} + int io_sendzc(struct io_kiocb *req, unsigned int issue_flags) { struct sockaddr_storage address; @@ -740,7 +792,7 @@ int io_sendzc(struct io_kiocb *req, unsigned int issue_flags) msg.msg_control = NULL; msg.msg_controllen = 0; msg.msg_namelen = 0; - msg.msg_managed_data = 1; + msg.sg_from_iter = io_sg_from_iter; if (zc->flags & IORING_RECVSEND_FIXED_BUF) { ret = io_import_fixed(WRITE, &msg.msg_iter, req->imu, @@ -748,7 +800,6 @@ int io_sendzc(struct io_kiocb *req, unsigned int issue_flags) if (unlikely(ret)) return ret; } else { - msg.msg_managed_data = 0; ret = import_single_range(WRITE, zc->buf, zc->len, &iov, &msg.msg_iter); if (unlikely(ret)) diff --git a/net/compat.c b/net/compat.c index 435846fa85e0..6cd2e7683dd0 100644 --- a/net/compat.c +++ b/net/compat.c @@ -81,7 +81,6 @@ int __get_compat_msghdr(struct msghdr *kmsg, kmsg->msg_iocb = NULL; kmsg->msg_ubuf = NULL; - kmsg->msg_managed_data = false; *ptr = msg.msg_iov; *len = msg.msg_iovlen; return 0; diff --git a/net/core/datagram.c b/net/core/datagram.c index 3c913a6342ad..6901dcb44d72 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -613,59 +613,14 @@ int skb_copy_datagram_from_iter(struct sk_buff *skb, int offset, } EXPORT_SYMBOL(skb_copy_datagram_from_iter); -static int __zerocopy_sg_from_bvec(struct sock *sk, struct sk_buff *skb, - struct iov_iter *from, size_t length) -{ - struct skb_shared_info *shinfo = skb_shinfo(skb); - int frag = shinfo->nr_frags; - int ret = 0; - struct bvec_iter bi; - ssize_t copied = 0; - unsigned long truesize = 0; - - bi.bi_size = min(from->count, length); - bi.bi_bvec_done = from->iov_offset; - bi.bi_idx = 0; - - while (bi.bi_size && frag < MAX_SKB_FRAGS) { - struct bio_vec v = mp_bvec_iter_bvec(from->bvec, bi); - - copied += v.bv_len; - truesize += PAGE_ALIGN(v.bv_len + v.bv_offset); - __skb_fill_page_desc_noacc(shinfo, frag++, v.bv_page, - v.bv_offset, v.bv_len); - bvec_iter_advance_single(from->bvec, &bi, v.bv_len); - } - if (bi.bi_size) - ret = -EMSGSIZE; - - shinfo->nr_frags = frag; - from->bvec += bi.bi_idx; - from->nr_segs -= bi.bi_idx; - from->count = bi.bi_size; - from->iov_offset = bi.bi_bvec_done; - - skb->data_len += copied; - skb->len += copied; - skb->truesize += truesize; - - if (sk && sk->sk_type == SOCK_STREAM) { - sk_wmem_queued_add(sk, truesize); - if (!skb_zcopy_pure(skb)) - sk_mem_charge(sk, truesize); - } else { - refcount_add(truesize, &skb->sk->sk_wmem_alloc); - } - return ret; -} - int __zerocopy_sg_from_iter(struct sock *sk, struct sk_buff *skb, - struct iov_iter *from, size_t length) + struct iov_iter *from, struct msghdr *msg, + size_t length) { int frag; - if (skb_zcopy_managed(skb)) - return __zerocopy_sg_from_bvec(sk, skb, from, length); + if (unlikely(msg && msg->msg_ubuf && msg->sg_from_iter)) + return msg->sg_from_iter(sk, skb, from, length); frag = skb_shinfo(skb)->nr_frags; @@ -753,7 +708,7 @@ int zerocopy_sg_from_iter(struct sk_buff *skb, struct iov_iter *from) if (skb_copy_datagram_from_iter(skb, 0, from, copy)) return -EFAULT; - return __zerocopy_sg_from_iter(NULL, skb, from, ~0U); + return __zerocopy_sg_from_iter(NULL, skb, from, NULL, ~0U); } EXPORT_SYMBOL(zerocopy_sg_from_iter); diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 7e6fcb3cd817..046ec3124835 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1368,7 +1368,7 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb, if (orig_uarg && uarg != orig_uarg) return -EEXIST; - err = __zerocopy_sg_from_iter(sk, skb, &msg->msg_iter, len); + err = __zerocopy_sg_from_iter(sk, skb, &msg->msg_iter, msg, len); if (err == -EFAULT || (err == -EMSGSIZE && skb->len == orig_len)) { struct sock *save_sk = skb->sk; diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 3fd1bf675598..df7f9dfbe8be 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1241,18 +1241,7 @@ static int __ip_append_data(struct sock *sk, skb->truesize += copy; wmem_alloc_delta += copy; } else { - struct msghdr *msg = from; - - if (!skb_shinfo(skb)->nr_frags) { - if (msg->msg_managed_data) - skb_shinfo(skb)->flags |= SKBFL_MANAGED_FRAG_REFS; - } else { - /* appending, don't mix managed and unmanaged */ - if (!msg->msg_managed_data) - skb_zcopy_downgrade_managed(skb); - } - - err = skb_zerocopy_iter_dgram(skb, msg, copy); + err = skb_zerocopy_iter_dgram(skb, from, copy); if (err < 0) goto error; } diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 05e2f6271f65..634c16fe8dcd 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1392,18 +1392,11 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) * zerocopy skb */ if (!skb->len) { - if (msg->msg_managed_data) - skb_shinfo(skb)->flags |= SKBFL_MANAGED_FRAG_REFS; skb_shinfo(skb)->flags |= SKBFL_PURE_ZEROCOPY; - } else { - /* appending, don't mix managed and unmanaged */ - if (!msg->msg_managed_data) - skb_zcopy_downgrade_managed(skb); - if (!skb_zcopy_pure(skb)) { - copy = tcp_wmem_schedule(sk, copy); - if (!copy) - goto wait_for_space; - } + } else if (!skb_zcopy_pure(skb)) { + copy = tcp_wmem_schedule(sk, copy); + if (!copy) + goto wait_for_space; } err = skb_zerocopy_iter_stream(sk, skb, msg, copy, uarg); diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 34eb3b5da5e2..897ca4f9b791 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1796,18 +1796,7 @@ static int __ip6_append_data(struct sock *sk, skb->truesize += copy; wmem_alloc_delta += copy; } else { - struct msghdr *msg = from; - - if (!skb_shinfo(skb)->nr_frags) { - if (msg->msg_managed_data) - skb_shinfo(skb)->flags |= SKBFL_MANAGED_FRAG_REFS; - } else { - /* appending, don't mix managed and unmanaged */ - if (!msg->msg_managed_data) - skb_zcopy_downgrade_managed(skb); - } - - err = skb_zerocopy_iter_dgram(skb, msg, copy); + err = skb_zerocopy_iter_dgram(skb, from, copy); if (err < 0) goto error; } diff --git a/net/socket.c b/net/socket.c index 0963a02b1472..ed061609265e 100644 --- a/net/socket.c +++ b/net/socket.c @@ -2107,7 +2107,6 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags, msg.msg_controllen = 0; msg.msg_namelen = 0; msg.msg_ubuf = NULL; - msg.msg_managed_data = false; if (addr) { err = move_addr_to_kernel(addr, addr_len, &address); if (err < 0) @@ -2174,7 +2173,6 @@ int __sys_recvfrom(int fd, void __user *ubuf, size_t size, unsigned int flags, msg.msg_iocb = NULL; msg.msg_flags = 0; msg.msg_ubuf = NULL; - msg.msg_managed_data = false; if (sock->file->f_flags & O_NONBLOCK) flags |= MSG_DONTWAIT; err = sock_recvmsg(sock, &msg, flags); @@ -2414,7 +2412,6 @@ int __copy_msghdr_from_user(struct msghdr *kmsg, kmsg->msg_iocb = NULL; kmsg->msg_ubuf = NULL; - kmsg->msg_managed_data = false; *uiov = msg.msg_iov; *nsegs = msg.msg_iovlen; return 0;
On 7/5/22 4:09 PM, Pavel Begunkov wrote: > On 7/5/22 15:03, Pavel Begunkov wrote: >> On 7/5/22 03:28, David Ahern wrote: >>> On 7/4/22 7:31 AM, Pavel Begunkov wrote: >>>> If the series is going to be picked up for 5.20, how about we delay >>>> this one for 5.21? I'll have time to think about it (maybe moving >>>> the skb managed flag setup inside?), and will anyway need to send >>>> some omitted patches then. >>>> >>> >>> I think it reads better for io_uring and future extensions for io_uring >>> to contain the optimized bvec iter handler and setting the managed flag. >>> Too many disjointed assumptions the way the code is now. By pulling that >>> into io_uring, core code does not make assumptions that "managed" means >>> bvec and no page references - rather that is embedded in the code that >>> cares. >> >> Core code would still need to know when to remove the skb's managed >> flag, e.g. in case of mixing. Can be worked out but with assumptions, >> which doesn't look better that it currently is. I'll post a 5.20 >> rebased version and will iron it out on the way then. Sure. My comment was that MANAGED means something else (not core code) manages the page references on the skb frags. That flag does not need to be linked to a customized bvec. > @@ -66,16 +68,13 @@ struct msghdr { > }; > bool msg_control_is_user : 1; > bool msg_get_inq : 1;/* return INQ after receive */ > - /* > - * The data pages are pinned and won't be released before ->msg_ubuf > - * is released. ->msg_iter should point to a bvec and ->msg_ubuf has > - * to be non-NULL. > - */ > - bool msg_managed_data : 1; > unsigned int msg_flags; /* flags on received message */ > __kernel_size_t msg_controllen; /* ancillary data buffer > length */ > struct kiocb *msg_iocb; /* ptr to iocb for async requests */ > struct ubuf_info *msg_ubuf; > + > + int (*sg_from_iter)(struct sock *sk, struct sk_buff *skb, > + struct iov_iter *from, size_t length); > }; > Putting in msghdr works too. I chose ubuf_info because it is directly related to the ZC path, but that struct is getting tight on space.
diff --git a/net/core/datagram.c b/net/core/datagram.c index 50f4faeea76c..5237cb533bb4 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -613,11 +613,58 @@ int skb_copy_datagram_from_iter(struct sk_buff *skb, int offset, } EXPORT_SYMBOL(skb_copy_datagram_from_iter); +static int __zerocopy_sg_from_bvec(struct sock *sk, struct sk_buff *skb, + struct iov_iter *from, size_t length) +{ + int frag = skb_shinfo(skb)->nr_frags; + int ret = 0; + struct bvec_iter bi; + ssize_t copied = 0; + unsigned long truesize = 0; + + bi.bi_size = min(from->count, length); + bi.bi_bvec_done = from->iov_offset; + bi.bi_idx = 0; + + while (bi.bi_size && frag < MAX_SKB_FRAGS) { + struct bio_vec v = mp_bvec_iter_bvec(from->bvec, bi); + + copied += v.bv_len; + truesize += PAGE_ALIGN(v.bv_len + v.bv_offset); + get_page(v.bv_page); + skb_fill_page_desc(skb, frag++, v.bv_page, v.bv_offset, v.bv_len); + bvec_iter_advance_single(from->bvec, &bi, v.bv_len); + } + if (bi.bi_size) + ret = -EMSGSIZE; + + from->bvec += bi.bi_idx; + from->nr_segs -= bi.bi_idx; + from->count = bi.bi_size; + from->iov_offset = bi.bi_bvec_done; + + skb->data_len += copied; + skb->len += copied; + skb->truesize += truesize; + + if (sk && sk->sk_type == SOCK_STREAM) { + sk_wmem_queued_add(sk, truesize); + if (!skb_zcopy_pure(skb)) + sk_mem_charge(sk, truesize); + } else { + refcount_add(truesize, &skb->sk->sk_wmem_alloc); + } + return ret; +} + int __zerocopy_sg_from_iter(struct sock *sk, struct sk_buff *skb, struct iov_iter *from, size_t length) { int frag = skb_shinfo(skb)->nr_frags; + if (iov_iter_is_bvec(from)) + return __zerocopy_sg_from_bvec(sk, skb, from, length); + while (length && iov_iter_count(from)) { struct page *pages[MAX_SKB_FRAGS]; struct page *last_head = NULL;
Add an bvec specialised and optimised path in zerocopy_sg_from_iter. It'll be used later for {get,put}_page() optimisations. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> --- net/core/datagram.c | 47 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+)