Message ID: a916f99aa91bc9066411015835cadd5677a454fb.1719190216.git.asml.silence@gmail.com (mailing list archive)
State: New
Series: zerocopy tx cleanups
Pavel Begunkov wrote:
> Instead of accounting every page range against the socket separately, do
> it in batch based on the change in skb->truesize. It's also moved into
> __zerocopy_sg_from_iter(), so that zerocopy_fill_skb_from_iter() is
> simpler and responsible for setting frags but not the accounting.
>
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>

Reviewed-by: Willem de Bruijn <willemb@google.com>

> ---
>  net/core/datagram.c | 31 ++++++++++++++++++-------------
>  1 file changed, 18 insertions(+), 13 deletions(-)
>
> diff --git a/net/core/datagram.c b/net/core/datagram.c
> index 7f7d5da2e406..2b24d69b1e94 100644
> --- a/net/core/datagram.c
> +++ b/net/core/datagram.c
> @@ -610,7 +610,7 @@ int skb_copy_datagram_from_iter(struct sk_buff *skb, int offset,
>  }
>  EXPORT_SYMBOL(skb_copy_datagram_from_iter);
>
> -static int zerocopy_fill_skb_from_iter(struct sock *sk, struct sk_buff *skb,
> +static int zerocopy_fill_skb_from_iter(struct sk_buff *skb,
>  				       struct iov_iter *from, size_t length)
>  {
>  	int frag = skb_shinfo(skb)->nr_frags;
> @@ -621,7 +621,6 @@ static int zerocopy_fill_skb_from_iter(struct sock *sk, struct sk_buff *skb,
>  	int refs, order, n = 0;
>  	size_t start;
>  	ssize_t copied;
> -	unsigned long truesize;
>
>  	if (frag == MAX_SKB_FRAGS)
>  		return -EMSGSIZE;

Does the existing code then incorrectly not unwind sk_wmem_queued_add
and sk_mem_charge if returning with error from the second or later
loop..

> @@ -633,17 +632,9 @@ static int zerocopy_fill_skb_from_iter(struct sock *sk, struct sk_buff *skb,
>
>  		length -= copied;
>
> -		truesize = PAGE_ALIGN(copied + start);
>  		skb->data_len += copied;
>  		skb->len += copied;
> -		skb->truesize += truesize;
> -		if (sk && sk->sk_type == SOCK_STREAM) {
> -			sk_wmem_queued_add(sk, truesize);
> -			if (!skb_zcopy_pure(skb))
> -				sk_mem_charge(sk, truesize);
> -		} else {
> -			refcount_add(truesize, &skb->sk->sk_wmem_alloc);
> -		}
> +		skb->truesize += PAGE_ALIGN(copied + start);
>
>  		head = compound_head(pages[n]);
>  		order = compound_order(head);
> @@ -691,10 +682,24 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk,
>  			    struct sk_buff *skb, struct iov_iter *from,
>  			    size_t length)
>  {
> +	unsigned long orig_size = skb->truesize;
> +	unsigned long truesize;
> +	int ret;
> +
>  	if (msg && msg->msg_ubuf && msg->sg_from_iter)
>  		return msg->sg_from_iter(sk, skb, from, length);
> -	else
> -		return zerocopy_fill_skb_from_iter(sk, skb, from, length);
> +
> +	ret = zerocopy_fill_skb_from_iter(skb, from, length);
> +	truesize = skb->truesize - orig_size;
> +
> +	if (sk && sk->sk_type == SOCK_STREAM) {
> +		sk_wmem_queued_add(sk, truesize);
> +		if (!skb_zcopy_pure(skb))
> +			sk_mem_charge(sk, truesize);
> +	} else {
> +		refcount_add(truesize, &skb->sk->sk_wmem_alloc);
> +	}
> +	return ret;
>  }
>  EXPORT_SYMBOL(__zerocopy_sg_from_iter);
>
> --
> 2.44.0
On 6/28/24 18:06, Willem de Bruijn wrote:
> Pavel Begunkov wrote:
>> Instead of accounting every page range against the socket separately, do
>> it in batch based on the change in skb->truesize. It's also moved into
>> __zerocopy_sg_from_iter(), so that zerocopy_fill_skb_from_iter() is
>> simpler and responsible for setting frags but not the accounting.
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>
> Reviewed-by: Willem de Bruijn <willemb@google.com>

Thanks for reviews!

>> [...]
>>
>>  	if (frag == MAX_SKB_FRAGS)
>>  		return -EMSGSIZE;
>
> Does the existing code then incorrectly not unwind sk_wmem_queued_add
> and sk_mem_charge if returning with error from the second or later
> loop..

As long as ->truesize matches what's accounted to the socket,
kfree_skb() -> sock_wfree()/->destructor() should take care of it.
With sk_mem_charge() I assume __zerocopy_sg_from_iter -> ___pskb_trim()
should do it, need to look it up, but if not, it sounds like a temporary
overestimation until the skb is put down. I don't see anything
concerning. Is that the scenario you're worried about?
Pavel Begunkov wrote:
> On 6/28/24 18:06, Willem de Bruijn wrote:
>> Pavel Begunkov wrote:
>>> Instead of accounting every page range against the socket separately, do
>>> it in batch based on the change in skb->truesize. It's also moved into
>>> __zerocopy_sg_from_iter(), so that zerocopy_fill_skb_from_iter() is
>>> simpler and responsible for setting frags but not the accounting.
>>>
>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>>
>> Reviewed-by: Willem de Bruijn <willemb@google.com>
>
> Thanks for reviews!
>
>>> [...]
>>>
>>>  	if (frag == MAX_SKB_FRAGS)
>>>  		return -EMSGSIZE;
>>
>> Does the existing code then incorrectly not unwind sk_wmem_queued_add
>> and sk_mem_charge if returning with error from the second or later
>> loop..
>
> As long as ->truesize matches what's accounted to the socket,
> kfree_skb() -> sock_wfree()/->destructor() should take care of it.
> With sk_mem_charge() I assume __zerocopy_sg_from_iter -> ___pskb_trim()
> should do it, need to look it up, but if not, it sounds like a temporary
> overestimation until the skb is put down. I don't see anything
> concerning. Is that the scenario you're worried about?

Oh indeed. Thanks.

I don't see ___pskb_trim adjusting except for the cases where it calls
skb_condense, but neither does it adjust truesize. So indeed a temporary
overestimation until e.g., tcp_wmem_free_skb. Sounds fine.
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 7f7d5da2e406..2b24d69b1e94 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -610,7 +610,7 @@ int skb_copy_datagram_from_iter(struct sk_buff *skb, int offset,
 }
 EXPORT_SYMBOL(skb_copy_datagram_from_iter);
 
-static int zerocopy_fill_skb_from_iter(struct sock *sk, struct sk_buff *skb,
+static int zerocopy_fill_skb_from_iter(struct sk_buff *skb,
 				       struct iov_iter *from, size_t length)
 {
 	int frag = skb_shinfo(skb)->nr_frags;
@@ -621,7 +621,6 @@ static int zerocopy_fill_skb_from_iter(struct sock *sk, struct sk_buff *skb,
 	int refs, order, n = 0;
 	size_t start;
 	ssize_t copied;
-	unsigned long truesize;
 
 	if (frag == MAX_SKB_FRAGS)
 		return -EMSGSIZE;
@@ -633,17 +632,9 @@ static int zerocopy_fill_skb_from_iter(struct sock *sk, struct sk_buff *skb,
 
 		length -= copied;
 
-		truesize = PAGE_ALIGN(copied + start);
 		skb->data_len += copied;
 		skb->len += copied;
-		skb->truesize += truesize;
-		if (sk && sk->sk_type == SOCK_STREAM) {
-			sk_wmem_queued_add(sk, truesize);
-			if (!skb_zcopy_pure(skb))
-				sk_mem_charge(sk, truesize);
-		} else {
-			refcount_add(truesize, &skb->sk->sk_wmem_alloc);
-		}
+		skb->truesize += PAGE_ALIGN(copied + start);
 
 		head = compound_head(pages[n]);
 		order = compound_order(head);
@@ -691,10 +682,24 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk,
 			    struct sk_buff *skb, struct iov_iter *from,
 			    size_t length)
 {
+	unsigned long orig_size = skb->truesize;
+	unsigned long truesize;
+	int ret;
+
 	if (msg && msg->msg_ubuf && msg->sg_from_iter)
 		return msg->sg_from_iter(sk, skb, from, length);
-	else
-		return zerocopy_fill_skb_from_iter(sk, skb, from, length);
+
+	ret = zerocopy_fill_skb_from_iter(skb, from, length);
+	truesize = skb->truesize - orig_size;
+
+	if (sk && sk->sk_type == SOCK_STREAM) {
+		sk_wmem_queued_add(sk, truesize);
+		if (!skb_zcopy_pure(skb))
+			sk_mem_charge(sk, truesize);
+	} else {
+		refcount_add(truesize, &skb->sk->sk_wmem_alloc);
+	}
+	return ret;
 }
 EXPORT_SYMBOL(__zerocopy_sg_from_iter);
Instead of accounting every page range against the socket separately, do
it in batch based on the change in skb->truesize. It's also moved into
__zerocopy_sg_from_iter(), so that zerocopy_fill_skb_from_iter() is
simpler and responsible for setting frags but not the accounting.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/core/datagram.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)