Message ID | 20231219210357.4029713-3-dw@davidwei.uk (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | Zero copy Rx using io_uring | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Guessing tree name failed - patch did not apply, async |
On Tue, Dec 19, 2023 at 1:04 PM David Wei <dw@davidwei.uk> wrote: > > From: Pavel Begunkov <asml.silence@gmail.com> > > NOT FOR UPSTREAM > > There will be more users of struct page_pool_iov, and ppiovs from one > subsystem must not be used by another. That should never happen for any > sane application, but we need to enforce it in case of bufs and/or > malicious users. > > Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> > Signed-off-by: David Wei <dw@davidwei.uk> > --- > net/ipv4/tcp.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > index 33a8bb63fbf5..9c6b18eebb5b 100644 > --- a/net/ipv4/tcp.c > +++ b/net/ipv4/tcp.c > @@ -2384,6 +2384,13 @@ static int tcp_recvmsg_devmem(const struct sock *sk, const struct sk_buff *skb, > } > > ppiov = skb_frag_page_pool_iov(frag); > + > + /* Disallow non devmem owned buffers */ > + if (ppiov->pp->p.memory_provider != PP_MP_DMABUF_DEVMEM) { > + err = -ENODEV; > + goto out; > + } > + Instead of this, I maybe recommend modifying the skb->dmabuf flag? My mental model is that flag means all the frags in the skb are specifically dmabuf, not general ppiovs or net_iovs. Is it possible to add skb->io_uring or something? If that bloats the skb headers, then maybe we need another place to put this flag. Maybe the [page_pool|net]_iov should declare whether it's dmabuf or otherwise, and we can check frag[0] and assume all frags are the same as frag0. But IMO the page pool internals should not leak into the implementation of generic tcp stack functions. > end = start + skb_frag_size(frag); > copy = end - offset; > > -- > 2.39.3 >
On 12/19/23 23:24, Mina Almasry wrote: > On Tue, Dec 19, 2023 at 1:04 PM David Wei <dw@davidwei.uk> wrote: >> >> From: Pavel Begunkov <asml.silence@gmail.com> >> >> NOT FOR UPSTREAM >> >> There will be more users of struct page_pool_iov, and ppiovs from one >> subsystem must not be used by another. That should never happen for any >> sane application, but we need to enforce it in case of bufs and/or >> malicious users. >> >> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> >> Signed-off-by: David Wei <dw@davidwei.uk> >> --- >> net/ipv4/tcp.c | 7 +++++++ >> 1 file changed, 7 insertions(+) >> >> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c >> index 33a8bb63fbf5..9c6b18eebb5b 100644 >> --- a/net/ipv4/tcp.c >> +++ b/net/ipv4/tcp.c >> @@ -2384,6 +2384,13 @@ static int tcp_recvmsg_devmem(const struct sock *sk, const struct sk_buff *skb, >> } >> >> ppiov = skb_frag_page_pool_iov(frag); >> + >> + /* Disallow non devmem owned buffers */ >> + if (ppiov->pp->p.memory_provider != PP_MP_DMABUF_DEVMEM) { >> + err = -ENODEV; >> + goto out; >> + } >> + > > Instead of this, I maybe recommend modifying the skb->dmabuf flag? My > mental model is that flag means all the frags in the skb are That's a good point, we need to separate them, and I have it in my todo list. > specifically dmabuf, not general ppiovs or net_iovs. Is it possible to > add skb->io_uring or something? ->io_uring flag is not feasible, converting ->devmem into a type {page,devmem,iouring} is better but not great either. > If that bloats the skb headers, then maybe we need another place to > put this flag. Maybe the [page_pool|net]_iov should declare whether > it's dmabuf or otherwise, and we can check frag[0] and assume all ppiov->pp should be enough, either not mixing buffers from different pools or comparing pp->ops or some pp->type. > frags are the same as frag0. I think I like this one the most. I think David Ahern mentioned before, but would be nice having it on per frag basis and kill ->devmem flag. That would still stop collapsing if frags are from different pools or so. > But IMO the page pool internals should not leak into the > implementation of generic tcp stack functions.
On Tue, Dec 19, 2023 at 5:34 PM Pavel Begunkov <asml.silence@gmail.com> wrote: > > On 12/19/23 23:24, Mina Almasry wrote: > > On Tue, Dec 19, 2023 at 1:04 PM David Wei <dw@davidwei.uk> wrote: > >> > >> From: Pavel Begunkov <asml.silence@gmail.com> > >> > >> NOT FOR UPSTREAM > >> > >> There will be more users of struct page_pool_iov, and ppiovs from one > >> subsystem must not be used by another. That should never happen for any > >> sane application, but we need to enforce it in case of bufs and/or > >> malicious users. > >> > >> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> > >> Signed-off-by: David Wei <dw@davidwei.uk> > >> --- > >> net/ipv4/tcp.c | 7 +++++++ > >> 1 file changed, 7 insertions(+) > >> > >> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > >> index 33a8bb63fbf5..9c6b18eebb5b 100644 > >> --- a/net/ipv4/tcp.c > >> +++ b/net/ipv4/tcp.c > >> @@ -2384,6 +2384,13 @@ static int tcp_recvmsg_devmem(const struct sock *sk, const struct sk_buff *skb, > >> } > >> > >> ppiov = skb_frag_page_pool_iov(frag); > >> + > >> + /* Disallow non devmem owned buffers */ > >> + if (ppiov->pp->p.memory_provider != PP_MP_DMABUF_DEVMEM) { > >> + err = -ENODEV; > >> + goto out; > >> + } > >> + > > > > Instead of this, I maybe recommend modifying the skb->dmabuf flag? My > > mental model is that flag means all the frags in the skb are > > That's a good point, we need to separate them, and I have it in my > todo list. > > > specifically dmabuf, not general ppiovs or net_iovs. Is it possible to > > add skb->io_uring or something? > > ->io_uring flag is not feasible, converting ->devmem into a type > {page,devmem,iouring} is better but not great either. > > > If that bloats the skb headers, then maybe we need another place to > > put this flag. Maybe the [page_pool|net]_iov should declare whether > > it's dmabuf or otherwise, and we can check frag[0] and assume all > > ppiov->pp should be enough, either not mixing buffers from different > pools or comparing pp->ops or some pp->type. > > > frags are the same as frag0. > > I think I like this one the most. I think David Ahern mentioned > before, but would be nice having it on per frag basis and kill > ->devmem flag. That would still stop collapsing if frags are > from different pools or so. > This sounds reasonable to me. I'll look into applying this change to my next devmem TCP RFC, thanks. > > But IMO the page pool internals should not leak into the > > implementation of generic tcp stack functions. > > -- > Pavel Begunkov
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 33a8bb63fbf5..9c6b18eebb5b 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2384,6 +2384,13 @@ static int tcp_recvmsg_devmem(const struct sock *sk, const struct sk_buff *skb, } ppiov = skb_frag_page_pool_iov(frag); + + /* Disallow non devmem owned buffers */ + if (ppiov->pp->p.memory_provider != PP_MP_DMABUF_DEVMEM) { + err = -ENODEV; + goto out; + } + end = start + skb_frag_size(frag); copy = end - offset;