Message ID | 20230815150325.2010460-1-tirthendu.sarkar@intel.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Headers | show |
Series | [bpf-next,v2] xsk: fix xsk_build_skb() error: 'skb' dereferencing possible ERR_PTR() | expand |
On 08/15, Tirthendu Sarkar wrote: > xsk_build_skb_zerocopy() may return an error other than -EAGAIN and this > is received as skb and used later in xsk_set_destructor_arg() and > xsk_drop_skb() which must operate on a valid skb. > > Set -EOVERFLOW as error when MAX_SKB_FRAGS are exceeded and packet needs > to be dropped and use this to distinguish against all other error cases > where allocation needs to be retried. > > Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com> > Reported-by: kernel test robot <lkp@intel.com> > Reported-by: Dan Carpenter <dan.carpenter@linaro.org> > Closes: https://lore.kernel.org/r/202307210434.OjgqFcbB-lkp@intel.com/ > Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path") > > Changelog: > v1 -> v2: > - Removed err as a parameter to xsk_build_skb_zerocopy() > [Stanislav Fomichev] > - use explicit error to distinguish packet drop vs retry > --- > net/xdp/xsk.c | 22 +++++++++++++--------- > 1 file changed, 13 insertions(+), 9 deletions(-) > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c > index fcfc8472f73d..55f8b9b0e06d 100644 > --- a/net/xdp/xsk.c > +++ b/net/xdp/xsk.c > @@ -602,7 +602,7 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, > > for (copied = 0, i = skb_shinfo(skb)->nr_frags; copied < len; i++) { > if (unlikely(i >= MAX_SKB_FRAGS)) > - return ERR_PTR(-EFAULT); > + return ERR_PTR(-EOVERFLOW); > > page = pool->umem->pgs[addr >> PAGE_SHIFT]; > get_page(page); > @@ -655,15 +655,17 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, > skb_put(skb, len); > > err = skb_store_bits(skb, 0, buffer, len); > - if (unlikely(err)) > + if (unlikely(err)) { > + kfree_skb(skb); > goto free_err; > + } > } else { > int nr_frags = skb_shinfo(skb)->nr_frags; > struct page *page; > u8 *vaddr; > > if (unlikely(nr_frags == (MAX_SKB_FRAGS - 1) && xp_mb_desc(desc))) { > - err = -EFAULT; > + err = -EOVERFLOW; > goto free_err; > } > > @@ -690,12 +692,14 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, > return skb; > > free_err: > - if (err == -EAGAIN) { > - xsk_cq_cancel_locked(xs, 1); > - } else { > - xsk_set_destructor_arg(skb); > - xsk_drop_skb(skb); > + if (err == -EOVERFLOW) { Don't think this will work? We have some other error paths in xsk_build_skb that are not -EOVERFLOW that still need kfree_skb, right? I feel like we are trying to share some state between xsk_build_skb and xsk_build_skb_zerocopy which we really shouldn't share. So how about we try to have a separate cleanup path in xsk_build_skb_zerocopy? Will something like the following (untested / uncompiled) work instead? IOW, ideally, xsk_build_skb should look like: if (dev->priv_flags & IFF_TX_SKB_NO_LINEAR) { return xsk_build_skb_zerocopy(xs, desc); } else { return xsk_build_skb_copy(xs, desc); /* ^^ current path that should really be a separate func */ } diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 79a96019b7ef..747dd012afdb 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -578,6 +578,19 @@ static void xsk_drop_skb(struct sk_buff *skb) xsk_consume_skb(skb); } +static struct sk_buff *xsk_cleanup_skb(int err, struct sk_buff *skb, struct xdp_sock *xs) +{ + if (err == -EAGAIN) { + xsk_cq_cancel_locked(xs, 1); + } else { + xsk_set_destructor_arg(skb); + xsk_drop_skb(skb); + xskq_cons_release(xs->tx); + } + + return ERR_PTR(err); +} + static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, struct xdp_desc *desc) { @@ -593,8 +606,10 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(xs->dev->needed_headroom)); skb = sock_alloc_send_skb(&xs->sk, hr, 1, &err); - if (unlikely(!skb)) - return ERR_PTR(err); + if (unlikely(!skb)) { + err = ERR_PTR(err); + goto free_err; + } skb_reserve(skb, hr); } @@ -608,8 +623,10 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, addr = buffer - pool->addrs; for (copied = 0, i = skb_shinfo(skb)->nr_frags; copied < len; i++) { - if (unlikely(i >= MAX_SKB_FRAGS)) - return ERR_PTR(-EFAULT); + if (unlikely(i >= MAX_SKB_FRAGS)) { + err = ERR_PTR(-EFAULT); + goto free_err; + } page = pool->umem->pgs[addr >> PAGE_SHIFT]; get_page(page); @@ -629,6 +646,9 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, refcount_add(ts, &xs->sk.sk_wmem_alloc); return skb; + +free_err: + return xsk_cleanup_skb(err, skb, xs); } static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, @@ -641,11 +661,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, int err; if (dev->priv_flags & IFF_TX_SKB_NO_LINEAR) { - skb = xsk_build_skb_zerocopy(xs, desc); - if (IS_ERR(skb)) { - err = PTR_ERR(skb); - goto free_err; - } + return xsk_build_skb_zerocopy(xs, desc); } else { u32 hr, tr, len; void *buffer; @@ -729,15 +745,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, return skb; free_err: - if (err == -EAGAIN) { - xsk_cq_cancel_locked(xs, 1); - } else { - xsk_set_destructor_arg(skb); - xsk_drop_skb(skb); - xskq_cons_release(xs->tx); - } - - return ERR_PTR(err); + return xsk_cleanup_skb(err, skb, xs); } static int __xsk_generic_xmit(struct sock *sk)
> -----Original Message----- > From: Stanislav Fomichev <sdf@google.com> > Sent: Tuesday, August 15, 2023 11:51 PM > On 08/15, Tirthendu Sarkar wrote: > > xsk_build_skb_zerocopy() may return an error other than -EAGAIN and > this > > is received as skb and used later in xsk_set_destructor_arg() and > > xsk_drop_skb() which must operate on a valid skb. > > > > Set -EOVERFLOW as error when MAX_SKB_FRAGS are exceeded and > packet needs > > to be dropped and use this to distinguish against all other error cases > > where allocation needs to be retried. > > > > Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com> > > Reported-by: kernel test robot <lkp@intel.com> > > Reported-by: Dan Carpenter <dan.carpenter@linaro.org> > > Closes: https://lore.kernel.org/r/202307210434.OjgqFcbB-lkp@intel.com/ > > Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx > path") > > > > Changelog: > > v1 -> v2: > > - Removed err as a parameter to xsk_build_skb_zerocopy() > > [Stanislav Fomichev] > > - use explicit error to distinguish packet drop vs retry > > --- > > net/xdp/xsk.c | 22 +++++++++++++--------- > > 1 file changed, 13 insertions(+), 9 deletions(-) > > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c > > index fcfc8472f73d..55f8b9b0e06d 100644 > > --- a/net/xdp/xsk.c > > +++ b/net/xdp/xsk.c > > @@ -602,7 +602,7 @@ static struct sk_buff > *xsk_build_skb_zerocopy(struct xdp_sock *xs, > > > > for (copied = 0, i = skb_shinfo(skb)->nr_frags; copied < len; i++) { > > if (unlikely(i >= MAX_SKB_FRAGS)) > > - return ERR_PTR(-EFAULT); > > + return ERR_PTR(-EOVERFLOW); > > > > page = pool->umem->pgs[addr >> PAGE_SHIFT]; > > get_page(page); > > @@ -655,15 +655,17 @@ static struct sk_buff *xsk_build_skb(struct > xdp_sock *xs, > > skb_put(skb, len); > > > > err = skb_store_bits(skb, 0, buffer, len); > > - if (unlikely(err)) > > + if (unlikely(err)) { > > + kfree_skb(skb); > > goto free_err; > > + } > > } else { > > int nr_frags = skb_shinfo(skb)->nr_frags; > > struct page *page; > > u8 *vaddr; > > > > if (unlikely(nr_frags == (MAX_SKB_FRAGS - 1) && > xp_mb_desc(desc))) { > > - err = -EFAULT; > > + err = -EOVERFLOW; > > goto free_err; > > } > > > > @@ -690,12 +692,14 @@ static struct sk_buff *xsk_build_skb(struct > xdp_sock *xs, > > return skb; > > > > free_err: > > - if (err == -EAGAIN) { > > - xsk_cq_cancel_locked(xs, 1); > > - } else { > > - xsk_set_destructor_arg(skb); > > - xsk_drop_skb(skb); > > + if (err == -EOVERFLOW) { > > Don't think this will work? We have some other error paths in xsk_build_skb > that are not -EOVERFLOW that still need kfree_skb, right? > There are 4 possible error paths in xsk_build_skb(): 1. sock_alloc_send_skb: skb is NULL; retry 2. skb_store_bits : free skb and retry 3. MAX_SKB_FRAGS exceeded: Free skb, cleanup and drop packet 4. alloc_page fails for frag: retry page allocation for frag w/o freeing skb Of these 1] and 3] can also happen in xsk_build_skb_zerocopy() and the error returned is either -EOVERFLOW or something else and the same error handling needs to be done. > I feel like we are trying to share some state between xsk_build_skb and > xsk_build_skb_zerocopy which we really shouldn't share. So how about > we try to have a separate cleanup path in xsk_build_skb_zerocopy? > The only things that are common between *copy and *zerocopy paths are setting the skb headers (destructor/args, mark, priority) and error handling. While we can potentially split out the paths into independent functions, the same skb header settings and error handling will still need to be duplicated in both functions. > Will something like the following (untested / uncompiled) work instead? > > IOW, ideally, xsk_build_skb should look like: > > if (dev->priv_flags & IFF_TX_SKB_NO_LINEAR) { > return xsk_build_skb_zerocopy(xs, desc); > } else { > return xsk_build_skb_copy(xs, desc); > /* ^^ current path that should really be a separate func */ > } >
On Wed, Aug 16, 2023 at 08:04:52AM +0200, Sarkar, Tirthendu wrote: > > -----Original Message----- > > From: Stanislav Fomichev <sdf@google.com> > > Sent: Tuesday, August 15, 2023 11:51 PM > > On 08/15, Tirthendu Sarkar wrote: > > > xsk_build_skb_zerocopy() may return an error other than -EAGAIN and > > this > > > is received as skb and used later in xsk_set_destructor_arg() and > > > xsk_drop_skb() which must operate on a valid skb. > > > > > > Set -EOVERFLOW as error when MAX_SKB_FRAGS are exceeded and > > packet needs > > > to be dropped and use this to distinguish against all other error cases > > > where allocation needs to be retried. Please be explicit - say that you're changing this error code in xsk_build_skb_zerocopy() otherwise it's not clear. You're not saying anything about kfree_skb() that is added when skb_store_bits() failed. This code is non-trivial so all of the changes need to be described. Also did we test this patch? I believe it would require us to run xdpsock against virtio net device? > > > > > > Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com> > > > Reported-by: kernel test robot <lkp@intel.com> > > > Reported-by: Dan Carpenter <dan.carpenter@linaro.org> > > > Closes: https://lore.kernel.org/r/202307210434.OjgqFcbB-lkp@intel.com/ > > > Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx > > path") > > > > > > Changelog: > > > v1 -> v2: > > > - Removed err as a parameter to xsk_build_skb_zerocopy() > > > [Stanislav Fomichev] > > > - use explicit error to distinguish packet drop vs retry > > > --- > > > net/xdp/xsk.c | 22 +++++++++++++--------- > > > 1 file changed, 13 insertions(+), 9 deletions(-) > > > > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c > > > index fcfc8472f73d..55f8b9b0e06d 100644 > > > --- a/net/xdp/xsk.c > > > +++ b/net/xdp/xsk.c > > > @@ -602,7 +602,7 @@ static struct sk_buff > > *xsk_build_skb_zerocopy(struct xdp_sock *xs, > > > > > > for (copied = 0, i = skb_shinfo(skb)->nr_frags; copied < len; i++) { > > > if (unlikely(i >= MAX_SKB_FRAGS)) > > > - return ERR_PTR(-EFAULT); > > > + return ERR_PTR(-EOVERFLOW); > > > > > > page = pool->umem->pgs[addr >> PAGE_SHIFT]; > > > get_page(page); > > > @@ -655,15 +655,17 @@ static struct sk_buff *xsk_build_skb(struct > > xdp_sock *xs, > > > skb_put(skb, len); > > > > > > err = skb_store_bits(skb, 0, buffer, len); > > > - if (unlikely(err)) > > > + if (unlikely(err)) { > > > + kfree_skb(skb); > > > goto free_err; > > > + } > > > } else { > > > int nr_frags = skb_shinfo(skb)->nr_frags; > > > struct page *page; > > > u8 *vaddr; > > > > > > if (unlikely(nr_frags == (MAX_SKB_FRAGS - 1) && > > xp_mb_desc(desc))) { > > > - err = -EFAULT; > > > + err = -EOVERFLOW; > > > goto free_err; > > > } > > > > > > @@ -690,12 +692,14 @@ static struct sk_buff *xsk_build_skb(struct > > xdp_sock *xs, > > > return skb; > > > > > > free_err: > > > - if (err == -EAGAIN) { > > > - xsk_cq_cancel_locked(xs, 1); > > > - } else { > > > - xsk_set_destructor_arg(skb); > > > - xsk_drop_skb(skb); > > > + if (err == -EOVERFLOW) { > > > > Don't think this will work? We have some other error paths in xsk_build_skb > > that are not -EOVERFLOW that still need kfree_skb, right? > > > > There are 4 possible error paths in xsk_build_skb(): > 1. sock_alloc_send_skb: skb is NULL; retry > 2. skb_store_bits : free skb and retry > 3. MAX_SKB_FRAGS exceeded: Free skb, cleanup and drop packet > 4. alloc_page fails for frag: retry page allocation for frag w/o freeing skb > > Of these 1] and 3] can also happen in xsk_build_skb_zerocopy() and the > error returned is either -EOVERFLOW or something else and the same > error handling needs to be done. That would be helpful to have it included in commit message. > > > I feel like we are trying to share some state between xsk_build_skb and > > xsk_build_skb_zerocopy which we really shouldn't share. So how about > > we try to have a separate cleanup path in xsk_build_skb_zerocopy? > > > > The only things that are common between *copy and *zerocopy paths are > setting the skb headers (destructor/args, mark, priority) and error handling. > > While we can potentially split out the paths into independent functions, the > same skb header settings and error handling will still need to be duplicated in > both functions. > > > Will something like the following (untested / uncompiled) work instead? > > > > IOW, ideally, xsk_build_skb should look like: > > > > if (dev->priv_flags & IFF_TX_SKB_NO_LINEAR) { > > return xsk_build_skb_zerocopy(xs, desc); > > } else { > > return xsk_build_skb_copy(xs, desc); > > /* ^^ current path that should really be a separate func */ > > } IMHO this is way out of the scope for a fix. We can think later on about cleaning up these paths. > >
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index fcfc8472f73d..55f8b9b0e06d 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -602,7 +602,7 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, for (copied = 0, i = skb_shinfo(skb)->nr_frags; copied < len; i++) { if (unlikely(i >= MAX_SKB_FRAGS)) - return ERR_PTR(-EFAULT); + return ERR_PTR(-EOVERFLOW); page = pool->umem->pgs[addr >> PAGE_SHIFT]; get_page(page); @@ -655,15 +655,17 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, skb_put(skb, len); err = skb_store_bits(skb, 0, buffer, len); - if (unlikely(err)) + if (unlikely(err)) { + kfree_skb(skb); goto free_err; + } } else { int nr_frags = skb_shinfo(skb)->nr_frags; struct page *page; u8 *vaddr; if (unlikely(nr_frags == (MAX_SKB_FRAGS - 1) && xp_mb_desc(desc))) { - err = -EFAULT; + err = -EOVERFLOW; goto free_err; } @@ -690,12 +692,14 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, return skb; free_err: - if (err == -EAGAIN) { - xsk_cq_cancel_locked(xs, 1); - } else { - xsk_set_destructor_arg(skb); - xsk_drop_skb(skb); + if (err == -EOVERFLOW) { + /* Drop the packet */ + xsk_set_destructor_arg(xs->skb); + xsk_drop_skb(xs->skb); xskq_cons_release(xs->tx); + } else { + /* Let application retry */ + xsk_cq_cancel_locked(xs, 1); } return ERR_PTR(err); @@ -738,7 +742,7 @@ static int __xsk_generic_xmit(struct sock *sk) skb = xsk_build_skb(xs, &desc); if (IS_ERR(skb)) { err = PTR_ERR(skb); - if (err == -EAGAIN) + if (err != -EOVERFLOW) goto out; err = 0; continue;
xsk_build_skb_zerocopy() may return an error other than -EAGAIN and this is received as skb and used later in xsk_set_destructor_arg() and xsk_drop_skb() which must operate on a valid skb. Set -EOVERFLOW as error when MAX_SKB_FRAGS are exceeded and packet needs to be dropped and use this to distinguish against all other error cases where allocation needs to be retried. Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202307210434.OjgqFcbB-lkp@intel.com/ Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path") Changelog: v1 -> v2: - Removed err as a parameter to xsk_build_skb_zerocopy() [Stanislav Fomichev] - use explicit error to distinguish packet drop vs retry --- net/xdp/xsk.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-)