diff mbox series

[v3,bpf,1/2] bpf: fix skb_do_redirect return values

Message ID cdbbc9df16044b568448ed9cd828d406f0851bfb.1690255889.git.yan@cloudflare.com (mailing list archive)
State Superseded
Delegated to: BPF
Headers show
Series bpf: return proper error codes for lwt redirect | expand

Checks

Context Check Description
bpf/vmtest-bpf-PR success PR summary
bpf/vmtest-bpf-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-VM_Test-4 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-VM_Test-5 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-6 success Logs for set-matrix
bpf/vmtest-bpf-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-VM_Test-3 success Logs for build for s390x with gcc
bpf/vmtest-bpf-VM_Test-7 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-9 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-10 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-11 success Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-13 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-14 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-15 success Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-17 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-18 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-19 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-20 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-21 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-22 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-23 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-24 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-25 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-27 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-28 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-29 success Logs for veristat
bpf/vmtest-bpf-VM_Test-26 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-VM_Test-12 success Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-VM_Test-16 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-VM_Test-8 success Logs for test_maps on s390x with gcc
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf, async
netdev/fixes_present fail Series targets non-next tree, but doesn't contain any Fixes tags
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1368 this patch: 1368
netdev/cc_maintainers success CCed 17 of 17 maintainers
netdev/build_clang success Errors and warnings before: 1365 this patch: 1365
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1391 this patch: 1391
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 30 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Yan Zhai July 25, 2023, 4:13 a.m. UTC
skb_do_redirect returns various of values: error code (negative), 0
(success), and some positive status code, e.g. NET_XMIT_CN, NET_RX_DROP.
Such code are not handled at lwt xmit hook in function ip_finish_output2
and ip6_finish_output, which can cause unexpected problems. This change
converts the positive status code to proper error code.

Suggested-by: Stanislav Fomichev <sdf@google.com>
Reported-by: Jordan Griege <jgriege@cloudflare.com>
Signed-off-by: Yan Zhai <yan@cloudflare.com>

---
v3: converts also RX side return value in addition to TX values
v2: code style change suggested by Stanislav Fomichev
---
 net/core/filter.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Comments

Markus Elfring July 25, 2023, 5:10 a.m. UTC | #1
>                                      … unexpected problems. This change
> converts the positive status code to proper error code.

Please choose a corresponding imperative change suggestion.

See also:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.5-rc3#n94


Did you provide sufficient justification for a possible addition of the tag “Fixes”?


…
> v2: code style change suggested by Stanislav Fomichev
> ---
>  net/core/filter.c | 12 +++++++++++-
…

How do you think about to replace this marker by a line break?

See also:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.5-rc3#n711

Regards,
Markus
Jakub Sitnicki July 25, 2023, 9:08 a.m. UTC | #2
On Mon, Jul 24, 2023 at 09:13 PM -07, Yan Zhai wrote:
> skb_do_redirect returns various of values: error code (negative), 0
> (success), and some positive status code, e.g. NET_XMIT_CN, NET_RX_DROP.
> Such code are not handled at lwt xmit hook in function ip_finish_output2
> and ip6_finish_output, which can cause unexpected problems. This change
> converts the positive status code to proper error code.
>
> Suggested-by: Stanislav Fomichev <sdf@google.com>
> Reported-by: Jordan Griege <jgriege@cloudflare.com>
> Signed-off-by: Yan Zhai <yan@cloudflare.com>
>
> ---
> v3: converts also RX side return value in addition to TX values
> v2: code style change suggested by Stanislav Fomichev
> ---
>  net/core/filter.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 06ba0e56e369..3e232ce11ca0 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2095,7 +2095,12 @@ static const struct bpf_func_proto bpf_csum_level_proto = {
>  
>  static inline int __bpf_rx_skb(struct net_device *dev, struct sk_buff *skb)
>  {
> -	return dev_forward_skb_nomtu(dev, skb);
> +	int ret = dev_forward_skb_nomtu(dev, skb);
> +
> +	if (unlikely(ret > 0))
> +		return -ENETDOWN;
> +
> +	return 0;
>  }
>  
>  static inline int __bpf_rx_skb_no_mac(struct net_device *dev,
> @@ -2106,6 +2111,8 @@ static inline int __bpf_rx_skb_no_mac(struct net_device *dev,
>  	if (likely(!ret)) {
>  		skb->dev = dev;
>  		ret = netif_rx(skb);
> +	} else if (ret > 0) {
> +		return -ENETDOWN;
>  	}
>  
>  	return ret;
> @@ -2129,6 +2136,9 @@ static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb)
>  	ret = dev_queue_xmit(skb);
>  	dev_xmit_recursion_dec();
>  
> +	if (unlikely(ret > 0))
> +		ret = net_xmit_errno(ret);
> +
>  	return ret;
>  }

net_xmit_errno maps NET_XMIT_DROP to -ENOBUFS. It would make sense to me
to map NET_RX_DROP to -ENOBUFS as well, instead of -ENETDOWN, to be
consistent.

It looks like the Fixes tag for this should point to the change that
introduced BPF for LWT:

Fixes: 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure")
Yan Zhai July 25, 2023, 2:40 p.m. UTC | #3
On Tue, Jul 25, 2023 at 12:11 AM Markus Elfring <Markus.Elfring@web.de> wrote:
>
> >                                      … unexpected problems. This change
> > converts the positive status code to proper error code.
>
> Please choose a corresponding imperative change suggestion.
>
> See also:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.5-rc3#n94
>
>
> Did you provide sufficient justification for a possible addition of the tag “Fixes”?
>
>
> …
> > v2: code style change suggested by Stanislav Fomichev
> > ---
> >  net/core/filter.c | 12 +++++++++++-
> …
>
> How do you think about to replace this marker by a line break?
>
> See also:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.5-rc3#n711
>
> Regards,
> Markus

Hi Markus,

   Thanks for the suggestions, those are what I could use more help with.
   Will address these in the next version.

Yan
Yan Zhai July 25, 2023, 4:05 p.m. UTC | #4
On Tue, Jul 25, 2023 at 4:14 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> On Mon, Jul 24, 2023 at 09:13 PM -07, Yan Zhai wrote:
> > skb_do_redirect returns various of values: error code (negative), 0
> > (success), and some positive status code, e.g. NET_XMIT_CN, NET_RX_DROP.
> > Such code are not handled at lwt xmit hook in function ip_finish_output2
> > and ip6_finish_output, which can cause unexpected problems. This change
> > converts the positive status code to proper error code.
> >
> > Suggested-by: Stanislav Fomichev <sdf@google.com>
> > Reported-by: Jordan Griege <jgriege@cloudflare.com>
> > Signed-off-by: Yan Zhai <yan@cloudflare.com>
> >
> > ---
> > v3: converts also RX side return value in addition to TX values
> > v2: code style change suggested by Stanislav Fomichev
> > ---
> >  net/core/filter.c | 12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 06ba0e56e369..3e232ce11ca0 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -2095,7 +2095,12 @@ static const struct bpf_func_proto bpf_csum_level_proto = {
> >
> >  static inline int __bpf_rx_skb(struct net_device *dev, struct sk_buff *skb)
> >  {
> > -     return dev_forward_skb_nomtu(dev, skb);
> > +     int ret = dev_forward_skb_nomtu(dev, skb);
> > +
> > +     if (unlikely(ret > 0))
> > +             return -ENETDOWN;
> > +
> > +     return 0;
> >  }
> >
> >  static inline int __bpf_rx_skb_no_mac(struct net_device *dev,
> > @@ -2106,6 +2111,8 @@ static inline int __bpf_rx_skb_no_mac(struct net_device *dev,
> >       if (likely(!ret)) {
> >               skb->dev = dev;
> >               ret = netif_rx(skb);
> > +     } else if (ret > 0) {
> > +             return -ENETDOWN;
> >       }
> >
> >       return ret;
> > @@ -2129,6 +2136,9 @@ static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb)
> >       ret = dev_queue_xmit(skb);
> >       dev_xmit_recursion_dec();
> >
> > +     if (unlikely(ret > 0))
> > +             ret = net_xmit_errno(ret);
> > +
> >       return ret;
> >  }
>
> net_xmit_errno maps NET_XMIT_DROP to -ENOBUFS. It would make sense to me
> to map NET_RX_DROP to -ENOBUFS as well, instead of -ENETDOWN, to be
> consistent.
>
In fact I looked at all those errno, but found none actually covers
this situation completely. For the redirect ingress case, there are
three reasons to fail: backlog full, dev down, and MTU issue. This
won't be a problem for typical RX paths, since the error code is
usually discarded by call sites of deliver_skb. But redirect ingress
opens a call chain that would propagate this error to local sendmsg,
which may be very confusing to troubleshoot in a complex environment
(especially when backlog fills).

That said I agree ENOBUF covers the most likely reason to fail
(backlog). Let me change to that one in the next version if there are
no new suggestions.

> It looks like the Fixes tag for this should point to the change that
> introduced BPF for LWT:
>
> Fixes: 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure")
>
Thanks for finding the tag. I was debating if it should be LWT commit
or bpf_redirect commit: the error is not handled at LWT, but it seems
actually innocent. The actual fix is the return value from the bpf
redirect code. Let me incorporate both in the next one to justify
better.

--
Yan
Stanislav Fomichev July 25, 2023, 5:07 p.m. UTC | #5
On 07/25, Yan Zhai wrote:
> On Tue, Jul 25, 2023 at 4:14 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
> >
> > On Mon, Jul 24, 2023 at 09:13 PM -07, Yan Zhai wrote:
> > > skb_do_redirect returns various of values: error code (negative), 0
> > > (success), and some positive status code, e.g. NET_XMIT_CN, NET_RX_DROP.
> > > Such code are not handled at lwt xmit hook in function ip_finish_output2
> > > and ip6_finish_output, which can cause unexpected problems. This change
> > > converts the positive status code to proper error code.
> > >
> > > Suggested-by: Stanislav Fomichev <sdf@google.com>
> > > Reported-by: Jordan Griege <jgriege@cloudflare.com>
> > > Signed-off-by: Yan Zhai <yan@cloudflare.com>
> > >
> > > ---
> > > v3: converts also RX side return value in addition to TX values
> > > v2: code style change suggested by Stanislav Fomichev
> > > ---
> > >  net/core/filter.c | 12 +++++++++++-
> > >  1 file changed, 11 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > index 06ba0e56e369..3e232ce11ca0 100644
> > > --- a/net/core/filter.c
> > > +++ b/net/core/filter.c
> > > @@ -2095,7 +2095,12 @@ static const struct bpf_func_proto bpf_csum_level_proto = {
> > >
> > >  static inline int __bpf_rx_skb(struct net_device *dev, struct sk_buff *skb)
> > >  {
> > > -     return dev_forward_skb_nomtu(dev, skb);
> > > +     int ret = dev_forward_skb_nomtu(dev, skb);
> > > +
> > > +     if (unlikely(ret > 0))
> > > +             return -ENETDOWN;
> > > +
> > > +     return 0;
> > >  }
> > >
> > >  static inline int __bpf_rx_skb_no_mac(struct net_device *dev,
> > > @@ -2106,6 +2111,8 @@ static inline int __bpf_rx_skb_no_mac(struct net_device *dev,
> > >       if (likely(!ret)) {
> > >               skb->dev = dev;
> > >               ret = netif_rx(skb);
> > > +     } else if (ret > 0) {
> > > +             return -ENETDOWN;
> > >       }
> > >
> > >       return ret;
> > > @@ -2129,6 +2136,9 @@ static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb)
> > >       ret = dev_queue_xmit(skb);
> > >       dev_xmit_recursion_dec();
> > >
> > > +     if (unlikely(ret > 0))
> > > +             ret = net_xmit_errno(ret);
> > > +
> > >       return ret;
> > >  }
> >
> > net_xmit_errno maps NET_XMIT_DROP to -ENOBUFS. It would make sense to me
> > to map NET_RX_DROP to -ENOBUFS as well, instead of -ENETDOWN, to be
> > consistent.
> >
> In fact I looked at all those errno, but found none actually covers
> this situation completely. For the redirect ingress case, there are
> three reasons to fail: backlog full, dev down, and MTU issue. This
> won't be a problem for typical RX paths, since the error code is
> usually discarded by call sites of deliver_skb. But redirect ingress
> opens a call chain that would propagate this error to local sendmsg,
> which may be very confusing to troubleshoot in a complex environment
> (especially when backlog fills).
> 
> That said I agree ENOBUF covers the most likely reason to fail
> (backlog). Let me change to that one in the next version if there are
> no new suggestions.

nit: also maybe wrap these rx paths into some new net_rx_errno ?
To mirror the tx side.
 
> > It looks like the Fixes tag for this should point to the change that
> > introduced BPF for LWT:
> >
> > Fixes: 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure")
> >
> Thanks for finding the tag. I was debating if it should be LWT commit
> or bpf_redirect commit: the error is not handled at LWT, but it seems
> actually innocent. The actual fix is the return value from the bpf
> redirect code. Let me incorporate both in the next one to justify
> better.
> 
> --
> Yan
diff mbox series

Patch

diff --git a/net/core/filter.c b/net/core/filter.c
index 06ba0e56e369..3e232ce11ca0 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2095,7 +2095,12 @@  static const struct bpf_func_proto bpf_csum_level_proto = {
 
 static inline int __bpf_rx_skb(struct net_device *dev, struct sk_buff *skb)
 {
-	return dev_forward_skb_nomtu(dev, skb);
+	int ret = dev_forward_skb_nomtu(dev, skb);
+
+	if (unlikely(ret > 0))
+		return -ENETDOWN;
+
+	return 0;
 }
 
 static inline int __bpf_rx_skb_no_mac(struct net_device *dev,
@@ -2106,6 +2111,8 @@  static inline int __bpf_rx_skb_no_mac(struct net_device *dev,
 	if (likely(!ret)) {
 		skb->dev = dev;
 		ret = netif_rx(skb);
+	} else if (ret > 0) {
+		return -ENETDOWN;
 	}
 
 	return ret;
@@ -2129,6 +2136,9 @@  static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb)
 	ret = dev_queue_xmit(skb);
 	dev_xmit_recursion_dec();
 
+	if (unlikely(ret > 0))
+		ret = net_xmit_errno(ret);
+
 	return ret;
 }