diff mbox series

[net] llc: reset transport_header offset as value is inaccurate when buffer is processed by DSA

Message ID ef68689e-7e0b-4702-a762-d214c7d76e3b@gmail.com (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series [net] llc: reset transport_header offset as value is inaccurate when buffer is processed by DSA | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/apply fail Patch does not apply to net-0

Commit Message

Antonio Pastor Dec. 9, 2024, 6:36 p.m. UTC
Hi,

While testing 802.2+LLC+SNAP processing of inbound packets in OpenWrt, 
it was found that network_header offset is 2 bytes short (before 
sbk->data) when the packet was received through OpenWrt's DSA 
(Distributed Switch Architecture). This causes SNAP OUI:PID mismatch and 
packet is silently dropped by snap_rcv().

Here a trace:

           <idle>-0       [001] ..s..  8744.047176: find_snap_client 
<-snap_rcv
           <idle>-0       [001] ..s..  8744.047218: <stack trace>
  => snap_rcv
  => llc_rcv
  => __netif_receive_skb_one_core
  => netif_receive_skb
  => br_handle_frame_finish
  => br_handle_frame
  => __netif_receive_skb_core.constprop.0
  => __netif_receive_skb_list_core
  => netif_receive_skb_list_internal
  => napi_complete_done
  => gro_cell_poll
  => __napi_poll.constprop.0
  => net_rx_action
  => handle_softirqs
  => irq_exit
  => call_with_stack
  => __irq_svc
  => default_idle_call
  => do_idle
  => cpu_startup_entry
  => secondary_start_kernel
  => 0x42301294

The offsets were detected as incorrect as early as napi_complete_done() 
and I gave up on tracking where the problem comes from. Running with GRO 
disabled makes no difference.

Curiously enough, __netif_receive_skb_list_core() resets network_header 
offset, but leaves transport_header offset alone if it was set, assuming 
it is correct. On non-DSA OpenWrt images it is, but since images were 
migrated to use DSA this issue appears. For locally generated packets 
transport_header offset is not set (0xffff) so 
__netif_receive_skb_list_core() resets it, which solves the issue. That 
is why inbound packets received from an external system exhibit the 
problem but locally generated traffic is processed OK.

I can only assume this has been an issue for a while but since 
presumably it only impacts 802.2+LLC+SNAP (which I'm aware is not much 
used today) it has not been flagged before. I wouldn't be surprised if 
any protocols using Ethernet II frames reset transport_header offset 
before they have anything to do with it.

The kernel code does not touch transport_header offset until llc_rcv() 
where it is moved forward based on the length of the LLC header as it is 
assumed correct, which is the issue.

Patch below proposes modifying llc_rcv() to reset transport_header 
offset and then push forward by the LLC header length. While a better 
solution might lurk elsewhere by tackling the root cause of why 
transport_header offset is off after DSA set it to begin with, that is 
taking too much effort to identify and risks widespread impact. A patch 
could be made to __netif_receive_skb_list_core() to always reset 
transport_header offset, but that would also impact all frames. This is 
a lower risk patch that will not impact any non 802.2+LLC frames, and 
presumably only SNAP ones. It follows the approach of 
__netif_receive_skb_list_core() of not trusting the offset as received 
and resetting it before snap_rcv() has a need for it.

Patch:

  net/llc/llc_input.c | 2 +-
  1 file changed, 1 insertions(+), 1 deletions(-)



Can you share your opinions on this patch and suggest next actions for 
its adoption (or modification) please?

Regards,

AP

Comments

Antonio Pastor Dec. 11, 2024, 1:03 a.m. UTC | #1
From 46c5a1ad90905e054b4a459e86b9ef98eca26df9 Mon Sep 17 00:00:00 2001
From: Antonio Pastor <antonio.pastor@gmail.com>
Date: Tue, 10 Dec 2024 19:45:20 -0500
Subject: [RFC PATCH] llc: llc_input: explicitly set skb->transport_header

Reset transport_header offset and apply the LLC header size increment, 
instead
of applying the increment on current value.
With DSA is enabled skb->transport_header is 2 bytes off, causing
net/802/psnap/snap_rcv to fail OUI:PID match and drop skb.

Signed-off-by: Antonio Pastor <antonio.pastor@gmail.com>
---
  net/llc/llc_input.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/llc/llc_input.c b/net/llc/llc_input.c
index 51bccfb00a9c..6f33ae9095f8 100644
--- a/net/llc/llc_input.c
+++ b/net/llc/llc_input.c
@@ -124,7 +124,7 @@ static inline int llc_fixup_skb(struct sk_buff *skb)
      if (unlikely(!pskb_may_pull(skb, llc_len)))
          return 0;

-    skb->transport_header += llc_len;
+    skb_set_transport_header(skb, llc_len);
      skb_pull(skb, llc_len);
      if (skb->protocol == htons(ETH_P_802_2)) {
          __be16 pdulen;
Paolo Abeni Dec. 12, 2024, 10:44 a.m. UTC | #2
On 12/9/24 19:36, Antonio Pastor wrote:
> While testing 802.2+LLC+SNAP processing of inbound packets in OpenWrt, 
> it was found that network_header offset is 2 bytes short (before 
> sbk->data) when the packet was received through OpenWrt's DSA 
> (Distributed Switch Architecture). This causes SNAP OUI:PID mismatch and 
> packet is silently dropped by snap_rcv().
> 
> Here a trace:
> 
>            <idle>-0       [001] ..s..  8744.047176: find_snap_client 
> <-snap_rcv
>            <idle>-0       [001] ..s..  8744.047218: <stack trace>
>   => snap_rcv
>   => llc_rcv
>   => __netif_receive_skb_one_core
>   => netif_receive_skb
>   => br_handle_frame_finish
>   => br_handle_frame
>   => __netif_receive_skb_core.constprop.0
>   => __netif_receive_skb_list_core
>   => netif_receive_skb_list_internal
>   => napi_complete_done
>   => gro_cell_poll
>   => __napi_poll.constprop.0
>   => net_rx_action
>   => handle_softirqs
>   => irq_exit
>   => call_with_stack
>   => __irq_svc
>   => default_idle_call
>   => do_idle
>   => cpu_startup_entry
>   => secondary_start_kernel
>   => 0x42301294
> 
> The offsets were detected as incorrect as early as napi_complete_done() 
> and I gave up on tracking where the problem comes from. Running with GRO 
> disabled makes no difference.
> 
> Curiously enough, __netif_receive_skb_list_core() resets network_header 
> offset, but leaves transport_header offset alone if it was set, assuming 
> it is correct. On non-DSA OpenWrt images it is, but since images were 
> migrated to use DSA this issue appears. For locally generated packets 
> transport_header offset is not set (0xffff) so 
> __netif_receive_skb_list_core() resets it, which solves the issue. That 
> is why inbound packets received from an external system exhibit the 
> problem but locally generated traffic is processed OK.
> 
> I can only assume this has been an issue for a while but since 
> presumably it only impacts 802.2+LLC+SNAP (which I'm aware is not much 
> used today) it has not been flagged before. I wouldn't be surprised if 
> any protocols using Ethernet II frames reset transport_header offset 
> before they have anything to do with it.
> 
> The kernel code does not touch transport_header offset until llc_rcv() 
> where it is moved forward based on the length of the LLC header as it is 
> assumed correct, which is the issue.
> 
> Patch below proposes modifying llc_rcv() to reset transport_header 
> offset and then push forward by the LLC header length. While a better 
> solution might lurk elsewhere by tackling the root cause of why 
> transport_header offset is off after DSA set it to begin with, that is 
> taking too much effort to identify and risks widespread impact. A patch 
> could be made to __netif_receive_skb_list_core() to always reset 
> transport_header offset, but that would also impact all frames. This is 
> a lower risk patch that will not impact any non 802.2+LLC frames, and 
> presumably only SNAP ones. It follows the approach of 
> __netif_receive_skb_list_core() of not trusting the offset as received 
> and resetting it before snap_rcv() has a need for it.
> 
> Patch:
> 
>   net/llc/llc_input.c | 2 +-
>   1 file changed, 1 insertions(+), 1 deletions(-)
> 
> --- a/net/llc/llc_input.c
> +++ b/net/llc/llc_input.c
> @@ -124,7 +124,7 @@ static inline int llc_fixup_skb(struct s
>       if (unlikely(!pskb_may_pull(skb, llc_len)))
>           return 0;
> 
> -    skb->transport_header += llc_len;
> +    skb_set_transport_header(skb, llc_len);
>       skb_pull(skb, llc_len);
>       if (skb->protocol == htons(ETH_P_802_2)) {
>           __be16 pdulen;
> 
> 
> Can you share your opinions on this patch and suggest next actions for 
> its adoption (or modification) please?

IMHO llc should avoid entirely using the transport_header field as
AFAICS the relevant information do not belong to such layer. Anyhow such
change looks like way too invasive as a fix.

Your approach is IMHO correct, but the patch itself is white-space
damage - be sure to generate it on top of current net tree and double
your client is not corrupting it. Additionally, as a fix the patch must
include a suitable Fixes tag. Have an accurate read to the process
documentation before formally submitting the fix.

Thanks,

Paolo
diff mbox series

Patch

--- a/net/llc/llc_input.c
+++ b/net/llc/llc_input.c
@@ -124,7 +124,7 @@  static inline int llc_fixup_skb(struct s
      if (unlikely(!pskb_may_pull(skb, llc_len)))
          return 0;

-    skb->transport_header += llc_len;
+    skb_set_transport_header(skb, llc_len);
      skb_pull(skb, llc_len);
      if (skb->protocol == htons(ETH_P_802_2)) {
          __be16 pdulen;