diff mbox series

[net,1/1] igc: avoid returning frame twice in XDP_REDIRECT

Message ID 20240219090843.9307-1-florian.kauer@linutronix.de (mailing list archive)
State Awaiting Upstream
Delegated to: Netdev Maintainers
Headers show
Series [net,1/1] igc: avoid returning frame twice in XDP_REDIRECT | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 956 this patch: 956
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 5 maintainers not CCed: morbo@google.com justinstitt@google.com nathan@kernel.org ndesaulniers@google.com llvm@lists.linux.dev
netdev/build_clang success Errors and warnings before: 973 this patch: 973
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 973 this patch: 973
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 36 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Florian Kauer Feb. 19, 2024, 9:08 a.m. UTC
When a frame can not be transmitted in XDP_REDIRECT
(e.g. due to a full queue), it is necessary to free
it by calling xdp_return_frame_rx_napi.

However, this is the reponsibility of the caller of
the ndo_xdp_xmit (see for example bq_xmit_all in
kernel/bpf/devmap.c) and thus calling it inside
igc_xdp_xmit (which is the ndo_xdp_xmit of the igc
driver) as well will lead to memory corruption.

In fact, bq_xmit_all expects that it can return all
frames after the last successfully transmitted one.
Therefore, break for the first not transmitted frame,
but do not call xdp_return_frame_rx_napi in igc_xdp_xmit.
This is equally implemented in other Intel drivers
such as the igb.

There are two alternatives to this that were rejected:
1. Return num_frames as all the frames would have been
   transmitted and release them inside igc_xdp_xmit.
   While it might work technically, it is not what
   the return value is meant to repesent (i.e. the
   number of SUCCESSFULLY transmitted packets).
2. Rework kernel/bpf/devmap.c and all drivers to
   support non-consecutively dropped packets.
   Besides being complex, it likely has a negative
   performance impact without a significant gain
   since it is anyway unlikely that the next frame
   can be transmitted if the previous one was dropped.

The memory corruption can be reproduced with
the following script which leads to a kernel panic
after a few seconds.  It basically generates more
traffic than a i225 NIC can transmit and pushes it
via XDP_REDIRECT from a virtual interface to the
physical interface where frames get dropped.

   #!/bin/bash
   INTERFACE=enp4s0
   INTERFACE_IDX=`cat /sys/class/net/$INTERFACE/ifindex`

   sudo ip link add dev veth1 type veth peer name veth2
   sudo ip link set up $INTERFACE
   sudo ip link set up veth1
   sudo ip link set up veth2

   cat << EOF > redirect.bpf.c

   SEC("prog")
   int redirect(struct xdp_md *ctx)
   {
       return bpf_redirect($INTERFACE_IDX, 0);
   }

   char _license[] SEC("license") = "GPL";
   EOF
   clang -O2 -g -Wall -target bpf -c redirect.bpf.c -o redirect.bpf.o
   sudo ip link set veth2 xdp obj redirect.bpf.o

   cat << EOF > pass.bpf.c

   SEC("prog")
   int pass(struct xdp_md *ctx)
   {
       return XDP_PASS;
   }

   char _license[] SEC("license") = "GPL";
   EOF
   clang -O2 -g -Wall -target bpf -c pass.bpf.c -o pass.bpf.o
   sudo ip link set $INTERFACE xdp obj pass.bpf.o

   cat << EOF > trafgen.cfg

   {
     /* Ethernet Header */
     0xe8, 0x6a, 0x64, 0x41, 0xbf, 0x46,
     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
     const16(ETH_P_IP),

     /* IPv4 Header */
     0b01000101, 0,   # IPv4 version, IHL, TOS
     const16(1028),   # IPv4 total length (UDP length + 20 bytes (IP header))
     const16(2),      # IPv4 ident
     0b01000000, 0,   # IPv4 flags, fragmentation off
     64,              # IPv4 TTL
     17,              # Protocol UDP
     csumip(14, 33),  # IPv4 checksum

     /* UDP Header */
     10,  0, 1, 1,    # IP Src - adapt as needed
     10,  0, 1, 2,    # IP Dest - adapt as needed
     const16(6666),   # UDP Src Port
     const16(6666),   # UDP Dest Port
     const16(1008),   # UDP length (UDP header 8 bytes + payload length)
     csumudp(14, 34), # UDP checksum

     /* Payload */
     fill('W', 1000),
   }
   EOF

   sudo trafgen -i trafgen.cfg -b3000MB -o veth1 --cpp

Fixes: 4ff320361092 ("igc: Add support for XDP_REDIRECT action")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
---
 drivers/net/ethernet/intel/igc/igc_main.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

Comments

Maciej Fijalkowski Feb. 19, 2024, 12:13 p.m. UTC | #1
On Mon, Feb 19, 2024 at 10:08:43AM +0100, Florian Kauer wrote:
> When a frame can not be transmitted in XDP_REDIRECT
> (e.g. due to a full queue), it is necessary to free
> it by calling xdp_return_frame_rx_napi.
> 
> However, this is the reponsibility of the caller of
> the ndo_xdp_xmit (see for example bq_xmit_all in
> kernel/bpf/devmap.c) and thus calling it inside
> igc_xdp_xmit (which is the ndo_xdp_xmit of the igc
> driver) as well will lead to memory corruption.
> 
> In fact, bq_xmit_all expects that it can return all
> frames after the last successfully transmitted one.
> Therefore, break for the first not transmitted frame,
> but do not call xdp_return_frame_rx_napi in igc_xdp_xmit.
> This is equally implemented in other Intel drivers
> such as the igb.
> 
> There are two alternatives to this that were rejected:
> 1. Return num_frames as all the frames would have been
>    transmitted and release them inside igc_xdp_xmit.
>    While it might work technically, it is not what
>    the return value is meant to repesent (i.e. the
>    number of SUCCESSFULLY transmitted packets).
> 2. Rework kernel/bpf/devmap.c and all drivers to
>    support non-consecutively dropped packets.
>    Besides being complex, it likely has a negative
>    performance impact without a significant gain
>    since it is anyway unlikely that the next frame
>    can be transmitted if the previous one was dropped.
> 
> The memory corruption can be reproduced with
> the following script which leads to a kernel panic
> after a few seconds.  It basically generates more
> traffic than a i225 NIC can transmit and pushes it
> via XDP_REDIRECT from a virtual interface to the
> physical interface where frames get dropped.
> 
>    #!/bin/bash
>    INTERFACE=enp4s0
>    INTERFACE_IDX=`cat /sys/class/net/$INTERFACE/ifindex`
> 
>    sudo ip link add dev veth1 type veth peer name veth2
>    sudo ip link set up $INTERFACE
>    sudo ip link set up veth1
>    sudo ip link set up veth2
> 
>    cat << EOF > redirect.bpf.c
> 
>    SEC("prog")
>    int redirect(struct xdp_md *ctx)
>    {
>        return bpf_redirect($INTERFACE_IDX, 0);
>    }
> 
>    char _license[] SEC("license") = "GPL";
>    EOF
>    clang -O2 -g -Wall -target bpf -c redirect.bpf.c -o redirect.bpf.o
>    sudo ip link set veth2 xdp obj redirect.bpf.o
> 
>    cat << EOF > pass.bpf.c
> 
>    SEC("prog")
>    int pass(struct xdp_md *ctx)
>    {
>        return XDP_PASS;
>    }
> 
>    char _license[] SEC("license") = "GPL";
>    EOF
>    clang -O2 -g -Wall -target bpf -c pass.bpf.c -o pass.bpf.o
>    sudo ip link set $INTERFACE xdp obj pass.bpf.o
> 
>    cat << EOF > trafgen.cfg
> 
>    {
>      /* Ethernet Header */
>      0xe8, 0x6a, 0x64, 0x41, 0xbf, 0x46,
>      0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
>      const16(ETH_P_IP),
> 
>      /* IPv4 Header */
>      0b01000101, 0,   # IPv4 version, IHL, TOS
>      const16(1028),   # IPv4 total length (UDP length + 20 bytes (IP header))
>      const16(2),      # IPv4 ident
>      0b01000000, 0,   # IPv4 flags, fragmentation off
>      64,              # IPv4 TTL
>      17,              # Protocol UDP
>      csumip(14, 33),  # IPv4 checksum
> 
>      /* UDP Header */
>      10,  0, 1, 1,    # IP Src - adapt as needed
>      10,  0, 1, 2,    # IP Dest - adapt as needed
>      const16(6666),   # UDP Src Port
>      const16(6666),   # UDP Dest Port
>      const16(1008),   # UDP length (UDP header 8 bytes + payload length)
>      csumudp(14, 34), # UDP checksum
> 
>      /* Payload */
>      fill('W', 1000),
>    }
>    EOF
> 
>    sudo trafgen -i trafgen.cfg -b3000MB -o veth1 --cpp
> 
> Fixes: 4ff320361092 ("igc: Add support for XDP_REDIRECT action")
> Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

Apparently fdc13979f91e ("bpf, devmap: Move drop error path to devmap for
XDP_REDIRECT") addressed all ndo_xdp_xmit callbacks but igc support was
added shortly after that...?

> ---
>  drivers/net/ethernet/intel/igc/igc_main.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> index ba8d3fe186ae..81c21a893ede 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -6487,7 +6487,7 @@ static int igc_xdp_xmit(struct net_device *dev, int num_frames,
>  	int cpu = smp_processor_id();
>  	struct netdev_queue *nq;
>  	struct igc_ring *ring;
> -	int i, drops;
> +	int i, nxmit;
>  
>  	if (unlikely(!netif_carrier_ok(dev)))
>  		return -ENETDOWN;
> @@ -6503,16 +6503,15 @@ static int igc_xdp_xmit(struct net_device *dev, int num_frames,
>  	/* Avoid transmit queue timeout since we share it with the slow path */
>  	txq_trans_cond_update(nq);
>  
> -	drops = 0;
> +	nxmit = 0;
>  	for (i = 0; i < num_frames; i++) {
>  		int err;
>  		struct xdp_frame *xdpf = frames[i];
>  
>  		err = igc_xdp_init_tx_descriptor(ring, xdpf);
> -		if (err) {
> -			xdp_return_frame_rx_napi(xdpf);
> -			drops++;
> -		}
> +		if (err)
> +			break;
> +		nxmit++;
>  	}
>  
>  	if (flags & XDP_XMIT_FLUSH)
> @@ -6520,7 +6519,7 @@ static int igc_xdp_xmit(struct net_device *dev, int num_frames,
>  
>  	__netif_tx_unlock(nq);
>  
> -	return num_frames - drops;
> +	return nxmit;
>  }
>  
>  static void igc_trigger_rxtxq_interrupt(struct igc_adapter *adapter,
> -- 
> 2.39.2
>
naamax.meir March 5, 2024, 7:28 a.m. UTC | #2
On 2/19/2024 11:08, Florian Kauer wrote:
> When a frame can not be transmitted in XDP_REDIRECT
> (e.g. due to a full queue), it is necessary to free
> it by calling xdp_return_frame_rx_napi.
> 
> However, this is the reponsibility of the caller of
> the ndo_xdp_xmit (see for example bq_xmit_all in
> kernel/bpf/devmap.c) and thus calling it inside
> igc_xdp_xmit (which is the ndo_xdp_xmit of the igc
> driver) as well will lead to memory corruption.
> 
> In fact, bq_xmit_all expects that it can return all
> frames after the last successfully transmitted one.
> Therefore, break for the first not transmitted frame,
> but do not call xdp_return_frame_rx_napi in igc_xdp_xmit.
> This is equally implemented in other Intel drivers
> such as the igb.
> 
> There are two alternatives to this that were rejected:
> 1. Return num_frames as all the frames would have been
>     transmitted and release them inside igc_xdp_xmit.
>     While it might work technically, it is not what
>     the return value is meant to repesent (i.e. the
>     number of SUCCESSFULLY transmitted packets).
> 2. Rework kernel/bpf/devmap.c and all drivers to
>     support non-consecutively dropped packets.
>     Besides being complex, it likely has a negative
>     performance impact without a significant gain
>     since it is anyway unlikely that the next frame
>     can be transmitted if the previous one was dropped.
> 
> The memory corruption can be reproduced with
> the following script which leads to a kernel panic
> after a few seconds.  It basically generates more
> traffic than a i225 NIC can transmit and pushes it
> via XDP_REDIRECT from a virtual interface to the
> physical interface where frames get dropped.
> 
>     #!/bin/bash
>     INTERFACE=enp4s0
>     INTERFACE_IDX=`cat /sys/class/net/$INTERFACE/ifindex`
> 
>     sudo ip link add dev veth1 type veth peer name veth2
>     sudo ip link set up $INTERFACE
>     sudo ip link set up veth1
>     sudo ip link set up veth2
> 
>     cat << EOF > redirect.bpf.c
> 
>     SEC("prog")
>     int redirect(struct xdp_md *ctx)
>     {
>         return bpf_redirect($INTERFACE_IDX, 0);
>     }
> 
>     char _license[] SEC("license") = "GPL";
>     EOF
>     clang -O2 -g -Wall -target bpf -c redirect.bpf.c -o redirect.bpf.o
>     sudo ip link set veth2 xdp obj redirect.bpf.o
> 
>     cat << EOF > pass.bpf.c
> 
>     SEC("prog")
>     int pass(struct xdp_md *ctx)
>     {
>         return XDP_PASS;
>     }
> 
>     char _license[] SEC("license") = "GPL";
>     EOF
>     clang -O2 -g -Wall -target bpf -c pass.bpf.c -o pass.bpf.o
>     sudo ip link set $INTERFACE xdp obj pass.bpf.o
> 
>     cat << EOF > trafgen.cfg
> 
>     {
>       /* Ethernet Header */
>       0xe8, 0x6a, 0x64, 0x41, 0xbf, 0x46,
>       0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
>       const16(ETH_P_IP),
> 
>       /* IPv4 Header */
>       0b01000101, 0,   # IPv4 version, IHL, TOS
>       const16(1028),   # IPv4 total length (UDP length + 20 bytes (IP header))
>       const16(2),      # IPv4 ident
>       0b01000000, 0,   # IPv4 flags, fragmentation off
>       64,              # IPv4 TTL
>       17,              # Protocol UDP
>       csumip(14, 33),  # IPv4 checksum
> 
>       /* UDP Header */
>       10,  0, 1, 1,    # IP Src - adapt as needed
>       10,  0, 1, 2,    # IP Dest - adapt as needed
>       const16(6666),   # UDP Src Port
>       const16(6666),   # UDP Dest Port
>       const16(1008),   # UDP length (UDP header 8 bytes + payload length)
>       csumudp(14, 34), # UDP checksum
> 
>       /* Payload */
>       fill('W', 1000),
>     }
>     EOF
> 
>     sudo trafgen -i trafgen.cfg -b3000MB -o veth1 --cpp
> 
> Fixes: 4ff320361092 ("igc: Add support for XDP_REDIRECT action")
> Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
> ---
>   drivers/net/ethernet/intel/igc/igc_main.c | 13 ++++++-------
>   1 file changed, 6 insertions(+), 7 deletions(-)

Tested-by: Naama Meir <naamax.meir@linux.intel.com>
diff mbox series

Patch

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index ba8d3fe186ae..81c21a893ede 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -6487,7 +6487,7 @@  static int igc_xdp_xmit(struct net_device *dev, int num_frames,
 	int cpu = smp_processor_id();
 	struct netdev_queue *nq;
 	struct igc_ring *ring;
-	int i, drops;
+	int i, nxmit;
 
 	if (unlikely(!netif_carrier_ok(dev)))
 		return -ENETDOWN;
@@ -6503,16 +6503,15 @@  static int igc_xdp_xmit(struct net_device *dev, int num_frames,
 	/* Avoid transmit queue timeout since we share it with the slow path */
 	txq_trans_cond_update(nq);
 
-	drops = 0;
+	nxmit = 0;
 	for (i = 0; i < num_frames; i++) {
 		int err;
 		struct xdp_frame *xdpf = frames[i];
 
 		err = igc_xdp_init_tx_descriptor(ring, xdpf);
-		if (err) {
-			xdp_return_frame_rx_napi(xdpf);
-			drops++;
-		}
+		if (err)
+			break;
+		nxmit++;
 	}
 
 	if (flags & XDP_XMIT_FLUSH)
@@ -6520,7 +6519,7 @@  static int igc_xdp_xmit(struct net_device *dev, int num_frames,
 
 	__netif_tx_unlock(nq);
 
-	return num_frames - drops;
+	return nxmit;
 }
 
 static void igc_trigger_rxtxq_interrupt(struct igc_adapter *adapter,