diff mbox series

[net,v2] net: sfc: add missing xdp queue reinitialization

Message ID 20220330163703.25086-1-ap420073@gmail.com (mailing list archive)
State Accepted
Commit 059a47f1da93811d37533556d67e72f2261b1127
Delegated to: Netdev Maintainers
Headers show
Series [net,v2] net: sfc: add missing xdp queue reinitialization | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 5 maintainers not CCed: andrii@kernel.org yhs@fb.com kpsingh@kernel.org kafai@fb.com songliubraving@fb.com
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 180 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 1 now: 0

Commit Message

Taehee Yoo March 30, 2022, 4:37 p.m. UTC
After rx/tx ring buffer size is changed, kernel panic occurs when
it acts XDP_TX or XDP_REDIRECT.

When tx/rx ring buffer size is changed(ethtool -G), sfc driver
reallocates and reinitializes rx and tx queues and their buffer
(tx_queue->buffer).
But it misses reinitializing xdp queues(efx->xdp_tx_queues).
So, while it is acting XDP_TX or XDP_REDIRECT, it uses the uninitialized
tx_queue->buffer.

A new function efx_set_xdp_channels() is separated from efx_set_channels()
to handle only xdp queues.

Splat looks like:
   BUG: kernel NULL pointer dereference, address: 000000000000002a
   #PF: supervisor write access in kernel mode
   #PF: error_code(0x0002) - not-present page
   PGD 0 P4D 0
   Oops: 0002 [#4] PREEMPT SMP NOPTI
   RIP: 0010:efx_tx_map_chunk+0x54/0x90 [sfc]
   CPU: 2 PID: 0 Comm: swapper/2 Tainted: G      D           5.17.0+ #55 e8beeee8289528f11357029357cf
   Code: 48 8b 8d a8 01 00 00 48 8d 14 52 4c 8d 2c d0 44 89 e0 48 85 c9 74 0e 44 89 e2 4c 89 f6 48 80
   RSP: 0018:ffff92f121e45c60 EFLAGS: 00010297
   RIP: 0010:efx_tx_map_chunk+0x54/0x90 [sfc]
   RAX: 0000000000000040 RBX: ffff92ea506895c0 RCX: ffffffffc0330870
   RDX: 0000000000000001 RSI: 00000001139b10ce RDI: ffff92ea506895c0
   RBP: ffffffffc0358a80 R08: 00000001139b110d R09: 0000000000000000
   R10: 0000000000000001 R11: ffff92ea414c0088 R12: 0000000000000040
   R13: 0000000000000018 R14: 00000001139b10ce R15: ffff92ea506895c0
   FS:  0000000000000000(0000) GS:ffff92f121ec0000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   Code: 48 8b 8d a8 01 00 00 48 8d 14 52 4c 8d 2c d0 44 89 e0 48 85 c9 74 0e 44 89 e2 4c 89 f6 48 80
   CR2: 000000000000002a CR3: 00000003e6810004 CR4: 00000000007706e0
   RSP: 0018:ffff92f121e85c60 EFLAGS: 00010297
   PKRU: 55555554
   RAX: 0000000000000040 RBX: ffff92ea50689700 RCX: ffffffffc0330870
   RDX: 0000000000000001 RSI: 00000001145a90ce RDI: ffff92ea50689700
   RBP: ffffffffc0358a80 R08: 00000001145a910d R09: 0000000000000000
   R10: 0000000000000001 R11: ffff92ea414c0088 R12: 0000000000000040
   R13: 0000000000000018 R14: 00000001145a90ce R15: ffff92ea50689700
   FS:  0000000000000000(0000) GS:ffff92f121e80000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   CR2: 000000000000002a CR3: 00000003e6810005 CR4: 00000000007706e0
   PKRU: 55555554
   Call Trace:
    <IRQ>
    efx_xdp_tx_buffers+0x12b/0x3d0 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5]
    __efx_rx_packet+0x5c3/0x930 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5]
    efx_rx_packet+0x28c/0x2e0 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5]
    efx_ef10_ev_process+0x5f8/0xf40 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5]
    ? enqueue_task_fair+0x95/0x550
    efx_poll+0xc4/0x360 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5]

Fixes: 3990a8fffbda ("sfc: allocate channels for XDP tx queues")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
---

v2: Do not use inline in .c file

 drivers/net/ethernet/sfc/efx_channels.c | 146 +++++++++++++-----------
 1 file changed, 81 insertions(+), 65 deletions(-)

Comments

patchwork-bot+netdevbpf@kernel.org April 1, 2022, 11 a.m. UTC | #1
Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Wed, 30 Mar 2022 16:37:03 +0000 you wrote:
> After rx/tx ring buffer size is changed, kernel panic occurs when
> it acts XDP_TX or XDP_REDIRECT.
> 
> When tx/rx ring buffer size is changed(ethtool -G), sfc driver
> reallocates and reinitializes rx and tx queues and their buffer
> (tx_queue->buffer).
> But it misses reinitializing xdp queues(efx->xdp_tx_queues).
> So, while it is acting XDP_TX or XDP_REDIRECT, it uses the uninitialized
> tx_queue->buffer.
> 
> [...]

Here is the summary with links:
  - [net,v2] net: sfc: add missing xdp queue reinitialization
    https://git.kernel.org/netdev/net/c/059a47f1da93

You are awesome, thank you!
Martin Habets April 1, 2022, 11:06 a.m. UTC | #2
Hi Taehee,

Thanks for looking into this. Unfortunately efx_realloc_channels()
has turned out to be quite fragile over the years, so I'm
keen to remove it in stead of patching it up all the time.

Could you try the patch below please?
If it works ok for you as well we'll be able to remove
efx_realloc_channels(). The added advantage of this approach
is that the netdev notifiers get informed of the change.

Regards,
Martin Habets <habetsm.xilinx@gmail.com>

---
 drivers/net/ethernet/sfc/ethtool.c |   13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/sfc/ethtool.c b/drivers/net/ethernet/sfc/ethtool.c
index 48506373721a..8cfbe61737bb 100644
--- a/drivers/net/ethernet/sfc/ethtool.c
+++ b/drivers/net/ethernet/sfc/ethtool.c
@@ -179,6 +179,7 @@ efx_ethtool_set_ringparam(struct net_device *net_dev,
 {
 	struct efx_nic *efx = netdev_priv(net_dev);
 	u32 txq_entries;
+	int rc = 0;
 
 	if (ring->rx_mini_pending || ring->rx_jumbo_pending ||
 	    ring->rx_pending > EFX_MAX_DMAQ_SIZE ||
@@ -198,7 +199,17 @@ efx_ethtool_set_ringparam(struct net_device *net_dev,
 			   "increasing TX queue size to minimum of %u\n",
 			   txq_entries);
 
-	return efx_realloc_channels(efx, ring->rx_pending, txq_entries);
+	/* Apply the new settings */
+	efx->rxq_entries = ring->rx_pending;
+	efx->txq_entries = ring->tx_pending;
+
+	/* Update the datapath with the new settings if the interface is up */
+	if (!efx_check_disabled(efx) && netif_running(efx->net_dev)) {
+		dev_close(net_dev);
+		rc = dev_open(net_dev, NULL);
+	}
+
+	return rc;
 }
 
 static void efx_ethtool_get_wol(struct net_device *net_dev,
Taehee Yoo April 1, 2022, 6:48 p.m. UTC | #3
On 4/1/22 20:06, Martin Habets wrote:

Hi Martin,
Thank you so much for your review!

 > Hi Taehee,
 >
 > Thanks for looking into this. Unfortunately efx_realloc_channels()
 > has turned out to be quite fragile over the years, so I'm
 > keen to remove it in stead of patching it up all the time.

I agree with you.
efx_realloc_channels() is too complex.

 >
 > Could you try the patch below please?
 > If it works ok for you as well we'll be able to remove
 > efx_realloc_channels(). The added advantage of this approach
 > is that the netdev notifiers get informed of the change.

I tested your patch and I found a page reference count problem.
How to test:
1. set up XDP_TX
2. traffic on
3. traffic off
4. ring buffer size change
5. loop from 2 to 4.

[   87.836195][   T72] BUG: Bad page state in process kworker/u16:1 
pfn:125445
[   87.843356][   T72] page:000000003725f642 refcount:-2 mapcount:0 
mapping:0000000000000000 index:0x0 pfn:0x125445
[   87.853783][   T72] flags: 0x200000000000000(node=0|zone=2) 

[   87.859391][   T72] raw: 0200000000000000 dead000000000100 
dead000000000122 0000000000000000
[   87.867928][   T72] raw: 0000000000000000 0000000000000000 
fffffffeffffffff 0000000000000000
[   87.876569][   T72] page dumped because: nonzero _refcount 

[   87.882125][   T72] Modules linked in: af_packet sfc ixgbe mtd 
atlantic coretemp mdio hwmon sch_fq_codel msr bpf_prelx
[   87.895331][   T72] CPU: 0 PID: 72 Comm: kworker/u16:1 Not tainted 
5.17.0+ #62 dbf33652f22e5147659e7e2472bb962779c4833
[   87.906350][   T72] Hardware name: ASUS System Product Name/PRIME 
Z690-P D4, BIOS 0603 11/01/2021
[   87.915360][   T72] Workqueue: netns cleanup_net 

[   87.920087][   T72] Call Trace: 

[   87.923311][   T72]  <TASK> 

[   87.926188][   T72]  dump_stack_lvl+0x56/0x7b 

[   87.930597][   T72]  bad_page.cold.125+0x63/0x93 

[   87.935288][   T72]  free_pcppages_bulk+0x63c/0x6f0 

[   87.940232][   T72]  free_unref_page+0x8b/0xf0 

[   87.944749][   T72]  efx_fini_rx_queue+0x15f/0x210 [sfc 
49c5d4f562a40c6a7ed913c25f5bd4e126bcfa4e]
[   87.953756][   T72]  efx_stop_channels+0xef/0x1b0 [sfc 
49c5d4f562a40c6a7ed913c25f5bd4e126bcfa4e]
[   87.962699][   T72]  efx_net_stop+0x4d/0x60 [sfc 
49c5d4f562a40c6a7ed913c25f5bd4e126bcfa4e]
[   87.971029][   T72]  __dev_close_many+0x8b/0xf0 

[   87.975618][   T72]  dev_close_many+0x7d/0x120 

[ ... ]


In addition, I would like to share issues that I'm currently looking into:
1. TX DMA error
when interface down/up or ring buffer size changes, TX DMA error would occur
because tx_queue can be used before initialization.
But It will be fixed by the below patch.

  static void efx_ethtool_get_wol(struct net_device *net_dev,
diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
index d16e031e95f4..6983799e1c05 100644
--- a/drivers/net/ethernet/sfc/tx.c
+++ b/drivers/net/ethernet/sfc/tx.c
@@ -443,6 +443,9 @@ int efx_xdp_tx_buffers(struct efx_nic *efx, int n, 
struct xdp_frame **xdpfs,
         if (unlikely(!tx_queue))
                 return -EINVAL;

+       if (!tx_queue->initialised)
+               return -EINVAL;
+
         if (efx->xdp_txq_queues_mode != EFX_XDP_TX_QUEUES_DEDICATED)
                 HARD_TX_LOCK(efx->net_dev, tx_queue->core_txq, cpu);

diff --git a/drivers/net/ethernet/sfc/tx_common.c 
b/drivers/net/ethernet/sfc/tx_common.c
index d530cde2b864..9bc8281b7f5b 100644
--- a/drivers/net/ethernet/sfc/tx_common.c
+++ b/drivers/net/ethernet/sfc/tx_common.c
@@ -101,6 +101,8 @@ void efx_fini_tx_queue(struct efx_tx_queue *tx_queue)
         netif_dbg(tx_queue->efx, drv, tx_queue->efx->net_dev,
                   "shutting down TX queue %d\n", tx_queue->queue);

+       tx_queue->initialised = false;
+
         if (!tx_queue->buffer)
                 return;

After your patch, unfortunately, it can't fix ring buffer size change case.
It can fix only interface down/up case.
I will look into this more.

2. Memory leak
There is a memory leak in ring buffer size change logic.
reproducer:
    while :
    do
        ethtool -G <interface name> rx 2048 tx 2048
        ethtool -G <interface name> rx 1024 tx 1024
    done

Thanks a lot,
Taehee Yoo

 >
 > Regards,
 > Martin Habets <habetsm.xilinx@gmail.com>
 >
 > ---
 >   drivers/net/ethernet/sfc/ethtool.c |   13 ++++++++++++-
 >   1 file changed, 12 insertions(+), 1 deletion(-)
 >
 > diff --git a/drivers/net/ethernet/sfc/ethtool.c 
b/drivers/net/ethernet/sfc/ethtool.c
 > index 48506373721a..8cfbe61737bb 100644
 > --- a/drivers/net/ethernet/sfc/ethtool.c
 > +++ b/drivers/net/ethernet/sfc/ethtool.c
 > @@ -179,6 +179,7 @@ efx_ethtool_set_ringparam(struct net_device *net_dev,
 >   {
 >   	struct efx_nic *efx = netdev_priv(net_dev);
 >   	u32 txq_entries;
 > +	int rc = 0;
 >
 >   	if (ring->rx_mini_pending || ring->rx_jumbo_pending ||
 >   	    ring->rx_pending > EFX_MAX_DMAQ_SIZE ||
 > @@ -198,7 +199,17 @@ efx_ethtool_set_ringparam(struct net_device 
*net_dev,
 >   			   "increasing TX queue size to minimum of %u\n",
 >   			   txq_entries);
 >
 > -	return efx_realloc_channels(efx, ring->rx_pending, txq_entries);
 > +	/* Apply the new settings */
 > +	efx->rxq_entries = ring->rx_pending;
 > +	efx->txq_entries = ring->tx_pending;
 > +
 > +	/* Update the datapath with the new settings if the interface is up */
 > +	if (!efx_check_disabled(efx) && netif_running(efx->net_dev)) {
 > +		dev_close(net_dev);
 > +		rc = dev_open(net_dev, NULL);
 > +	}
 > +
 > +	return rc;
 >   }
 >
 >   static void efx_ethtool_get_wol(struct net_device *net_dev,
Martin Habets April 4, 2022, 10:36 a.m. UTC | #4
Hi Taehee,

On Sat, Apr 02, 2022 at 03:48:14AM +0900, Taehee Yoo wrote:
> On 4/1/22 20:06, Martin Habets wrote:
> 
> Hi Martin,
> Thank you so much for your review!
> 
> > Hi Taehee,
> >
> > Thanks for looking into this. Unfortunately efx_realloc_channels()
> > has turned out to be quite fragile over the years, so I'm
> > keen to remove it in stead of patching it up all the time.
> 
> I agree with you.
> efx_realloc_channels() is too complex.
> 
> >
> > Could you try the patch below please?
> > If it works ok for you as well we'll be able to remove
> > efx_realloc_channels(). The added advantage of this approach
> > is that the netdev notifiers get informed of the change.
> 
> I tested your patch and I found a page reference count problem.
> How to test:
> 1. set up XDP_TX
> 2. traffic on
> 3. traffic off
> 4. ring buffer size change
> 5. loop from 2 to 4.
> 
> [   87.836195][   T72] BUG: Bad page state in process kworker/u16:1
> pfn:125445
> [   87.843356][   T72] page:000000003725f642 refcount:-2 mapcount:0
> mapping:0000000000000000 index:0x0 pfn:0x125445
> [   87.853783][   T72] flags: 0x200000000000000(node=0|zone=2)
> 
> [   87.859391][   T72] raw: 0200000000000000 dead000000000100
> dead000000000122 0000000000000000
> [   87.867928][   T72] raw: 0000000000000000 0000000000000000
> fffffffeffffffff 0000000000000000
> [   87.876569][   T72] page dumped because: nonzero _refcount
> 
> [   87.882125][   T72] Modules linked in: af_packet sfc ixgbe mtd atlantic
> coretemp mdio hwmon sch_fq_codel msr bpf_prelx
> [   87.895331][   T72] CPU: 0 PID: 72 Comm: kworker/u16:1 Not tainted
> 5.17.0+ #62 dbf33652f22e5147659e7e2472bb962779c4833
> [   87.906350][   T72] Hardware name: ASUS System Product Name/PRIME Z690-P
> D4, BIOS 0603 11/01/2021
> [   87.915360][   T72] Workqueue: netns cleanup_net
> 
> [   87.920087][   T72] Call Trace:
> 
> [   87.923311][   T72]  <TASK>
> 
> [   87.926188][   T72]  dump_stack_lvl+0x56/0x7b
> 
> [   87.930597][   T72]  bad_page.cold.125+0x63/0x93
> 
> [   87.935288][   T72]  free_pcppages_bulk+0x63c/0x6f0
> 
> [   87.940232][   T72]  free_unref_page+0x8b/0xf0
> 
> [   87.944749][   T72]  efx_fini_rx_queue+0x15f/0x210 [sfc
> 49c5d4f562a40c6a7ed913c25f5bd4e126bcfa4e]

Looks to me like this is in efx_fini_rx_recycle_ring().
It could be a side effect of the memory leak you report below.
If this is in efx_fini_rx_recycle_ring() I'll post a patch for
that soon on a separate thread.

> [   87.953756][   T72]  efx_stop_channels+0xef/0x1b0 [sfc
> 49c5d4f562a40c6a7ed913c25f5bd4e126bcfa4e]
> [   87.962699][   T72]  efx_net_stop+0x4d/0x60 [sfc
> 49c5d4f562a40c6a7ed913c25f5bd4e126bcfa4e]
> [   87.971029][   T72]  __dev_close_many+0x8b/0xf0
> 
> [   87.975618][   T72]  dev_close_many+0x7d/0x120
> 
> [ ... ]
> 
> 
> In addition, I would like to share issues that I'm currently looking into:
> 1. TX DMA error
> when interface down/up or ring buffer size changes, TX DMA error would occur
> because tx_queue can be used before initialization.
> But It will be fixed by the below patch.
> 
>  static void efx_ethtool_get_wol(struct net_device *net_dev,
> diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
> index d16e031e95f4..6983799e1c05 100644
> --- a/drivers/net/ethernet/sfc/tx.c
> +++ b/drivers/net/ethernet/sfc/tx.c
> @@ -443,6 +443,9 @@ int efx_xdp_tx_buffers(struct efx_nic *efx, int n,
> struct xdp_frame **xdpfs,
>         if (unlikely(!tx_queue))
>                 return -EINVAL;
> 
> +       if (!tx_queue->initialised)
> +               return -EINVAL;
> +
>         if (efx->xdp_txq_queues_mode != EFX_XDP_TX_QUEUES_DEDICATED)
>                 HARD_TX_LOCK(efx->net_dev, tx_queue->core_txq, cpu);
> 
> diff --git a/drivers/net/ethernet/sfc/tx_common.c
> b/drivers/net/ethernet/sfc/tx_common.c
> index d530cde2b864..9bc8281b7f5b 100644
> --- a/drivers/net/ethernet/sfc/tx_common.c
> +++ b/drivers/net/ethernet/sfc/tx_common.c
> @@ -101,6 +101,8 @@ void efx_fini_tx_queue(struct efx_tx_queue *tx_queue)
>         netif_dbg(tx_queue->efx, drv, tx_queue->efx->net_dev,
>                   "shutting down TX queue %d\n", tx_queue->queue);
> 
> +       tx_queue->initialised = false;
> +
>         if (!tx_queue->buffer)
>                 return;

Looks ok, but xmit_hard should never be called on an interface that
is down. Makes me wonder if we have a seqence issue in our ndo_stop API.

> 
> After your patch, unfortunately, it can't fix ring buffer size change case.
> It can fix only interface down/up case.
> I will look into this more.
> 
> 2. Memory leak
> There is a memory leak in ring buffer size change logic.
> reproducer:
>    while :
>    do
>        ethtool -G <interface name> rx 2048 tx 2048
>        ethtool -G <interface name> rx 1024 tx 1024
>    done

Is this with my patch or only with yours?
Thanks a lot for testing this.

Martin

> Thanks a lot,
> Taehee Yoo
> 
> >
> > Regards,
> > Martin Habets <habetsm.xilinx@gmail.com>
> >
> > ---
> >   drivers/net/ethernet/sfc/ethtool.c |   13 ++++++++++++-
> >   1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/sfc/ethtool.c
> b/drivers/net/ethernet/sfc/ethtool.c
> > index 48506373721a..8cfbe61737bb 100644
> > --- a/drivers/net/ethernet/sfc/ethtool.c
> > +++ b/drivers/net/ethernet/sfc/ethtool.c
> > @@ -179,6 +179,7 @@ efx_ethtool_set_ringparam(struct net_device *net_dev,
> >   {
> >   	struct efx_nic *efx = netdev_priv(net_dev);
> >   	u32 txq_entries;
> > +	int rc = 0;
> >
> >   	if (ring->rx_mini_pending || ring->rx_jumbo_pending ||
> >   	    ring->rx_pending > EFX_MAX_DMAQ_SIZE ||
> > @@ -198,7 +199,17 @@ efx_ethtool_set_ringparam(struct net_device *net_dev,
> >   			   "increasing TX queue size to minimum of %u\n",
> >   			   txq_entries);
> >
> > -	return efx_realloc_channels(efx, ring->rx_pending, txq_entries);
> > +	/* Apply the new settings */
> > +	efx->rxq_entries = ring->rx_pending;
> > +	efx->txq_entries = ring->tx_pending;
> > +
> > +	/* Update the datapath with the new settings if the interface is up */
> > +	if (!efx_check_disabled(efx) && netif_running(efx->net_dev)) {
> > +		dev_close(net_dev);
> > +		rc = dev_open(net_dev, NULL);
> > +	}
> > +
> > +	return rc;
> >   }
> >
> >   static void efx_ethtool_get_wol(struct net_device *net_dev,
Taehee Yoo April 5, 2022, 3:16 a.m. UTC | #5
On 4/4/22 19:36, Martin Habets wrote:

Hi Martin,

 > Hi Taehee,
 >
 > On Sat, Apr 02, 2022 at 03:48:14AM +0900, Taehee Yoo wrote:
 >> On 4/1/22 20:06, Martin Habets wrote:
 >>
 >> Hi Martin,
 >> Thank you so much for your review!
 >>
 >>> Hi Taehee,
 >>>
 >>> Thanks for looking into this. Unfortunately efx_realloc_channels()
 >>> has turned out to be quite fragile over the years, so I'm
 >>> keen to remove it in stead of patching it up all the time.
 >>
 >> I agree with you.
 >> efx_realloc_channels() is too complex.
 >>
 >>>
 >>> Could you try the patch below please?
 >>> If it works ok for you as well we'll be able to remove
 >>> efx_realloc_channels(). The added advantage of this approach
 >>> is that the netdev notifiers get informed of the change.
 >>
 >> I tested your patch and I found a page reference count problem.
 >> How to test:
 >> 1. set up XDP_TX
 >> 2. traffic on
 >> 3. traffic off
 >> 4. ring buffer size change
 >> 5. loop from 2 to 4.
 >>
 >> [   87.836195][   T72] BUG: Bad page state in process kworker/u16:1
 >> pfn:125445
 >> [   87.843356][   T72] page:000000003725f642 refcount:-2 mapcount:0
 >> mapping:0000000000000000 index:0x0 pfn:0x125445
 >> [   87.853783][   T72] flags: 0x200000000000000(node=0|zone=2)
 >>
 >> [   87.859391][   T72] raw: 0200000000000000 dead000000000100
 >> dead000000000122 0000000000000000
 >> [   87.867928][   T72] raw: 0000000000000000 0000000000000000
 >> fffffffeffffffff 0000000000000000
 >> [   87.876569][   T72] page dumped because: nonzero _refcount
 >>
 >> [   87.882125][   T72] Modules linked in: af_packet sfc ixgbe mtd 
atlantic
 >> coretemp mdio hwmon sch_fq_codel msr bpf_prelx
 >> [   87.895331][   T72] CPU: 0 PID: 72 Comm: kworker/u16:1 Not tainted
 >> 5.17.0+ #62 dbf33652f22e5147659e7e2472bb962779c4833
 >> [   87.906350][   T72] Hardware name: ASUS System Product Name/PRIME 
Z690-P
 >> D4, BIOS 0603 11/01/2021
 >> [   87.915360][   T72] Workqueue: netns cleanup_net
 >>
 >> [   87.920087][   T72] Call Trace:
 >>
 >> [   87.923311][   T72]  <TASK>
 >>
 >> [   87.926188][   T72]  dump_stack_lvl+0x56/0x7b
 >>
 >> [   87.930597][   T72]  bad_page.cold.125+0x63/0x93
 >>
 >> [   87.935288][   T72]  free_pcppages_bulk+0x63c/0x6f0
 >>
 >> [   87.940232][   T72]  free_unref_page+0x8b/0xf0
 >>
 >> [   87.944749][   T72]  efx_fini_rx_queue+0x15f/0x210 [sfc
 >> 49c5d4f562a40c6a7ed913c25f5bd4e126bcfa4e]
 >
 > Looks to me like this is in efx_fini_rx_recycle_ring().
 > It could be a side effect of the memory leak you report below.
 > If this is in efx_fini_rx_recycle_ring() I'll post a patch for
 > that soon on a separate thread.

Thanks for that!

 >
 >> [   87.953756][   T72]  efx_stop_channels+0xef/0x1b0 [sfc
 >> 49c5d4f562a40c6a7ed913c25f5bd4e126bcfa4e]
 >> [   87.962699][   T72]  efx_net_stop+0x4d/0x60 [sfc
 >> 49c5d4f562a40c6a7ed913c25f5bd4e126bcfa4e]
 >> [   87.971029][   T72]  __dev_close_many+0x8b/0xf0
 >>
 >> [   87.975618][   T72]  dev_close_many+0x7d/0x120
 >>
 >> [ ... ]
 >>
 >>
 >> In addition, I would like to share issues that I'm currently looking 
into:
 >> 1. TX DMA error
 >> when interface down/up or ring buffer size changes, TX DMA error 
would occur
 >> because tx_queue can be used before initialization.
 >> But It will be fixed by the below patch.
 >>
 >>   static void efx_ethtool_get_wol(struct net_device *net_dev,
 >> diff --git a/drivers/net/ethernet/sfc/tx.c 
b/drivers/net/ethernet/sfc/tx.c
 >> index d16e031e95f4..6983799e1c05 100644
 >> --- a/drivers/net/ethernet/sfc/tx.c
 >> +++ b/drivers/net/ethernet/sfc/tx.c
 >> @@ -443,6 +443,9 @@ int efx_xdp_tx_buffers(struct efx_nic *efx, int n,
 >> struct xdp_frame **xdpfs,
 >>          if (unlikely(!tx_queue))
 >>                  return -EINVAL;
 >>
 >> +       if (!tx_queue->initialised)
 >> +               return -EINVAL;
 >> +
 >>          if (efx->xdp_txq_queues_mode != EFX_XDP_TX_QUEUES_DEDICATED)
 >>                  HARD_TX_LOCK(efx->net_dev, tx_queue->core_txq, cpu);
 >>
 >> diff --git a/drivers/net/ethernet/sfc/tx_common.c
 >> b/drivers/net/ethernet/sfc/tx_common.c
 >> index d530cde2b864..9bc8281b7f5b 100644
 >> --- a/drivers/net/ethernet/sfc/tx_common.c
 >> +++ b/drivers/net/ethernet/sfc/tx_common.c
 >> @@ -101,6 +101,8 @@ void efx_fini_tx_queue(struct efx_tx_queue 
*tx_queue)
 >>          netif_dbg(tx_queue->efx, drv, tx_queue->efx->net_dev,
 >>                    "shutting down TX queue %d\n", tx_queue->queue);
 >>
 >> +       tx_queue->initialised = false;
 >> +
 >>          if (!tx_queue->buffer)
 >>                  return;
 >
 > Looks ok, but xmit_hard should never be called on an interface that
 > is down. Makes me wonder if we have a seqence issue in our ndo_stop API.
 >

I think it's an XDP tx_queue specific issue.
Sorry that I didn't provide that information.
When a packet is received and it acts XDP_TX, it calls 
efx_xdp_tx_buffers() and it uses xdp tx_queue and sends it to hardware 
directly.
So, ->ndo_start_xmit is not called.
And that bug occurs when the first interface up.
->ndo_stop has never called, So I think this is not an issue of ->ndo_stop.

 >>
 >> After your patch, unfortunately, it can't fix ring buffer size 
change case.
 >> It can fix only interface down/up case.
 >> I will look into this more.
 >>
 >> 2. Memory leak
 >> There is a memory leak in ring buffer size change logic.
 >> reproducer:
 >>     while :
 >>     do
 >>         ethtool -G <interface name> rx 2048 tx 2048
 >>         ethtool -G <interface name> rx 1024 tx 1024
 >>     done
 >
 > Is this with my patch or only with yours?
 > Thanks a lot for testing this.
 >

Memory leak still occurs with your patch(do not use efx_realloc_channels())

Thanks a lot,
Taehee Yoo

 > Martin
 >
 >> Thanks a lot,
 >> Taehee Yoo
 >>
 >>>
 >>> Regards,
 >>> Martin Habets <habetsm.xilinx@gmail.com>
 >>>
 >>> ---
 >>>    drivers/net/ethernet/sfc/ethtool.c |   13 ++++++++++++-
 >>>    1 file changed, 12 insertions(+), 1 deletion(-)
 >>>
 >>> diff --git a/drivers/net/ethernet/sfc/ethtool.c
 >> b/drivers/net/ethernet/sfc/ethtool.c
 >>> index 48506373721a..8cfbe61737bb 100644
 >>> --- a/drivers/net/ethernet/sfc/ethtool.c
 >>> +++ b/drivers/net/ethernet/sfc/ethtool.c
 >>> @@ -179,6 +179,7 @@ efx_ethtool_set_ringparam(struct net_device 
*net_dev,
 >>>    {
 >>>    	struct efx_nic *efx = netdev_priv(net_dev);
 >>>    	u32 txq_entries;
 >>> +	int rc = 0;
 >>>
 >>>    	if (ring->rx_mini_pending || ring->rx_jumbo_pending ||
 >>>    	    ring->rx_pending > EFX_MAX_DMAQ_SIZE ||
 >>> @@ -198,7 +199,17 @@ efx_ethtool_set_ringparam(struct net_device 
*net_dev,
 >>>    			   "increasing TX queue size to minimum of %u\n",
 >>>    			   txq_entries);
 >>>
 >>> -	return efx_realloc_channels(efx, ring->rx_pending, txq_entries);
 >>> +	/* Apply the new settings */
 >>> +	efx->rxq_entries = ring->rx_pending;
 >>> +	efx->txq_entries = ring->tx_pending;
 >>> +
 >>> +	/* Update the datapath with the new settings if the interface is 
up */
 >>> +	if (!efx_check_disabled(efx) && netif_running(efx->net_dev)) {
 >>> +		dev_close(net_dev);
 >>> +		rc = dev_open(net_dev, NULL);
 >>> +	}
 >>> +
 >>> +	return rc;
 >>>    }
 >>>
 >>>    static void efx_ethtool_get_wol(struct net_device *net_dev,
 >
diff mbox series

Patch

diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
index d6fdcdc530ca..b29ca69aeca5 100644
--- a/drivers/net/ethernet/sfc/efx_channels.c
+++ b/drivers/net/ethernet/sfc/efx_channels.c
@@ -789,6 +789,85 @@  void efx_remove_channels(struct efx_nic *efx)
 	kfree(efx->xdp_tx_queues);
 }
 
+static int efx_set_xdp_tx_queue(struct efx_nic *efx, int xdp_queue_number,
+				struct efx_tx_queue *tx_queue)
+{
+	if (xdp_queue_number >= efx->xdp_tx_queue_count)
+		return -EINVAL;
+
+	netif_dbg(efx, drv, efx->net_dev,
+		  "Channel %u TXQ %u is XDP %u, HW %u\n",
+		  tx_queue->channel->channel, tx_queue->label,
+		  xdp_queue_number, tx_queue->queue);
+	efx->xdp_tx_queues[xdp_queue_number] = tx_queue;
+	return 0;
+}
+
+static void efx_set_xdp_channels(struct efx_nic *efx)
+{
+	struct efx_tx_queue *tx_queue;
+	struct efx_channel *channel;
+	unsigned int next_queue = 0;
+	int xdp_queue_number = 0;
+	int rc;
+
+	/* We need to mark which channels really have RX and TX
+	 * queues, and adjust the TX queue numbers if we have separate
+	 * RX-only and TX-only channels.
+	 */
+	efx_for_each_channel(channel, efx) {
+		if (channel->channel < efx->tx_channel_offset)
+			continue;
+
+		if (efx_channel_is_xdp_tx(channel)) {
+			efx_for_each_channel_tx_queue(tx_queue, channel) {
+				tx_queue->queue = next_queue++;
+				rc = efx_set_xdp_tx_queue(efx, xdp_queue_number,
+							  tx_queue);
+				if (rc == 0)
+					xdp_queue_number++;
+			}
+		} else {
+			efx_for_each_channel_tx_queue(tx_queue, channel) {
+				tx_queue->queue = next_queue++;
+				netif_dbg(efx, drv, efx->net_dev,
+					  "Channel %u TXQ %u is HW %u\n",
+					  channel->channel, tx_queue->label,
+					  tx_queue->queue);
+			}
+
+			/* If XDP is borrowing queues from net stack, it must
+			 * use the queue with no csum offload, which is the
+			 * first one of the channel
+			 * (note: tx_queue_by_type is not initialized yet)
+			 */
+			if (efx->xdp_txq_queues_mode ==
+			    EFX_XDP_TX_QUEUES_BORROWED) {
+				tx_queue = &channel->tx_queue[0];
+				rc = efx_set_xdp_tx_queue(efx, xdp_queue_number,
+							  tx_queue);
+				if (rc == 0)
+					xdp_queue_number++;
+			}
+		}
+	}
+	WARN_ON(efx->xdp_txq_queues_mode == EFX_XDP_TX_QUEUES_DEDICATED &&
+		xdp_queue_number != efx->xdp_tx_queue_count);
+	WARN_ON(efx->xdp_txq_queues_mode != EFX_XDP_TX_QUEUES_DEDICATED &&
+		xdp_queue_number > efx->xdp_tx_queue_count);
+
+	/* If we have more CPUs than assigned XDP TX queues, assign the already
+	 * existing queues to the exceeding CPUs
+	 */
+	next_queue = 0;
+	while (xdp_queue_number < efx->xdp_tx_queue_count) {
+		tx_queue = efx->xdp_tx_queues[next_queue++];
+		rc = efx_set_xdp_tx_queue(efx, xdp_queue_number, tx_queue);
+		if (rc == 0)
+			xdp_queue_number++;
+	}
+}
+
 int efx_realloc_channels(struct efx_nic *efx, u32 rxq_entries, u32 txq_entries)
 {
 	struct efx_channel *other_channel[EFX_MAX_CHANNELS], *channel;
@@ -860,6 +939,7 @@  int efx_realloc_channels(struct efx_nic *efx, u32 rxq_entries, u32 txq_entries)
 		efx_init_napi_channel(efx->channel[i]);
 	}
 
+	efx_set_xdp_channels(efx);
 out:
 	/* Destroy unused channel structures */
 	for (i = 0; i < efx->n_channels; i++) {
@@ -892,26 +972,9 @@  int efx_realloc_channels(struct efx_nic *efx, u32 rxq_entries, u32 txq_entries)
 	goto out;
 }
 
-static inline int
-efx_set_xdp_tx_queue(struct efx_nic *efx, int xdp_queue_number,
-		     struct efx_tx_queue *tx_queue)
-{
-	if (xdp_queue_number >= efx->xdp_tx_queue_count)
-		return -EINVAL;
-
-	netif_dbg(efx, drv, efx->net_dev, "Channel %u TXQ %u is XDP %u, HW %u\n",
-		  tx_queue->channel->channel, tx_queue->label,
-		  xdp_queue_number, tx_queue->queue);
-	efx->xdp_tx_queues[xdp_queue_number] = tx_queue;
-	return 0;
-}
-
 int efx_set_channels(struct efx_nic *efx)
 {
-	struct efx_tx_queue *tx_queue;
 	struct efx_channel *channel;
-	unsigned int next_queue = 0;
-	int xdp_queue_number;
 	int rc;
 
 	efx->tx_channel_offset =
@@ -929,61 +992,14 @@  int efx_set_channels(struct efx_nic *efx)
 			return -ENOMEM;
 	}
 
-	/* We need to mark which channels really have RX and TX
-	 * queues, and adjust the TX queue numbers if we have separate
-	 * RX-only and TX-only channels.
-	 */
-	xdp_queue_number = 0;
 	efx_for_each_channel(channel, efx) {
 		if (channel->channel < efx->n_rx_channels)
 			channel->rx_queue.core_index = channel->channel;
 		else
 			channel->rx_queue.core_index = -1;
-
-		if (channel->channel >= efx->tx_channel_offset) {
-			if (efx_channel_is_xdp_tx(channel)) {
-				efx_for_each_channel_tx_queue(tx_queue, channel) {
-					tx_queue->queue = next_queue++;
-					rc = efx_set_xdp_tx_queue(efx, xdp_queue_number, tx_queue);
-					if (rc == 0)
-						xdp_queue_number++;
-				}
-			} else {
-				efx_for_each_channel_tx_queue(tx_queue, channel) {
-					tx_queue->queue = next_queue++;
-					netif_dbg(efx, drv, efx->net_dev, "Channel %u TXQ %u is HW %u\n",
-						  channel->channel, tx_queue->label,
-						  tx_queue->queue);
-				}
-
-				/* If XDP is borrowing queues from net stack, it must use the queue
-				 * with no csum offload, which is the first one of the channel
-				 * (note: channel->tx_queue_by_type is not initialized yet)
-				 */
-				if (efx->xdp_txq_queues_mode == EFX_XDP_TX_QUEUES_BORROWED) {
-					tx_queue = &channel->tx_queue[0];
-					rc = efx_set_xdp_tx_queue(efx, xdp_queue_number, tx_queue);
-					if (rc == 0)
-						xdp_queue_number++;
-				}
-			}
-		}
 	}
-	WARN_ON(efx->xdp_txq_queues_mode == EFX_XDP_TX_QUEUES_DEDICATED &&
-		xdp_queue_number != efx->xdp_tx_queue_count);
-	WARN_ON(efx->xdp_txq_queues_mode != EFX_XDP_TX_QUEUES_DEDICATED &&
-		xdp_queue_number > efx->xdp_tx_queue_count);
 
-	/* If we have more CPUs than assigned XDP TX queues, assign the already
-	 * existing queues to the exceeding CPUs
-	 */
-	next_queue = 0;
-	while (xdp_queue_number < efx->xdp_tx_queue_count) {
-		tx_queue = efx->xdp_tx_queues[next_queue++];
-		rc = efx_set_xdp_tx_queue(efx, xdp_queue_number, tx_queue);
-		if (rc == 0)
-			xdp_queue_number++;
-	}
+	efx_set_xdp_channels(efx);
 
 	rc = netif_set_real_num_tx_queues(efx->net_dev, efx->n_tx_channels);
 	if (rc)