diff mbox series

[net,2/2] sfc: do not initialize non existing queues with efx_separate_tx_channels

Message ID 20220511125941.55812-3-ihuguet@redhat.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series sfc: fix some efx_separate_tx_channels errors | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 8 of 8 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 30 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Íñigo Huguet May 11, 2022, 12:59 p.m. UTC
If efx_separate_tx_channels is used, some error messages and backtraces
are shown in the logs (see below). This is because during channels
start, all queues in the channels are init asumming that they exist, but
they might not if efx_separate_tx_channels is used: some channels only
have RX queues and others only have TX queues.

Avoid that by checking if the channel has TX, RX or both queues.
However, even with this patch the NIC is unusable when using
efx_separate_tx_channels, so there are more problems that I've not
identified. These messages are still shown at probe time many times:
 sfc 0000:03:00.0 (unnamed net_device) (uninitialized): MC command 0x92 inlen 8 failed rc=-71 (raw=0) arg=0
 sfc 0000:03:00.0 (unnamed net_device) (uninitialized): failed to link VI 4294967295 to PIO buffer 1 (-71)

Those messages were also shown before these patch.

And then this other message and backtrace were also shown many times,
but now they're not:
 sfc 0000:03:00.0 ens6f0np0: MC command 0x82 inlen 544 failed rc=-22 (raw=0) arg=0
 ------------[ cut here ]------------
 netdevice: ens6f0np0: failed to initialise TXQ -1
 WARNING: CPU: 1 PID: 626 at drivers/net/ethernet/sfc/ef10.c:2393 efx_ef10_tx_init+0x201/0x300 [sfc]
 [...] stripped
 RIP: 0010:efx_ef10_tx_init+0x201/0x300 [sfc]
 [...] stripped
 Call Trace:
  efx_init_tx_queue+0xaa/0xf0 [sfc]
  efx_start_channels+0x49/0x120 [sfc]
  efx_start_all+0x1f8/0x430 [sfc]
  efx_net_open+0x5a/0xe0 [sfc]
  __dev_open+0xd0/0x190
  __dev_change_flags+0x1b3/0x220
  dev_change_flags+0x21/0x60
 [...]

At remove time, these messages were shown. Now they're neither shown:
 sfc 0000:03:00.0 ens6f0np0: failed to flush 10 queues
 sfc 0000:03:00.0 ens6f0np0: failed to flush queues

Fixes: 7ec3de426014 ("sfc: move datapath management code")
Reported-by: Tianhao Zhao <tizhao@redhat.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
---
 drivers/net/ethernet/sfc/efx_channels.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

Comments

Martin Habets May 13, 2022, 11:07 a.m. UTC | #1
On Wed, May 11, 2022 at 02:59:41PM +0200, Íñigo Huguet wrote:
> If efx_separate_tx_channels is used, some error messages and backtraces
> are shown in the logs (see below). This is because during channels
> start, all queues in the channels are init asumming that they exist, but
> they might not if efx_separate_tx_channels is used: some channels only
> have RX queues and others only have TX queues.

Thanks for reporting this. At first glance I suspect there may be more callers
of efx_for_each_channel_tx_queue() which is why it is not yet working for you
even with this fix.
Probably we need to fix those macros themselves.

I'm having a closer look, but it will take some time.

Martin

> 
> Avoid that by checking if the channel has TX, RX or both queues.
> However, even with this patch the NIC is unusable when using
> efx_separate_tx_channels, so there are more problems that I've not
> identified. These messages are still shown at probe time many times:
>  sfc 0000:03:00.0 (unnamed net_device) (uninitialized): MC command 0x92 inlen 8 failed rc=-71 (raw=0) arg=0
>  sfc 0000:03:00.0 (unnamed net_device) (uninitialized): failed to link VI 4294967295 to PIO buffer 1 (-71)
> 
> Those messages were also shown before these patch.
> 
> And then this other message and backtrace were also shown many times,
> but now they're not:
>  sfc 0000:03:00.0 ens6f0np0: MC command 0x82 inlen 544 failed rc=-22 (raw=0) arg=0
>  ------------[ cut here ]------------
>  netdevice: ens6f0np0: failed to initialise TXQ -1
>  WARNING: CPU: 1 PID: 626 at drivers/net/ethernet/sfc/ef10.c:2393 efx_ef10_tx_init+0x201/0x300 [sfc]
>  [...] stripped
>  RIP: 0010:efx_ef10_tx_init+0x201/0x300 [sfc]
>  [...] stripped
>  Call Trace:
>   efx_init_tx_queue+0xaa/0xf0 [sfc]
>   efx_start_channels+0x49/0x120 [sfc]
>   efx_start_all+0x1f8/0x430 [sfc]
>   efx_net_open+0x5a/0xe0 [sfc]
>   __dev_open+0xd0/0x190
>   __dev_change_flags+0x1b3/0x220
>   dev_change_flags+0x21/0x60
>  [...]
> 
> At remove time, these messages were shown. Now they're neither shown:
>  sfc 0000:03:00.0 ens6f0np0: failed to flush 10 queues
>  sfc 0000:03:00.0 ens6f0np0: failed to flush queues
> 
> Fixes: 7ec3de426014 ("sfc: move datapath management code")
> Reported-by: Tianhao Zhao <tizhao@redhat.com>
> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
> ---
>  drivers/net/ethernet/sfc/efx_channels.c | 22 +++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
> index da2db6791907..b6b960e2021c 100644
> --- a/drivers/net/ethernet/sfc/efx_channels.c
> +++ b/drivers/net/ethernet/sfc/efx_channels.c
> @@ -1139,17 +1139,21 @@ void efx_start_channels(struct efx_nic *efx)
>  	struct efx_channel *channel;
>  
>  	efx_for_each_channel_rev(channel, efx) {
> -		efx_for_each_channel_tx_queue(tx_queue, channel) {
> -			efx_init_tx_queue(tx_queue);
> -			atomic_inc(&efx->active_queues);
> +		if (channel->channel >= efx->tx_channel_offset) {
> +			efx_for_each_channel_tx_queue(tx_queue, channel) {
> +				efx_init_tx_queue(tx_queue);
> +				atomic_inc(&efx->active_queues);
> +			}
>  		}
>  
> -		efx_for_each_channel_rx_queue(rx_queue, channel) {
> -			efx_init_rx_queue(rx_queue);
> -			atomic_inc(&efx->active_queues);
> -			efx_stop_eventq(channel);
> -			efx_fast_push_rx_descriptors(rx_queue, false);
> -			efx_start_eventq(channel);
> +		if (channel->channel < efx->n_rx_channels) {
> +			efx_for_each_channel_rx_queue(rx_queue, channel) {
> +				efx_init_rx_queue(rx_queue);
> +				atomic_inc(&efx->active_queues);
> +				efx_stop_eventq(channel);
> +				efx_fast_push_rx_descriptors(rx_queue, false);
> +				efx_start_eventq(channel);
> +			}
>  		}
>  
>  		WARN_ON(channel->rx_pkt_n_frags);
> -- 
> 2.34.1
Martin Habets May 13, 2022, 12:37 p.m. UTC | #2
On Fri, May 13, 2022 at 12:07:23PM +0100, Martin Habets wrote:
> On Wed, May 11, 2022 at 02:59:41PM +0200, Íñigo Huguet wrote:
> > If efx_separate_tx_channels is used, some error messages and backtraces
> > are shown in the logs (see below). This is because during channels
> > start, all queues in the channels are init asumming that they exist, but
> > they might not if efx_separate_tx_channels is used: some channels only
> > have RX queues and others only have TX queues.
> 
> Thanks for reporting this. At first glance I suspect there may be more callers
> of efx_for_each_channel_tx_queue() which is why it is not yet working for you
> even with this fix.
> Probably we need to fix those macros themselves.
> 
> I'm having a closer look, but it will take some time.

It was easier than I thought. With the patch below I do not get any errors,
and ping works. I did not have to touch efx_for_each_channel_rx_queue().
Can you give this a try and report if it works for you?

Martin
---
 drivers/net/ethernet/sfc/net_driver.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index 318db906a154..723bbeea5d0c 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -1530,7 +1530,7 @@ static inline bool efx_channel_is_xdp_tx(struct efx_channel *channel)
 
 static inline bool efx_channel_has_tx_queues(struct efx_channel *channel)
 {
-	return true;
+	return channel && channel->channel >= channel->efx->tx_channel_offset;
 }
 
 static inline unsigned int efx_channel_num_tx_queues(struct efx_channel *channel)
Íñigo Huguet May 24, 2022, 3:36 p.m. UTC | #3
On Fri, May 13, 2022 at 2:37 PM Martin Habets <habetsm.xilinx@gmail.com> wrote:
>
> On Fri, May 13, 2022 at 12:07:23PM +0100, Martin Habets wrote:
> > On Wed, May 11, 2022 at 02:59:41PM +0200, Íñigo Huguet wrote:
> > > If efx_separate_tx_channels is used, some error messages and backtraces
> > > are shown in the logs (see below). This is because during channels
> > > start, all queues in the channels are init asumming that they exist, but
> > > they might not if efx_separate_tx_channels is used: some channels only
> > > have RX queues and others only have TX queues.
> >
> > Thanks for reporting this. At first glance I suspect there may be more callers
> > of efx_for_each_channel_tx_queue() which is why it is not yet working for you
> > even with this fix.
> > Probably we need to fix those macros themselves.
> >
> > I'm having a closer look, but it will take some time.
>
> It was easier than I thought. With the patch below I do not get any errors,
> and ping works. I did not have to touch efx_for_each_channel_rx_queue().
> Can you give this a try and report if it works for you?

Martin, this is working fine for me. Module loads and unloads without
errors, and I can ping and run an iperf3 test also without errors.

How do you want to do it? Should I send this patch on your behalf
within my patch series? Or do you want to send it yourself first?

>
> Martin
> ---
>  drivers/net/ethernet/sfc/net_driver.h |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
> index 318db906a154..723bbeea5d0c 100644
> --- a/drivers/net/ethernet/sfc/net_driver.h
> +++ b/drivers/net/ethernet/sfc/net_driver.h
> @@ -1530,7 +1530,7 @@ static inline bool efx_channel_is_xdp_tx(struct efx_channel *channel)
>
>  static inline bool efx_channel_has_tx_queues(struct efx_channel *channel)
>  {
> -       return true;
> +       return channel && channel->channel >= channel->efx->tx_channel_offset;
>  }
>
>  static inline unsigned int efx_channel_num_tx_queues(struct efx_channel *channel)
>
Martin Habets May 27, 2022, 6:14 a.m. UTC | #4
On Tue, May 24, 2022 at 05:36:49PM +0200, Íñigo Huguet wrote:
> On Fri, May 13, 2022 at 2:37 PM Martin Habets <habetsm.xilinx@gmail.com> wrote:
> >
> > On Fri, May 13, 2022 at 12:07:23PM +0100, Martin Habets wrote:
> > > On Wed, May 11, 2022 at 02:59:41PM +0200, Íñigo Huguet wrote:
> > > > If efx_separate_tx_channels is used, some error messages and backtraces
> > > > are shown in the logs (see below). This is because during channels
> > > > start, all queues in the channels are init asumming that they exist, but
> > > > they might not if efx_separate_tx_channels is used: some channels only
> > > > have RX queues and others only have TX queues.
> > >
> > > Thanks for reporting this. At first glance I suspect there may be more callers
> > > of efx_for_each_channel_tx_queue() which is why it is not yet working for you
> > > even with this fix.
> > > Probably we need to fix those macros themselves.
> > >
> > > I'm having a closer look, but it will take some time.
> >
> > It was easier than I thought. With the patch below I do not get any errors,
> > and ping works. I did not have to touch efx_for_each_channel_rx_queue().
> > Can you give this a try and report if it works for you?
> 
> Martin, this is working fine for me. Module loads and unloads without
> errors, and I can ping and run an iperf3 test also without errors.
> 
> How do you want to do it? Should I send this patch on your behalf
> within my patch series? Or do you want to send it yourself first?

IMO you did the hard work so I did not want to steal your thunder.
Please send it as part of your series, I'll Ack it.

Martin

> >
> > Martin
> > ---
> >  drivers/net/ethernet/sfc/net_driver.h |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
> > index 318db906a154..723bbeea5d0c 100644
> > --- a/drivers/net/ethernet/sfc/net_driver.h
> > +++ b/drivers/net/ethernet/sfc/net_driver.h
> > @@ -1530,7 +1530,7 @@ static inline bool efx_channel_is_xdp_tx(struct efx_channel *channel)
> >
> >  static inline bool efx_channel_has_tx_queues(struct efx_channel *channel)
> >  {
> > -       return true;
> > +       return channel && channel->channel >= channel->efx->tx_channel_offset;
> >  }
> >
> >  static inline unsigned int efx_channel_num_tx_queues(struct efx_channel *channel)
> >
> 
> 
> -- 
> Íñigo Huguet
diff mbox series

Patch

diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
index da2db6791907..b6b960e2021c 100644
--- a/drivers/net/ethernet/sfc/efx_channels.c
+++ b/drivers/net/ethernet/sfc/efx_channels.c
@@ -1139,17 +1139,21 @@  void efx_start_channels(struct efx_nic *efx)
 	struct efx_channel *channel;
 
 	efx_for_each_channel_rev(channel, efx) {
-		efx_for_each_channel_tx_queue(tx_queue, channel) {
-			efx_init_tx_queue(tx_queue);
-			atomic_inc(&efx->active_queues);
+		if (channel->channel >= efx->tx_channel_offset) {
+			efx_for_each_channel_tx_queue(tx_queue, channel) {
+				efx_init_tx_queue(tx_queue);
+				atomic_inc(&efx->active_queues);
+			}
 		}
 
-		efx_for_each_channel_rx_queue(rx_queue, channel) {
-			efx_init_rx_queue(rx_queue);
-			atomic_inc(&efx->active_queues);
-			efx_stop_eventq(channel);
-			efx_fast_push_rx_descriptors(rx_queue, false);
-			efx_start_eventq(channel);
+		if (channel->channel < efx->n_rx_channels) {
+			efx_for_each_channel_rx_queue(rx_queue, channel) {
+				efx_init_rx_queue(rx_queue);
+				atomic_inc(&efx->active_queues);
+				efx_stop_eventq(channel);
+				efx_fast_push_rx_descriptors(rx_queue, false);
+				efx_start_eventq(channel);
+			}
 		}
 
 		WARN_ON(channel->rx_pkt_n_frags);