[net,1/3] net: mana: Fix TX CQE error handling

Message ID 1695519107-24139-2-git-send-email-haiyangz@microsoft.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series net: mana: Fix some TX processing bugs

Checks

Context | Check | Description
netdev/series_format | success | Posting correctly formatted
netdev/tree_selection | success | Clearly marked for net
netdev/fixes_present | success | Fixes tag present in non-next series
netdev/header_inline | success | No static functions without inline keyword in header files
netdev/build_32bit | success | Errors and warnings before: 9 this patch: 9
netdev/cc_maintainers | fail | 2 blamed authors not CCed: shacharr@microsoft.com stephen@networkplumber.org; 2 maintainers not CCed: shacharr@microsoft.com stephen@networkplumber.org
netdev/build_clang | success | Errors and warnings before: 1363 this patch: 1363
netdev/verify_signedoff | success | Signed-off-by tag matches author and committer
netdev/deprecated_api | success | None detected
netdev/check_selftest | success | No net selftest shell script
netdev/verify_fixes | success | Fixes tag looks correct
netdev/build_allmodconfig_warn | success | Errors and warnings before: 1363 this patch: 1363
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 30 lines checked
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0
netdev/source_inline | success | Was 0 now: 0

Commit Message

Haiyang Zhang Sept. 24, 2023, 1:31 a.m. UTC
For an unknown TX CQE error type (probably from newer hardware),
still free the SKB, update the queue tail, etc.; otherwise the
accounting will be wrong.

Also, TX errors can be triggered by injecting corrupted packets, so
replace WARN_ONCE with ratelimited error logging, because we don't
need a stack trace here.

Cc: stable@vger.kernel.org
Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 drivers/net/ethernet/microsoft/mana/mana_en.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)
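
The accounting problem described in the commit message can be illustrated
with a small userspace model. This is only a sketch, not the driver code:
struct txq_model, enum cqe_type, poll_tx_model() and the bail_on_unknown
flag are hypothetical names invented for the illustration. The flag selects
between the old behaviour (return from the poll loop on an unknown type)
and the new one (break out of the switch and run the common cleanup).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical completion types; the last one stands for a type that this
 * version of the driver does not know about.
 */
enum cqe_type { CQE_OKAY, CQE_KNOWN_ERR, CQE_FROM_NEWER_HW };

/* Hypothetical, heavily simplified model of per-queue TX state. */
struct txq_model {
	int pending;	/* packets posted but not yet completed */
	int freed;	/* packet buffers released */
	int err;	/* completions with a known error type */
	int unknown;	/* completions with an unknown type */
};

/* Process n completions. If bail_on_unknown is true, an unknown type ends
 * the whole poll early (the old "return"). If it is false, the unknown type
 * is only counted and the common cleanup still runs (the new "break").
 */
static void poll_tx_model(struct txq_model *q, const enum cqe_type *cqes,
			  size_t n, bool bail_on_unknown)
{
	for (size_t i = 0; i < n; i++) {
		switch (cqes[i]) {
		case CQE_OKAY:
			break;
		case CQE_KNOWN_ERR:
			q->err++;
			break;
		default:
			q->unknown++;
			if (bail_on_unknown)
				return;
			break;
		}

		/* Common cleanup: release the buffer and advance the queue. */
		assert(q->pending > 0);
		q->freed++;
		q->pending--;
	}
}
```

In this model, an early return leaves both the packet with the unknown
completion type and every completion after it in the same batch
unprocessed, so pending never drops back to zero.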

Comments

Simon Horman Sept. 29, 2023, 5:47 a.m. UTC | #1
On Sat, Sep 23, 2023 at 06:31:45PM -0700, Haiyang Zhang wrote:
> For an unknown TX CQE error type (probably from a newer hardware),
> still free the SKB, update the queue tail, etc., otherwise the
> accounting will be wrong.
> 
> Also, TX errors can be triggered by injecting corrupted packets, so
> replace the WARN_ONCE to ratelimited error logging, because we don't
> need stack trace here.
> 
> Cc: stable@vger.kernel.org
> Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>

Reviewed-by: Simon Horman <horms@kernel.org>
Simon Horman Sept. 29, 2023, 5:50 a.m. UTC | #2
On Fri, Sep 29, 2023 at 07:47:57AM +0200, Simon Horman wrote:
> On Sat, Sep 23, 2023 at 06:31:45PM -0700, Haiyang Zhang wrote:
> > For an unknown TX CQE error type (probably from a newer hardware),
> > still free the SKB, update the queue tail, etc., otherwise the
> > accounting will be wrong.
> > 
> > Also, TX errors can be triggered by injecting corrupted packets, so
> > replace the WARN_ONCE to ratelimited error logging, because we don't
> > need stack trace here.
> > 
> > Cc: stable@vger.kernel.org
> > Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
> > Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> 
> Reviewed-by: Simon Horman <horms@kernel.org>

Sorry, one latent question.

The patch replaces WARN_ONCE with a net_ratelimit()'d netdev_err().
But I do wonder if, as a fix, netdev_err_once() would be more appropriate.
Haiyang Zhang Sept. 29, 2023, 3:51 p.m. UTC | #3
> On Fri, Sep 29, 2023 at 07:47:57AM +0200, Simon Horman wrote:
> > On Sat, Sep 23, 2023 at 06:31:45PM -0700, Haiyang Zhang wrote:
> > > For an unknown TX CQE error type (probably from a newer hardware),
> > > still free the SKB, update the queue tail, etc., otherwise the
> > > accounting will be wrong.
> > >
> > > Also, TX errors can be triggered by injecting corrupted packets, so
> > > replace the WARN_ONCE to ratelimited error logging, because we don't
> > > need stack trace here.
> > >
> > > Cc: stable@vger.kernel.org
> > > > Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
> > > Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> >
> > Reviewed-by: Simon Horman <horms@kernel.org>
> 
> Sorry, one latent question.
> 
> The patch replaces WARN_ONCE with a net_ratelimit()'d netdev_err().
> But I do wonder if, as a fix, netdev_err_once() would be more appropriate.

This error can occur with different CQE error types, so I use netdev_err()
to display each of them, and add a rate limit.

Thanks
- Haiyang
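
The trade-off discussed above can be sketched in userspace C. This is a
simplified model under stated assumptions, not the kernel's implementation:
struct once_log, struct burst_limit and the helper functions are
hypothetical, the burst and interval values are illustrative, and the error
type numbers used in the usage example are arbitrary. It shows that a
log-once policy reports only the first error type seen, while a
burst-limited policy can still report later, different types.

```c
#include <stdbool.h>

#define MAX_TYPES 64	/* assumed upper bound on type values in this model */

/* Hypothetical "log once" state: the first message passes, the rest do not. */
struct once_log {
	bool fired;
};

/* Hypothetical burst limiter: at most `burst` messages per `interval` ticks. */
struct burst_limit {
	long window_start;
	long interval;
	int burst;
	int used;
};

static bool once_allows(struct once_log *o)
{
	if (o->fired)
		return false;
	o->fired = true;
	return true;
}

static bool burst_allows(struct burst_limit *r, long now)
{
	/* Start a fresh window once the interval has elapsed. */
	if (now - r->window_start >= r->interval) {
		r->window_start = now;
		r->used = 0;
	}
	if (r->used >= r->burst)
		return false;
	r->used++;
	return true;
}

/* Count how many distinct error types reach the log when all n events
 * arrive in the same tick, under either policy.
 */
static int distinct_types_logged(const int *types, int n, bool use_once)
{
	struct once_log once = { false };
	struct burst_limit limit = { .window_start = 0, .interval = 5,
				     .burst = 10, .used = 0 };
	bool seen[MAX_TYPES] = { false };
	int distinct = 0;

	for (int i = 0; i < n; i++) {
		bool allowed = use_once ? once_allows(&once)
					: burst_allows(&limit, 0);

		if (!allowed || types[i] < 0 || types[i] >= MAX_TYPES)
			continue;
		if (!seen[types[i]]) {
			seen[types[i]] = true;
			distinct++;
		}
	}
	return distinct;
}
```

For example, with the event sequence { 33, 33, 36, 38 }, the once policy
logs a single type and the burst policy logs all three, as long as the
burst budget is not exhausted.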
Simon Horman Sept. 30, 2023, 6:16 p.m. UTC | #4
On Fri, Sep 29, 2023 at 03:51:48PM +0000, Haiyang Zhang wrote:
> > On Fri, Sep 29, 2023 at 07:47:57AM +0200, Simon Horman wrote:
> > > On Sat, Sep 23, 2023 at 06:31:45PM -0700, Haiyang Zhang wrote:
> > > > For an unknown TX CQE error type (probably from a newer hardware),
> > > > still free the SKB, update the queue tail, etc., otherwise the
> > > > accounting will be wrong.
> > > >
> > > > Also, TX errors can be triggered by injecting corrupted packets, so
> > > > replace the WARN_ONCE to ratelimited error logging, because we don't
> > > > need stack trace here.
> > > >
> > > > Cc: stable@vger.kernel.org
> > > > > Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
> > > > Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> > >
> > > Reviewed-by: Simon Horman <horms@kernel.org>
> > 
> > Sorry, one latent question.
> > 
> > The patch replaces WARN_ONCE with a net_ratelimit()'d netdev_err().
> > But I do wonder if, as a fix, netdev_err_once() would be more appropriate.
> 
> This error may happen with different CQE error types, so I use netdev_err() 
> to display them, and added rate limit.

Thanks for the clarification.

Patch

diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 4a16ebff3d1d..5cdcf7561b38 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1317,19 +1317,23 @@  static void mana_poll_tx_cq(struct mana_cq *cq)
 		case CQE_TX_VPORT_IDX_OUT_OF_RANGE:
 		case CQE_TX_VPORT_DISABLED:
 		case CQE_TX_VLAN_TAGGING_VIOLATION:
-			WARN_ONCE(1, "TX: CQE error %d: ignored.\n",
-				  cqe_oob->cqe_hdr.cqe_type);
+			if (net_ratelimit())
+				netdev_err(ndev, "TX: CQE error %d\n",
+					   cqe_oob->cqe_hdr.cqe_type);
+
 			apc->eth_stats.tx_cqe_err++;
 			break;
 
 		default:
-			/* If the CQE type is unexpected, log an error, assert,
-			 * and go through the error path.
+			/* If the CQE type is unknown, log an error,
+			 * and still free the SKB, update tail, etc.
 			 */
-			WARN_ONCE(1, "TX: Unexpected CQE type %d: HW BUG?\n",
-				  cqe_oob->cqe_hdr.cqe_type);
+			if (net_ratelimit())
+				netdev_err(ndev, "TX: unknown CQE type %d\n",
+					   cqe_oob->cqe_hdr.cqe_type);
+
 			apc->eth_stats.tx_cqe_unknown_type++;
-			return;
+			break;
 		}
 
 		if (WARN_ON_ONCE(txq->gdma_txq_id != completions[i].wq_num))