
[net-next] ibmvnic: Return error code on TX scrq flush fail

Message ID 20240411203435.228559-1-nnac123@linux.ibm.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series [net-next] ibmvnic: Return error code on TX scrq flush fail

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 8 this patch: 8
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 11 maintainers not CCed: pabeni@redhat.com kuba@kernel.org tlfalcon@linux.ibm.com christophe.leroy@csgroup.eu naveen.n.rao@linux.ibm.com ricklind@linux.ibm.com edumazet@google.com npiggin@gmail.com linuxppc-dev@lists.ozlabs.org mpe@ellerman.id.au aneesh.kumar@kernel.org
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 5 this patch: 5
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-04-12--06-00 (tests: 962)

Commit Message

Nick Child April 11, 2024, 8:34 p.m. UTC
In ibmvnic_xmit(), if ibmvnic_tx_scrq_flush() returns H_CLOSED, the
driver informs upper-level networking functions to disable the TX
queues. H_CLOSED signals that the connection with the vnic server is
down and that a transport event is expected to recover the device.

Previously, ibmvnic_tx_scrq_flush() was hard-coded to return success.
Therefore, the queues remained active until ibmvnic_cleanup() was
called within do_reset().

The problem is that do_reset() depends on the RTNL lock. If several
ibmvnic devices are resetting, there can be a long wait before the last
device can grab the lock. During this time, the TX/RX queues still
appear active to upper-level functions.
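To put a rough number on that wait, here is a hypothetical model in which each resetting device holds RTNL for a known time and devices take the lock in array order. Both are simplifying assumptions: RTNL is a plain mutex with no FIFO guarantee, and real reset durations vary. The function name and inputs are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* resets[i] is how long (ms) device i holds RTNL while resetting.
 * Returns how long device `idx` waits before its own do_reset() can
 * start, which is a lower bound on the window in which, without the
 * patch, its TX queues still look usable to the stack. */
unsigned long model_rtnl_wait_ms(const unsigned long *resets, size_t n,
				 size_t idx)
{
	unsigned long wait = 0;

	assert(resets != NULL && idx < n);
	for (size_t i = 0; i < idx; i++)
		wait += resets[i];
	return wait;
}
```

With four devices holding the lock for 200, 300, 500 and 250 ms, the last one waits 1000 ms before its cleanup can begin, and the wait grows with every additional resetting device.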

FYI, we do call netif_carrier_off() outside the RTNL lock, but its
calls to dev_deactivate() also depend on the RTNL lock.

As a result, a large number of retransmissions were observed in a short
period of time, eventually leading to ETIMEDOUT. This was seen
specifically with HNV devices, likely because they have even more RTNL
dependencies.

Therefore, ensure the return code of ibmvnic_tx_scrq_flush() is
propagated to the xmit function to allow for an earlier (and lock-less)
response to a transport event.

Signed-off-by: Nick Child <nnac123@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Simon Horman April 14, 2024, 10:23 a.m. UTC | #1
On Thu, Apr 11, 2024 at 03:34:35PM -0500, Nick Child wrote:
> [...]
>
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
> index 30c47b8470ad..f5177f370354 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -2371,7 +2371,7 @@ static int ibmvnic_tx_scrq_flush(struct ibmvnic_adapter *adapter,
>  		ibmvnic_tx_scrq_clean_buffer(adapter, tx_scrq);
>  	else
>  		ind_bufp->index = 0;
> -	return 0;
> +	return rc;
>  }
>  
>  static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)

Hi Nick,

I notice that in some, but not all, cases the return value of
ibmvnic_tx_scrq_flush() is not checked. Should that also be
addressed?

Patch

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 30c47b8470ad..f5177f370354 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -2371,7 +2371,7 @@ static int ibmvnic_tx_scrq_flush(struct ibmvnic_adapter *adapter,
 		ibmvnic_tx_scrq_clean_buffer(adapter, tx_scrq);
 	else
 		ind_bufp->index = 0;
-	return 0;
+	return rc;
 }
 
 static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)