Message ID | 20240416164128.387920-1-nnac123@linux.ibm.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 5cb431dcf8048572e9ffc6c30cdbd8832cbe502d |
Delegated to: | Netdev Maintainers |
Series | [v2,net-next] ibmvnic: Return error code on TX scrq flush fail |
Hello:

This patch was applied to netdev/net-next.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Tue, 16 Apr 2024 11:41:28 -0500 you wrote:
> In ibmvnic_xmit() if ibmvnic_tx_scrq_flush() returns H_CLOSED then
> it will inform upper level networking functions to disable tx
> queues. H_CLOSED signals that the connection with the vnic server is
> down and a transport event is expected to recover the device.
>
> Previously, ibmvnic_tx_scrq_flush() was hard-coded to return success.
> Therefore, the queues would remain active until ibmvnic_cleanup() is
> called within do_reset().
>
> [...]

Here is the summary with links:
  - [v2,net-next] ibmvnic: Return error code on TX scrq flush fail
    https://git.kernel.org/netdev/net-next/c/5cb431dcf804

You are awesome, thank you!
    diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
    index 30c47b8470ad..5e9a93bdb518 100644
    --- a/drivers/net/ethernet/ibm/ibmvnic.c
    +++ b/drivers/net/ethernet/ibm/ibmvnic.c
    @@ -2371,7 +2371,7 @@ static int ibmvnic_tx_scrq_flush(struct ibmvnic_adapter *adapter,
     		ibmvnic_tx_scrq_clean_buffer(adapter, tx_scrq);
     	else
     		ind_bufp->index = 0;
    -	return 0;
    +	return rc;
     }
     
     static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
    @@ -2424,7 +2424,9 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
     		tx_dropped++;
     		tx_send_failed++;
     		ret = NETDEV_TX_OK;
    -		ibmvnic_tx_scrq_flush(adapter, tx_scrq);
    +		lpar_rc = ibmvnic_tx_scrq_flush(adapter, tx_scrq);
    +		if (lpar_rc != H_SUCCESS)
    +			goto tx_err;
     		goto out;
     	}
     
    @@ -2439,8 +2441,10 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
     		dev_kfree_skb_any(skb);
     		tx_send_failed++;
     		tx_dropped++;
    -		ibmvnic_tx_scrq_flush(adapter, tx_scrq);
     		ret = NETDEV_TX_OK;
    +		lpar_rc = ibmvnic_tx_scrq_flush(adapter, tx_scrq);
    +		if (lpar_rc != H_SUCCESS)
    +			goto tx_err;
     		goto out;
     	}
In ibmvnic_xmit() if ibmvnic_tx_scrq_flush() returns H_CLOSED then it will inform upper level networking functions to disable tx queues. H_CLOSED signals that the connection with the vnic server is down and a transport event is expected to recover the device.

Previously, ibmvnic_tx_scrq_flush() was hard-coded to return success. Therefore, the queues would remain active until ibmvnic_cleanup() is called within do_reset().

The problem is that do_reset() depends on the RTNL lock. If several ibmvnic devices are resetting then there can be a long wait time until the last device can grab the lock. During this time the tx/rx queues still appear active to upper level functions. FYI, we do make a call to netif_carrier_off() outside the RTNL lock but its calls to dev_deactivate() are also dependent on the RTNL lock.

As a result, large amounts of retransmissions were observed in a short period of time, eventually leading to ETIMEDOUT. This was specifically seen with HNV devices, likely because of even more RTNL dependencies.

Therefore, ensure the return code of ibmvnic_tx_scrq_flush() is propagated to the xmit function to allow for an earlier (and lock-less) response to a transport event.

Signed-off-by: Nick Child <nnac123@linux.ibm.com>

---
v1 - https://lore.kernel.org/netdev/20240414102337.GA645060@kernel.org/
Changes:
 - Edit based on Simon's review (thanks!), all callers of ibmvnic_tx_scrq_flush should respond to the return code

 drivers/net/ethernet/ibm/ibmvnic.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)
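For readers who want to see the effect of propagating the return code in isolation, here is a minimal userspace sketch. It is not the driver code: struct model_queue, model_flush(), model_xmit_drop_path() and the MODEL_H_* constants are hypothetical names made up for illustration, and the handling after a failed flush is only an assumption based on the description above (H_CLOSED leads to the tx queue being disabled).

```c
#include <stdbool.h>

/* Placeholder return codes for this sketch only; the kernel defines the
 * real H_SUCCESS and H_CLOSED values in its hypervisor-call headers. */
enum { MODEL_H_SUCCESS = 0, MODEL_H_CLOSED = 2 };

/* Hypothetical stand-in for one TX sub-CRQ and its state. */
struct model_queue {
	bool server_up;	/* false once the vnic server connection is down */
	bool stopped;	/* true once upper layers were told to stop sending */
	int pending;	/* descriptors batched but not yet sent */
};

/* Models the flush. With propagate_rc == false it mimics the old
 * behavior of always reporting success to the caller. */
static int model_flush(struct model_queue *q, bool propagate_rc)
{
	int rc = q->server_up ? MODEL_H_SUCCESS : MODEL_H_CLOSED;

	q->pending = 0;	/* the batch is sent or discarded either way */
	return propagate_rc ? rc : MODEL_H_SUCCESS;
}

/* Models an xmit error branch that drops a packet and flushes the batch.
 * Assumption: on H_CLOSED the error path stops the queue immediately,
 * without waiting for any reset that needs the RTNL lock. */
static void model_xmit_drop_path(struct model_queue *q, bool propagate_rc)
{
	int rc = model_flush(q, propagate_rc);

	if (rc != MODEL_H_SUCCESS && rc == MODEL_H_CLOSED)
		q->stopped = true;
}
```

With server_up set to false, calling model_xmit_drop_path() with propagate_rc == false leaves the queue looking active, which corresponds to the window in which retransmissions pile up. With propagate_rc == true the queue is marked stopped on the first failed flush.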