Message ID | 20230622190332.29223-1-nnac123@linux.ibm.com (mailing list archive)
---|---
State | Changes Requested
Delegated to: | Netdev Maintainers
Series | [net] ibmvnic: Do not reset dql stats on NON_FATAL err
+ maintainers and blamed authors

On Thu, Jun 22, 2023 at 02:03:32PM -0500, Nick Child wrote:
> All ibmvnic resets make a call to netdev_tx_reset_queue() when
> re-opening the device. netdev_tx_reset_queue() resets the num_queued
> and num_completed byte counters. These stats are used in Byte Queue
> Limit (BQL) algorithms. The difference between these two stats tracks
> the number of bytes currently sitting on the physical NIC. ibmvnic
> increases the number of queued bytes through calls to
> netdev_tx_sent_queue() in the driver's xmit function. When VIOS reports
> that it is done transmitting bytes, the ibmvnic device increases the
> number of completed bytes through calls to netdev_tx_completed_queue().
> It is important to note that the driver batches its transmit calls and
> num_queued is increased every time that an skb is added to the next
> batch, not necessarily when the batch is sent to VIOS for transmission.
>
> Unlike other reset types, a NON_FATAL reset will not flush the sub-CRQ
> tx buffers. Therefore, it is possible for the batched skb array to be
> partially full. So if there is a call to netdev_tx_reset_queue() when
> re-opening the device, the value of num_queued (0) does not account
> for the skbs that are currently batched. Eventually, when the batch
> is sent to VIOS, the call to netdev_tx_completed_queue() increases
> num_completed to a value greater than num_queued. This causes a
> BUG_ON crash:
>
>   ibmvnic 30000002: Firmware reports error, cause: adapter problem.
>   Starting recovery...
>   ibmvnic 30000002: tx error 600
>   ibmvnic 30000002: tx error 600
>   ibmvnic 30000002: tx error 600
>   ibmvnic 30000002: tx error 600
>   ------------[ cut here ]------------
>   kernel BUG at lib/dynamic_queue_limits.c:27!
>   Oops: Exception in kernel mode, sig: 5
>   [....]
>   NIP dql_completed+0x28/0x1c0
>   LR ibmvnic_complete_tx.isra.0+0x23c/0x420 [ibmvnic]
>   Call Trace:
>   ibmvnic_complete_tx.isra.0+0x3f8/0x420 [ibmvnic] (unreliable)
>   ibmvnic_interrupt_tx+0x40/0x70 [ibmvnic]
>   __handle_irq_event_percpu+0x98/0x270
>   ---[ end trace ]---
>
> Therefore, do not reset the dql stats when performing a NON_FATAL reset.
> Simply clearing the queue's off bit is sufficient.
>
> Fixes: 0d973388185d ("ibmvnic: Introduce xmit_more support using batched subCRQ hcalls")
> Signed-off-by: Nick Child <nnac123@linux.ibm.com>
> ---
>  drivers/net/ethernet/ibm/ibmvnic.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
> index c63d3ec9d328..5523ab52ff2b 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -1816,7 +1816,18 @@ static int __ibmvnic_open(struct net_device *netdev)
>  		if (prev_state == VNIC_CLOSED)
>  			enable_irq(adapter->tx_scrq[i]->irq);
>  		enable_scrq_irq(adapter, adapter->tx_scrq[i]);
> -		netdev_tx_reset_queue(netdev_get_tx_queue(netdev, i));
> +		/* netdev_tx_reset_queue will reset dql stats and set the stacks
> +		 * flag for queue status. During NON_FATAL resets, just
> +		 * clear the bit, don't reset the stats because there could
> +		 * be batched skb's waiting to be sent. If we reset dql stats,
> +		 * we risk num_completed being greater than num_queued.
> +		 * This will cause a BUG_ON in dql_completed().
> +		 */
> +		if (adapter->reset_reason == VNIC_RESET_NON_FATAL)
> +			clear_bit(__QUEUE_STATE_STACK_XOFF,
> +				  &netdev_get_tx_queue(netdev, i)->state);
> +		else
> +			netdev_tx_reset_queue(netdev_get_tx_queue(netdev, i));
>  	}
>
>  	rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_UP);
> --
> 2.31.1
On 6/22/23 12:03, Nick Child wrote:
> Signed-off-by: Nick Child <nnac123@linux.ibm.com>

Reviewed-by: ricklind@us.ibm.com
On Fri, Jun 23, 2023 at 09:52:26AM +0200, Simon Horman wrote:
> + maintainers and blamed authors

A second time, because something went wrong with the first attempt.

> On Thu, Jun 22, 2023 at 02:03:32PM -0500, Nick Child wrote:
> > [...]
On Thu, 22 Jun 2023 14:03:32 -0500 Nick Child wrote:
> +		if (adapter->reset_reason == VNIC_RESET_NON_FATAL)
> +			clear_bit(__QUEUE_STATE_STACK_XOFF,
> +				  &netdev_get_tx_queue(netdev, i)->state);

Why are you trying to clear this bit?

If the completions will still come, the bit will be cleared (or not)
during completion handling (netdev_tx_completed_queue() et al.)

Drivers shouldn't be poking into queue state bits directly.
On 6/24/23 17:19, Jakub Kicinski wrote:
> On Thu, 22 Jun 2023 14:03:32 -0500 Nick Child wrote:
>> +		if (adapter->reset_reason == VNIC_RESET_NON_FATAL)
>> +			clear_bit(__QUEUE_STATE_STACK_XOFF,
>> +				  &netdev_get_tx_queue(netdev, i)->state);
>
> Why are you trying to clear this bit?
>
> If the completions will still come the bit will be cleared (or not)
> during completion handling (netdev_tx_completed_queue() et al.)
>
> Drivers shouldn't be poking into queue state bits directly.

Most likely, yes, there could be some bytes that will get a completion,
which would clear this bit.

That being said, it is also possible that all bytes sent to the NIC are
already completed, in which case we would not get a completion. The
ibmvnic driver sends its skbs to the NIC in batches; it makes a call to
netdev_tx_sent_queue() every time an skb is added to the batch, which is
not necessarily every time that the batch is sent to the NIC.

I am not sure what number of bytes causes dql to set
__QUEUE_STATE_STACK_XOFF, but I do know that it is possible for up to 15
skbs to be sitting in the queue's batch. If there are no outstanding
bytes on the NIC (i.e. we are not expecting a tx completion) and the
internal batch has 15 references per queue, is this enough to enforce
dql and set __QUEUE_STATE_STACK_XOFF? If so, then we must clear
__QUEUE_STATE_STACK_XOFF when resetting.

I had a feeling this would raise some eyebrows. The main intent is to do
everything that netdev_tx_reset_queue() does besides resetting
statistics. Setting a "*STACK*" flag in driver code feels wrong
(especially since a *DRV* flag exists), but I could not find an
appropriate upper-level function. I suppose an alternative is to read
this flag after the device finishes the reset and send the batch if it
is set. Is this any better?

As always, thanks for the review.
Nick
On Mon, 26 Jun 2023 10:45:38 -0500 Nick Child wrote:
> On 6/24/23 17:19, Jakub Kicinski wrote:
> > On Thu, 22 Jun 2023 14:03:32 -0500 Nick Child wrote:
> [...]
> >
> > Why are you trying to clear this bit?
> >
> > If the completions will still come the bit will be cleared (or not)
> > during completion handling (netdev_tx_completed_queue() et al.)
> >
> > Drivers shouldn't be poking into queue state bits directly.
>
> Most likely, yes there could be some bytes that will get a completion
> which would clear this bit.
>
> That being said, it is also possible that all bytes sent to the NIC are
> already completed. In which case we would not get a completion. The
> ibmvnic driver sends its skb's to the NIC in batches, it makes a call to
> netdev_tx_sent_queue on every time an skb is added to the batch. This is
> not necessarily every-time that the batch is sent to the NIC.

If packets can get stuck in the driver that needs a dedicated fix.

> I am not sure what number of bytes causes dql to set
> __QUEUE_STATE_STACK_XOFF but I do know that it is possible for up to 15
> skb's to be sitting in the queues batch. If there are no outstanding
> bytes on the NIC (ie not expecting a tx completion) and the internal
> batch has 15 references per queue, is this enough to enforce dql and set
> __QUEUE_STATE_STACK_XOFF? If so, then we must clear
> __QUEUE_STATE_STACK_XOFF when resetting.

You should only be doing the batching if xmit_more is set, really.
And xmit_more will not be set if the core is about to set
__QUEUE_STATE_STACK_XOFF. With a correctly written driver, STACK_XOFF
can only be set if we're expecting a completion.

> I had a feeling this would raise some eyebrows. The main intent is to do
> everything that netdev_tx_reset_queue() does besides resetting
> statistics. Setting a "*STACK*" flag in driver code feels wrong
> (especially since a *DRV* flag exists) but I could not find an
> appropriate upper-level function. I suppose an alternative is to read
> this flag after the device finishes the reset and sending the batch if
> it is set. Is this any better?

AFAIU you shouldn't have to clear this flag. We need to reset DQL when
hard resetting the queue because we may lose completions. But if no
completions are lost, STACK_XOFF should be left alone.

Just clearing that flag is not valid either, because qdiscs will not be
rescheduled, so until some socket tries to xmit the next packet, the
data already queued will not move.
On 6/24/23 15:19, Jakub Kicinski wrote:
> Why are you trying to clear this bit?
>
> If the completions will still come the bit will be cleared (or not)
> during completion handling (netdev_tx_completed_queue() et al.)
>
> Drivers shouldn't be poking into queue state bits directly.

+		if (adapter->reset_reason == VNIC_RESET_NON_FATAL)
+			clear_bit(__QUEUE_STATE_STACK_XOFF,
+				  &netdev_get_tx_queue(netdev, i)->state);
+		else
+			netdev_tx_reset_queue(netdev_get_tx_queue(netdev, i));

The problem is the call to dql_reset() in netdev_tx_reset_queue(). If we
reset dql stats, we risk num_completed being greater than num_queued,
which will cause a BUG_ON to fire in dql_completed().

static inline void netdev_tx_reset_queue(struct netdev_queue *q)
{
#ifdef CONFIG_BQL
	clear_bit(__QUEUE_STATE_STACK_XOFF, &q->state);
	dql_reset(&q->dql);
#endif
}

Existing code calls netdev_tx_reset_queue() unconditionally. When the
error is non-fatal, though, we were tripping over the BUG_ON on those
occasions when a few skbs were already submitted.

The patch here is more about NOT calling dql_reset() than it is about
clearing __QUEUE_STATE_STACK_XOFF. That was only included because
it-was-the-other-thing-the-function-did. So ... maybe we don't care
about __QUEUE_STATE_STACK_XOFF?

Nick, do we *need* to have __QUEUE_STATE_STACK_XOFF cleared? If not,
then do we need to call or emulate netdev_tx_reset_queue() at all on a
non-fatal error?
On 6/26/23 12:33, Jakub Kicinski wrote:
>
> If packets can get stuck in the driver that needs a dedicated fix.
>
> You should only be doing the batching if xmit_more is set, really.
> And xmit_more will not be set if core is about to set
> __QUEUE_STATE_STACK_XOFF.

I am not saying that packets can get stuck in the driver; if xmit_more
is false or if the driver is hard resetting, then the batch gets sent.
During a non-fatal reset, we are simply re-establishing communications
with the NIC, and it is okay to keep the batch around. It is possible
that there are batched skbs because xmit_more and the non-fatal reset
work independently of each other and are not mutually exclusive. The
upper-level functions have no way of knowing when an issue from the NIC
will occur.

> With a correctly written driver STACK_XOFF can only be set if we're
> expecting a completion.
>
> AFAIU you shouldn't have to clear this flag. We need to reset DQL when
> hard resetting the queue because we may lose completions. But if no
> completions are lost, STACK_XOFF should be left alone.
>
> Just clearing that flag is not valid either because qdiscs will not be
> rescheduled, so until some socket tries to xmit next packet the data
> already queued will not move.

This addresses my concern. So if STACK_XOFF gets set, then xmit_more
will be false and our batch will get sent. This will eventually lead to
a completion, which will clear STACK_XOFF. Meaning the reset should not
have to worry about clearing STACK_XOFF.

My worry was that STACK_XOFF would get set when the batch was only
partially full and xmit_more was true. If a non-fatal reset occurred
here, then there would be no expected completions and a partially
filled batch. So I thought we would have to clear STACK_XOFF in order
to get the queue to be usable again. Sounds like this is not possible,
due to xmit_more being false if STACK_XOFF gets set.

On 6/26/23 14:06, Rick Lindsley wrote:
>
> Nick, do we *need* to have __QUEUE_STATE_STACK_XOFF cleared? If not,
> then do we need to call or emulate netdev_tx_reset_queue() at all on a
> non-fatal error?

After reading Jakub's review, I believe we should remove the part where
we clear STACK_XOFF. If there are no objections, I will test these
changes and send a v2.

Thanks
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index c63d3ec9d328..5523ab52ff2b 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1816,7 +1816,18 @@ static int __ibmvnic_open(struct net_device *netdev)
 		if (prev_state == VNIC_CLOSED)
 			enable_irq(adapter->tx_scrq[i]->irq);
 		enable_scrq_irq(adapter, adapter->tx_scrq[i]);
-		netdev_tx_reset_queue(netdev_get_tx_queue(netdev, i));
+		/* netdev_tx_reset_queue will reset dql stats and set the stacks
+		 * flag for queue status. During NON_FATAL resets, just
+		 * clear the bit, don't reset the stats because there could
+		 * be batched skb's waiting to be sent. If we reset dql stats,
+		 * we risk num_completed being greater than num_queued.
+		 * This will cause a BUG_ON in dql_completed().
+		 */
+		if (adapter->reset_reason == VNIC_RESET_NON_FATAL)
+			clear_bit(__QUEUE_STATE_STACK_XOFF,
+				  &netdev_get_tx_queue(netdev, i)->state);
+		else
+			netdev_tx_reset_queue(netdev_get_tx_queue(netdev, i));
 	}

 	rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_UP);
All ibmvnic resets make a call to netdev_tx_reset_queue() when
re-opening the device. netdev_tx_reset_queue() resets the num_queued
and num_completed byte counters. These stats are used in Byte Queue
Limit (BQL) algorithms. The difference between these two stats tracks
the number of bytes currently sitting on the physical NIC. ibmvnic
increases the number of queued bytes through calls to
netdev_tx_sent_queue() in the driver's xmit function. When VIOS reports
that it is done transmitting bytes, the ibmvnic device increases the
number of completed bytes through calls to netdev_tx_completed_queue().
It is important to note that the driver batches its transmit calls and
num_queued is increased every time that an skb is added to the next
batch, not necessarily when the batch is sent to VIOS for transmission.

Unlike other reset types, a NON_FATAL reset will not flush the sub-CRQ
tx buffers. Therefore, it is possible for the batched skb array to be
partially full. So if there is a call to netdev_tx_reset_queue() when
re-opening the device, the value of num_queued (0) does not account
for the skbs that are currently batched. Eventually, when the batch
is sent to VIOS, the call to netdev_tx_completed_queue() increases
num_completed to a value greater than num_queued. This causes a
BUG_ON crash:

  ibmvnic 30000002: Firmware reports error, cause: adapter problem.
  Starting recovery...
  ibmvnic 30000002: tx error 600
  ibmvnic 30000002: tx error 600
  ibmvnic 30000002: tx error 600
  ibmvnic 30000002: tx error 600
  ------------[ cut here ]------------
  kernel BUG at lib/dynamic_queue_limits.c:27!
  Oops: Exception in kernel mode, sig: 5
  [....]
  NIP dql_completed+0x28/0x1c0
  LR ibmvnic_complete_tx.isra.0+0x23c/0x420 [ibmvnic]
  Call Trace:
  ibmvnic_complete_tx.isra.0+0x3f8/0x420 [ibmvnic] (unreliable)
  ibmvnic_interrupt_tx+0x40/0x70 [ibmvnic]
  __handle_irq_event_percpu+0x98/0x270
  ---[ end trace ]---

Therefore, do not reset the dql stats when performing a NON_FATAL reset.
Simply clearing the queue's off bit is sufficient.

Fixes: 0d973388185d ("ibmvnic: Introduce xmit_more support using batched subCRQ hcalls")
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)