Message ID | 20210624041316.567622-3-sukadev@linux.ibm.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 2ca220f92878470c6ba03f9946e412323093cc94 |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | ibmvnic: Assorted bug fixes | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | fail | 2 blamed authors not CCed: davem@davemloft.net lijunp213@gmail.com; 8 maintainers not CCed: tlfalcon@linux.ibm.com paulus@samba.org benh@kernel.crashing.org linuxppc-dev@lists.ozlabs.org mpe@ellerman.id.au davem@davemloft.net lijunp213@gmail.com kuba@kernel.org |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 0 this patch: 0 |
netdev/kdoc | success | Errors and warnings before: 3 this patch: 3 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 11 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 0 this patch: 0 |
netdev/header_inline | success | Link |
On Wed, Jun 23, 2021 at 11:16 PM Sukadev Bhattiprolu <sukadev@linux.ibm.com> wrote: > > From: Dany Madden <drt@linux.ibm.com> > > This reverts commit 7c451f3ef676c805a4b77a743a01a5c21a250a73. > > When a vnic interface is taken down and then up, connectivity is not > restored. We bisected it to this commit. Reverting this commit until > we can fully investigate the issue/benefit of the change. > The reverted patch shouldn't be the real cause of the problem. It is very likely VIOS does not forward the rx packets so that the rx interrupt isn't raised. > Fixes: 7c451f3ef676 ("ibmvnic: remove duplicate napi_schedule call in open function") > Reported-by: Cristobal Forno <cforno12@linux.ibm.com> > Reported-by: Abdul Haleem <abdhalee@in.ibm.com> > Signed-off-by: Dany Madden <drt@linux.ibm.com> > Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> > --- > drivers/net/ethernet/ibm/ibmvnic.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c > index f13ad6bc67cd..fe1627ea9762 100644 > --- a/drivers/net/ethernet/ibm/ibmvnic.c > +++ b/drivers/net/ethernet/ibm/ibmvnic.c > @@ -1234,6 +1234,11 @@ static int __ibmvnic_open(struct net_device *netdev) > > netif_tx_start_all_queues(netdev); > > + if (prev_state == VNIC_CLOSED) { > + for (i = 0; i < adapter->req_rx_queues; i++) > + napi_schedule(&adapter->napi[i]); > + } > + interrupt_rx will schedule the napi, so not necessary here.
On 6/23/21 11:20 PM, Lijun Pan wrote: > On Wed, Jun 23, 2021 at 11:16 PM Sukadev Bhattiprolu > It is very likely VIOS does not forward the rx packets so that the rx interrupt > isn't raised. On what do you base this position? Do you have some concrete data that indicates this? As noted, bisect led us here. With the previous patch applied, bug occurs. With the previous patch removed, it does not. As the description says, we are reverting the patch until it can be determined whether the problem is the patch itself or that the patch better magnifies some other problem. Rick
On Wed, Jun 23, 2021 at 11:17 PM Sukadev Bhattiprolu <sukadev@linux.ibm.com> wrote: > > From: Dany Madden <drt@linux.ibm.com> > > This reverts commit 7c451f3ef676c805a4b77a743a01a5c21a250a73. > > When a vnic interface is taken down and then up, connectivity is not > restored. We bisected it to this commit. Reverting this commit until > we can fully investigate the issue/benefit of the change. > Sometimes git bisect may lead us to the false positives. Full investigation is always the best solution if it is not too hard.
On 6/23/21 11:20 PM, Lijun Pan wrote: >> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c >> index f13ad6bc67cd..fe1627ea9762 100644 >> --- a/drivers/net/ethernet/ibm/ibmvnic.c >> +++ b/drivers/net/ethernet/ibm/ibmvnic.c >> @@ -1234,6 +1234,11 @@ static int __ibmvnic_open(struct net_device *netdev) >> >> netif_tx_start_all_queues(netdev); >> >> + if (prev_state == VNIC_CLOSED) { >> + for (i = 0; i < adapter->req_rx_queues; i++) >> + napi_schedule(&adapter->napi[i]); >> + } >> + > > interrupt_rx will schedule the napi, so not necessary here. You keep saying this, but yet there is some case with the original patch that leaves napi unscheduled and the device unresponsive. Until that is better understood, this patch should be reverted - especially when you consider that calling napi_schedule() when the rx_queue is already scheduled is harmless. The original patch did not address any specific problem, but did introduce one. Rick
On 6/24/21 12:02 AM, Johaan Smith wrote: > On Wed, Jun 23, 2021 at 11:17 PM Sukadev Bhattiprolu > Sometimes git bisect may lead us to the false positives. Full investigation is > always the best solution if it is not too hard. I'd be happy to view evidence that points at another patch, if someone has some. But the original patch did not address any observed problem: "So there is no need for do_reset to call napi_schedule again at the end of the function though napi_schedule will neglect the request if napi is already scheduled." Given that it fixed no problem but appears to have introduced one, the efficient action is to revert it for now.
On Thu, Jun 24, 2021 at 2:05 AM Johaan Smith <johaansmith74@gmail.com> wrote: > > On Wed, Jun 23, 2021 at 11:17 PM Sukadev Bhattiprolu > <sukadev@linux.ibm.com> wrote: > > > > From: Dany Madden <drt@linux.ibm.com> > > > > This reverts commit 7c451f3ef676c805a4b77a743a01a5c21a250a73. > > > > When a vnic interface is taken down and then up, connectivity is not > > restored. We bisected it to this commit. Reverting this commit until > > we can fully investigate the issue/benefit of the change. > > > > Sometimes git bisect may lead us to the false positives. Full investigation is > always the best solution if it is not too hard. Agreed.
On Thu, Jun 24, 2021 at 2:29 AM Rick Lindsley <ricklind@linux.vnet.ibm.com> wrote: > > On 6/24/21 12:02 AM, Johaan Smith wrote: > > On Wed, Jun 23, 2021 at 11:17 PM Sukadev Bhattiprolu > > > Sometimes git bisect may lead us to the false positives. Full investigation is > > always the best solution if it is not too hard. > > I'd be happy to view evidence that points at another patch, if someone has some. > But the original patch did not address any observed problem: > > "So there is no need for do_reset to call napi_schedule again > at the end of the function though napi_schedule will neglect > the request if napi is already scheduled." > > Given that it fixed no problem but appears to have introduced one, the efficient > action is to revert it for now. > With this reverted patch, there are lots of errors after bringing device down then up, e.g., "ibmvnic 30000003 env3: rx buffer returned with rc 6". That seems something wrong with the handling of set_link_state DOWN/UP. It is either the communication protocol or the VIOS itself.
On 2021-06-24 10:05, Lijun Pan wrote: > On Thu, Jun 24, 2021 at 2:29 AM Rick Lindsley > <ricklind@linux.vnet.ibm.com> wrote: >> >> On 6/24/21 12:02 AM, Johaan Smith wrote: >> > On Wed, Jun 23, 2021 at 11:17 PM Sukadev Bhattiprolu >> >> > Sometimes git bisect may lead us to the false positives. Full investigation is >> > always the best solution if it is not too hard. >> >> I'd be happy to view evidence that points at another patch, if someone >> has some. >> But the original patch did not address any observed problem: >> >> "So there is no need for do_reset to call napi_schedule again >> at the end of the function though napi_schedule will neglect >> the request if napi is already scheduled." >> >> Given that it fixed no problem but appears to have introduced one, the >> efficient >> action is to revert it for now. >> > > With this reverted patch, there are lots of errors after bringing > device down then up, e.g., > "ibmvnic 30000003 env3: rx buffer returned with rc 6". > That seems something wrong with the handling of set_link_state > DOWN/UP. It is either the communication protocol or the VIOS itself. Did the driver bring the link back up with the patch is reverted? When link is down, vnic server returns rx buffers back to the client. This error message is printed when debug is turned on, driver's way to log what happened.
On 6/24/21 10:05 AM, Lijun Pan wrote: > With this reverted patch, there are lots of errors after bringing > device down then up, e.g., > "ibmvnic 30000003 env3: rx buffer returned with rc 6". Ok, thanks. Can you provide some more details about your test? When you ran this test, did you include the rest of the patches in this series, only some, or no others at all? I assume it was based on the contents of net from 6/23 or 6/24 - is that correct? Can you also describe the hardware you ran your test on? And lastly, would you briefly describe the nature of your test? The message you quoted is only seen with debug turned on, and as Dany remarked, can be expected and normal in some recovery situations. It's always possible you've found a case undetected by our testing, so the above information can help us better understand what you saw.
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index f13ad6bc67cd..fe1627ea9762 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -1234,6 +1234,11 @@ static int __ibmvnic_open(struct net_device *netdev) netif_tx_start_all_queues(netdev); + if (prev_state == VNIC_CLOSED) { + for (i = 0; i < adapter->req_rx_queues; i++) + napi_schedule(&adapter->napi[i]); + } + adapter->state = VNIC_OPEN; return rc; }