diff mbox series

[net,2/7] Revert "ibmvnic: remove duplicate napi_schedule call in open function"

Message ID 20210624041316.567622-3-sukadev@linux.ibm.com (mailing list archive)
State Accepted
Commit 2ca220f92878470c6ba03f9946e412323093cc94
Delegated to: Netdev Maintainers
Headers show
Series ibmvnic: Assorted bug fixes | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net
netdev/subject_prefix success Link
netdev/cc_maintainers fail 2 blamed authors not CCed: davem@davemloft.net lijunp213@gmail.com; 8 maintainers not CCed: tlfalcon@linux.ibm.com paulus@samba.org benh@kernel.crashing.org linuxppc-dev@lists.ozlabs.org mpe@ellerman.id.au davem@davemloft.net lijunp213@gmail.com kuba@kernel.org
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/kdoc success Errors and warnings before: 3 this patch: 3
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 11 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/header_inline success Link

Commit Message

Sukadev Bhattiprolu June 24, 2021, 4:13 a.m. UTC
From: Dany Madden <drt@linux.ibm.com>

This reverts commit 7c451f3ef676c805a4b77a743a01a5c21a250a73.

When a vnic interface is taken down and then up, connectivity is not
restored. We bisected it to this commit. Reverting this commit until
we can fully investigate the issue/benefit of the change.

Fixes: 7c451f3ef676 ("ibmvnic: remove duplicate napi_schedule call in open function")
Reported-by: Cristobal Forno <cforno12@linux.ibm.com>
Reported-by: Abdul Haleem <abdhalee@in.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Lijun Pan June 24, 2021, 6:20 a.m. UTC | #1
On Wed, Jun 23, 2021 at 11:16 PM Sukadev Bhattiprolu
<sukadev@linux.ibm.com> wrote:
>
> From: Dany Madden <drt@linux.ibm.com>
>
> This reverts commit 7c451f3ef676c805a4b77a743a01a5c21a250a73.
>
> When a vnic interface is taken down and then up, connectivity is not
> restored. We bisected it to this commit. Reverting this commit until
> we can fully investigate the issue/benefit of the change.
>

The reverted patch shouldn't be the real cause of the problem.
It is very likely VIOS does not forward the rx packets so that the rx interrupt
isn't raised.

> Fixes: 7c451f3ef676 ("ibmvnic: remove duplicate napi_schedule call in open function")
> Reported-by: Cristobal Forno <cforno12@linux.ibm.com>
> Reported-by: Abdul Haleem <abdhalee@in.ibm.com>
> Signed-off-by: Dany Madden <drt@linux.ibm.com>
> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
> ---
>  drivers/net/ethernet/ibm/ibmvnic.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
> index f13ad6bc67cd..fe1627ea9762 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -1234,6 +1234,11 @@ static int __ibmvnic_open(struct net_device *netdev)
>
>         netif_tx_start_all_queues(netdev);
>
> +       if (prev_state == VNIC_CLOSED) {
> +               for (i = 0; i < adapter->req_rx_queues; i++)
> +                       napi_schedule(&adapter->napi[i]);
> +       }
> +

interrupt_rx will schedule the napi, so not necessary here.
Rick Lindsley June 24, 2021, 6:42 a.m. UTC | #2
On 6/23/21 11:20 PM, Lijun Pan wrote:
> On Wed, Jun 23, 2021 at 11:16 PM Sukadev Bhattiprolu

> It is very likely VIOS does not forward the rx packets so that the rx interrupt
> isn't raised.

On what do you base this position? Do you have some concrete data that indicates this?

As noted, bisect led us here.  With the previous patch applied, bug occurs.  With the previous patch removed, it does not.  As the description says, we are reverting the patch until it can be determined whether the problem is the patch itself or that the patch better magnifies some other problem.

Rick
Johaan Smith June 24, 2021, 7:02 a.m. UTC | #3
On Wed, Jun 23, 2021 at 11:17 PM Sukadev Bhattiprolu
<sukadev@linux.ibm.com> wrote:
>
> From: Dany Madden <drt@linux.ibm.com>
>
> This reverts commit 7c451f3ef676c805a4b77a743a01a5c21a250a73.
>
> When a vnic interface is taken down and then up, connectivity is not
> restored. We bisected it to this commit. Reverting this commit until
> we can fully investigate the issue/benefit of the change.
>

Sometimes git bisect may lead us to the false positives. Full investigation is
always the best solution if it is not too hard.
Rick Lindsley June 24, 2021, 7:07 a.m. UTC | #4
On 6/23/21 11:20 PM, Lijun Pan wrote:

>> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
>> index f13ad6bc67cd..fe1627ea9762 100644
>> --- a/drivers/net/ethernet/ibm/ibmvnic.c
>> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
>> @@ -1234,6 +1234,11 @@ static int __ibmvnic_open(struct net_device *netdev)
>>
>>          netif_tx_start_all_queues(netdev);
>>
>> +       if (prev_state == VNIC_CLOSED) {
>> +               for (i = 0; i < adapter->req_rx_queues; i++)
>> +                       napi_schedule(&adapter->napi[i]);
>> +       }
>> +
> 
> interrupt_rx will schedule the napi, so not necessary here.

You keep saying this, but yet there is some case with the original patch that leaves napi unscheduled and the device unresponsive.  Until that is better understood, this patch should be reverted - especially when you consider that calling napi_schedule() when the rx_queue is already scheduled is harmless.  The original patch did not address any specific problem, but did introduce one.

Rick
Rick Lindsley June 24, 2021, 7:28 a.m. UTC | #5
On 6/24/21 12:02 AM, Johaan Smith wrote:
> On Wed, Jun 23, 2021 at 11:17 PM Sukadev Bhattiprolu

> Sometimes git bisect may lead us to the false positives. Full investigation is
> always the best solution if it is not too hard.

I'd be happy to view evidence that points at another patch, if someone has some.
But the original patch did not address any observed problem:

      "So there is no need for do_reset to call napi_schedule again
       at the end of the function though napi_schedule will neglect
       the request if napi is already scheduled."

Given that it fixed no problem but appears to have introduced one, the efficient
action is to revert it for now.
Lijun Pan June 24, 2021, 4:53 p.m. UTC | #6
On Thu, Jun 24, 2021 at 2:05 AM Johaan Smith <johaansmith74@gmail.com> wrote:
>
> On Wed, Jun 23, 2021 at 11:17 PM Sukadev Bhattiprolu
> <sukadev@linux.ibm.com> wrote:
> >
> > From: Dany Madden <drt@linux.ibm.com>
> >
> > This reverts commit 7c451f3ef676c805a4b77a743a01a5c21a250a73.
> >
> > When a vnic interface is taken down and then up, connectivity is not
> > restored. We bisected it to this commit. Reverting this commit until
> > we can fully investigate the issue/benefit of the change.
> >
>
> Sometimes git bisect may lead us to the false positives. Full investigation is
> always the best solution if it is not too hard.

Agreed.
Lijun Pan June 24, 2021, 5:05 p.m. UTC | #7
On Thu, Jun 24, 2021 at 2:29 AM Rick Lindsley
<ricklind@linux.vnet.ibm.com> wrote:
>
> On 6/24/21 12:02 AM, Johaan Smith wrote:
> > On Wed, Jun 23, 2021 at 11:17 PM Sukadev Bhattiprolu
>
> > Sometimes git bisect may lead us to the false positives. Full investigation is
> > always the best solution if it is not too hard.
>
> I'd be happy to view evidence that points at another patch, if someone has some.
> But the original patch did not address any observed problem:
>
>       "So there is no need for do_reset to call napi_schedule again
>        at the end of the function though napi_schedule will neglect
>        the request if napi is already scheduled."
>
> Given that it fixed no problem but appears to have introduced one, the efficient
> action is to revert it for now.
>

With this reverted patch, there are lots of errors after bringing
device down then up, e.g.,
"ibmvnic 30000003 env3: rx buffer returned with rc 6".
That seems something wrong with the handling of set_link_state
DOWN/UP. It is either the communication protocol or the VIOS itself.
Dany Madden June 24, 2021, 6:18 p.m. UTC | #8
On 2021-06-24 10:05, Lijun Pan wrote:
> On Thu, Jun 24, 2021 at 2:29 AM Rick Lindsley
> <ricklind@linux.vnet.ibm.com> wrote:
>> 
>> On 6/24/21 12:02 AM, Johaan Smith wrote:
>> > On Wed, Jun 23, 2021 at 11:17 PM Sukadev Bhattiprolu
>> 
>> > Sometimes git bisect may lead us to the false positives. Full investigation is
>> > always the best solution if it is not too hard.
>> 
>> I'd be happy to view evidence that points at another patch, if someone 
>> has some.
>> But the original patch did not address any observed problem:
>> 
>>       "So there is no need for do_reset to call napi_schedule again
>>        at the end of the function though napi_schedule will neglect
>>        the request if napi is already scheduled."
>> 
>> Given that it fixed no problem but appears to have introduced one, the 
>> efficient
>> action is to revert it for now.
>> 
> 
> With this reverted patch, there are lots of errors after bringing
> device down then up, e.g.,
> "ibmvnic 30000003 env3: rx buffer returned with rc 6".
> That seems something wrong with the handling of set_link_state
> DOWN/UP. It is either the communication protocol or the VIOS itself.

Did the driver bring the link back up with the patch is reverted? When 
link is down, vnic server returns rx buffers back to the client. This 
error message is printed when debug is turned on, driver's way to log 
what happened.
Rick Lindsley June 24, 2021, 11:33 p.m. UTC | #9
On 6/24/21 10:05 AM, Lijun Pan wrote:

> With this reverted patch, there are lots of errors after bringing
> device down then up, e.g.,
> "ibmvnic 30000003 env3: rx buffer returned with rc 6".

Ok, thanks.  Can you provide some more details about your test?  When you
ran this test, did you include the rest of the patches in this series,
only some, or no others at all?  I assume it was based on the contents
of net from 6/23 or 6/24 - is that correct? Can you also describe the
hardware you ran your test on?  And lastly, would you briefly describe
the nature of your test?

The message you quoted is only seen with debug turned on, and as Dany
remarked, can be expected and normal in some recovery situations. It's
always possible you've found a case undetected by our testing, so the
above information can help us better understand what you saw.
diff mbox series

Patch

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index f13ad6bc67cd..fe1627ea9762 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1234,6 +1234,11 @@  static int __ibmvnic_open(struct net_device *netdev)
 
 	netif_tx_start_all_queues(netdev);
 
+	if (prev_state == VNIC_CLOSED) {
+		for (i = 0; i < adapter->req_rx_queues; i++)
+			napi_schedule(&adapter->napi[i]);
+	}
+
 	adapter->state = VNIC_OPEN;
 	return rc;
 }