Message ID | 20210406034752.12840-1-drt@linux.ibm.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | ibmvnic: Continue with reset if set link down failed |
On 4/5/21 8:47 PM, Dany Madden wrote:
> When an adapter is going thru a reset, it maybe in an unstable state that
> makes a request to set link down fail. In such a case, the adapter needs
> to continue on with reset to bring itself back to a stable state.
>
> Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
> Signed-off-by: Dany Madden <drt@linux.ibm.com>

Reviewed-by: Rick Lindsley <ricklind@linux.ibm.com>
> On Apr 5, 2021, at 10:47 PM, Dany Madden <drt@linux.ibm.com> wrote:
>
> When an adapter is going thru a reset, it maybe in an unstable state that
> makes a request to set link down fail. In such a case, the adapter needs
> to continue on with reset to bring itself back to a stable state.
>
> Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
> Signed-off-by: Dany Madden <drt@linux.ibm.com>
> ---
> drivers/net/ethernet/ibm/ibmvnic.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
> index 9c6438d3b3a5..e4f01a7099a0 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -1976,8 +1976,10 @@ static int do_reset(struct ibmvnic_adapter *adapter,
> 		rtnl_unlock();
> 		rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN);
> 		rtnl_lock();
> -		if (rc)
> -			goto out;
> +		if (rc) {
> +			netdev_dbg(netdev,
> +				   "Setting link down failed rc=%d. Continue anyway\n", rc);
> +		}

What’s the point of checking the return code if it can be neglected anyway?
If we really don’t care if set_link_state succeeds or not, we don’t even
need to call set_link_state() here.
It seems more correct to me that we find out why set_link_state fails and
fix it from that end.

Lijun

>
> 		if (adapter->state == VNIC_OPEN) {
> 			/* When we dropped rtnl, ibmvnic_open() got
> --
> 2.26.2
>
On 2021-04-05 23:46, Lijun Pan wrote:
>> On Apr 5, 2021, at 10:47 PM, Dany Madden <drt@linux.ibm.com> wrote:
>>
>> When an adapter is going thru a reset, it maybe in an unstable state that
>> makes a request to set link down fail. In such a case, the adapter needs
>> to continue on with reset to bring itself back to a stable state.
>>
>> Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
>> Signed-off-by: Dany Madden <drt@linux.ibm.com>
>> ---
>> drivers/net/ethernet/ibm/ibmvnic.c | 6 ++++--
>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
>> index 9c6438d3b3a5..e4f01a7099a0 100644
>> --- a/drivers/net/ethernet/ibm/ibmvnic.c
>> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
>> @@ -1976,8 +1976,10 @@ static int do_reset(struct ibmvnic_adapter *adapter,
>> 		rtnl_unlock();
>> 		rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN);
>> 		rtnl_lock();
>> -		if (rc)
>> -			goto out;
>> +		if (rc) {
>> +			netdev_dbg(netdev,
>> +				   "Setting link down failed rc=%d. Continue anyway\n", rc);
>> +		}
>
> What’s the point of checking the return code if it can be neglected anyway?
> If we really don’t care if set_link_state succeeds or not, we don’t even
> need to call set_link_state() here.
> It seems more correct to me that we find out why set_link_state fails and
> fix it from that end.

We know why set link state failed. CRQ is no longer active at this point.
It is not possible to send a link down request to the VIOS. If driver exits
here, adapter will be left in an inoperable state. If it continues to
reinitialize the crq, it can continue to reset and come up.

Prior to submitting this patch, we ran a 17-hour and a 24-hour tests
(LPM+failover) on 10 vnics. We saw that:

17 hours, hit 4 times
- 3 times driver is able to continue on to re-init CRQ and continue on to
  bring the adapter up.
- 1 time driver failed to re-init CRQ due to the last reset failed and
  released the CRQ. Subsequent hard reset from a transport event (failover)
  succeeded.

24 hours, hit 10 times
- 7 times driver is able to continue on to re-init CRQ and continue to
  bring the adapter up.
- 3 times driver failed to init CRQ due to the last reset failed and
  released the CRQ. Subsequent hard reset from a transport event (failover
  or lpm) succeed.

In both runs, with the patch, 10 vnics continue to work as expected.

>
> Lijun
>
>>
>> 		if (adapter->state == VNIC_OPEN) {
>> 			/* When we dropped rtnl, ibmvnic_open() got
>> --
>> 2.26.2
>>
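The argument in the reply above (a failed link-down request should not abort the reset, because the later CRQ re-initialization is what restores the adapter) can be illustrated with a small userspace sketch. This is not driver code: `struct mock_adapter`, `mock_set_link_down()`, `mock_reinit_crq()` and `mock_do_reset()` are hypothetical stand-ins invented for illustration, and the real `do_reset()` does considerably more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the adapter; NOT the real struct ibmvnic_adapter. */
struct mock_adapter {
	bool crq_active;	/* can requests reach the other side? */
	bool link_up;
	bool operational;
};

/* Mimics a link-down request that cannot be sent while the CRQ is inactive. */
static int mock_set_link_down(struct mock_adapter *a)
{
	if (!a->crq_active)
		return -1;
	a->link_up = false;
	return 0;
}

/* Mimics freeing and re-registering the CRQ (assumed to succeed here). */
static int mock_reinit_crq(struct mock_adapter *a)
{
	a->crq_active = true;
	return 0;
}

/*
 * Sketch of the reset flow. With bail_on_link_error set, a failed
 * link-down aborts the reset (behaviour before the patch); otherwise
 * the failure is logged and the reset carries on (behaviour after it).
 */
static int mock_do_reset(struct mock_adapter *a, bool bail_on_link_error)
{
	int rc;

	assert(a != NULL);
	a->operational = false;

	rc = mock_set_link_down(a);
	if (rc) {
		if (bail_on_link_error)
			return rc;	/* adapter stays inoperable */
		fprintf(stderr,
			"Setting link down failed rc=%d. Continue anyway\n", rc);
	}

	rc = mock_reinit_crq(a);
	if (rc)
		return rc;

	a->link_up = true;
	a->operational = true;
	return 0;
}
```

Calling `mock_do_reset()` on an adapter whose `crq_active` is false returns an error and leaves `operational` false when `bail_on_link_error` is true; with `bail_on_link_error` false, the same call reaches the CRQ re-initialization and the adapter ends up operational again.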
Dany Madden [drt@linux.ibm.com] wrote:
> When an adapter is going thru a reset, it maybe in an unstable state that
> makes a request to set link down fail. In such a case, the adapter needs
> to continue on with reset to bring itself back to a stable state.
>
> Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
> Signed-off-by: Dany Madden <drt@linux.ibm.com>

Given that the likely reason for set_link_state() failing is that the
CRQ is inactive and that we will attempt to free the CRQ and re-register
it in ibmvnic_reset_crq() further down, I think its okay to ignore the
error here.

Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
On 2021-04-07 12:03, Dany Madden wrote:
> On 2021-04-05 23:46, Lijun Pan wrote:
>>> On Apr 5, 2021, at 10:47 PM, Dany Madden <drt@linux.ibm.com> wrote:
>>>
>>> When an adapter is going thru a reset, it maybe in an unstable state that
>>> makes a request to set link down fail. In such a case, the adapter needs
>>> to continue on with reset to bring itself back to a stable state.
>>>
>>> Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
>>> Signed-off-by: Dany Madden <drt@linux.ibm.com>
>>> ---
>>> drivers/net/ethernet/ibm/ibmvnic.c | 6 ++++--
>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
>>> index 9c6438d3b3a5..e4f01a7099a0 100644
>>> --- a/drivers/net/ethernet/ibm/ibmvnic.c
>>> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
>>> @@ -1976,8 +1976,10 @@ static int do_reset(struct ibmvnic_adapter *adapter,
>>> 		rtnl_unlock();
>>> 		rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN);
>>> 		rtnl_lock();
>>> -		if (rc)
>>> -			goto out;
>>> +		if (rc) {
>>> +			netdev_dbg(netdev,
>>> +				   "Setting link down failed rc=%d. Continue anyway\n", rc);
>>> +		}
>>
>> What’s the point of checking the return code if it can be neglected anyway?
>> If we really don’t care if set_link_state succeeds or not, we don’t even
>> need to call set_link_state() here.
>> It seems more correct to me that we find out why set_link_state fails and
>> fix it from that end.
>
> We know why set link state failed. CRQ is no longer active at this
> point. It is not possible to send a link down request to the VIOS. If
> driver exits here, adapter will be left in an inoperable state. If it
> continues to reinitialize the crq, it can continue to reset and come
> up.
>
> Prior to submitting this patch, we ran a 17-hour and a 24-hour tests
> (LPM+failover) on 10 vnics. We saw that:
>
> 17 hours, hit 4 times
> - 3 times driver is able to continue on to re-init CRQ and continue on
>   to bring the adapter up.
> - 1 time driver failed to re-init CRQ due to the last reset failed and
>   released the CRQ. Subsequent hard reset from a transport event
>   (failover) succeeded.
>
> 24 hours, hit 10 times
> - 7 times driver is able to continue on to re-init CRQ and continue to
>   bring the adapter up.
> - 3 times driver failed to init CRQ due to the last reset failed and
>   released the CRQ. Subsequent hard reset from a transport event
>   (failover or lpm) succeed.
>
> In both runs, with the patch, 10 vnics continue to work as expected.

Is there anything else that we need to address before this is accepted?

Dany

>
>>
>> Lijun
>>
>>>
>>> 		if (adapter->state == VNIC_OPEN) {
>>> 			/* When we dropped rtnl, ibmvnic_open() got
>>> --
>>> 2.26.2
>>>
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 9c6438d3b3a5..e4f01a7099a0 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1976,8 +1976,10 @@ static int do_reset(struct ibmvnic_adapter *adapter,
 		rtnl_unlock();
 		rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN);
 		rtnl_lock();
-		if (rc)
-			goto out;
+		if (rc) {
+			netdev_dbg(netdev,
+				   "Setting link down failed rc=%d. Continue anyway\n", rc);
+		}
 
 		if (adapter->state == VNIC_OPEN) {
 			/* When we dropped rtnl, ibmvnic_open() got
When an adapter is going thru a reset, it maybe in an unstable state that
makes a request to set link down fail. In such a case, the adapter needs
to continue on with reset to bring itself back to a stable state.

Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)