Message ID | 20210211014144.881861-2-sukadev@linux.ibm.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [1/1] ibmvnic: fix a race between open and reset | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Guessing tree name failed - patch did not apply |
On Wed, Feb 10, 2021 at 7:44 PM Sukadev Bhattiprolu <sukadev@linux.ibm.com> wrote: > > __ibmvnic_reset() currently reads the adapter->state before getting the > rtnl and saves that state as the "target state" for the reset. If this > read occurs when adapter is in PROBED state, the target state would be > PROBED. > > Just after the target state is saved, and before the actual reset process > is started (i.e before rtnl is acquired) if we get an ibmvnic_open() call > we would move the adapter to OPEN state. > > But when the reset is processed (after ibmvnic_open()) drops the rtnl), > it will leave the adapter in PROBED state even though we already moved > it to OPEN. > > To fix this, use the RTNL to improve the serialization when reading/updating > the adapter state. i.e determine the target state of a reset only after > getting the RTNL. And if a reset is in progress during an open, simply > set the target state of the adapter and let the reset code finish the > open (like we currently do if failover is pending). > > One twist to this serialization is if the adapter state changes when we > drop the RTNL to update the link state. Account for this by checking if > there was an intervening open and update the target state for the reset > accordingly (see new comments in the code). Note that only the reset > functions and ibmvnic_open() can set the adapter to OPEN state and this > must happen under rtnl. > > Fixes: 7d7195a026ba ("ibmvnic: Do not process device remove during device reset") > Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> > Reviewed-by: Dany Madden <drt@linux.ibm.com> > --- > Changelog[v3] > [Jakub Kicinski] Rebase to current net and fix comment style. > > Changelog[v2] > [Jakub Kicinski] Use ASSERT_RTNL() instead of WARN_ON_ONCE() > and rtnl_is_locked()); > --- > drivers/net/ethernet/ibm/ibmvnic.c | 71 ++++++++++++++++++++++++++---- > 1 file changed, 63 insertions(+), 8 deletions(-) > > diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c > index a536fdbf05e1..96c2b0985484 100644 > --- a/drivers/net/ethernet/ibm/ibmvnic.c > +++ b/drivers/net/ethernet/ibm/ibmvnic.c > @@ -1197,12 +1197,25 @@ static int ibmvnic_open(struct net_device *netdev) > struct ibmvnic_adapter *adapter = netdev_priv(netdev); > int rc; > > - /* If device failover is pending, just set device state and return. > - * Device operation will be handled by reset routine. > + ASSERT_RTNL(); > + > + /* If device failover is pending or we are about to reset, just set > + * device state and return. Device operation will be handled by reset > + * routine. > + * > + * It should be safe to overwrite the adapter->state here. Since > + * we hold the rtnl, either the reset has not actually started or > + * the rtnl got dropped during the set_link_state() in do_reset(). > + * In the former case, no one else is changing the state (again we > + * have the rtnl) and in the latter case, do_reset() will detect and > + * honor our setting below. > */ > - if (adapter->failover_pending) { > + if (adapter->failover_pending || (test_bit(0, &adapter->resetting))) { > + netdev_dbg(netdev, "[S:%d FOP:%d] Resetting, deferring open\n", > + adapter->state, adapter->failover_pending); > adapter->state = VNIC_OPEN; > - return 0; > + rc = 0; > + goto out; > } > > if (adapter->state != VNIC_CLOSED) { > @@ -1221,11 +1234,12 @@ static int ibmvnic_open(struct net_device *netdev) > rc = __ibmvnic_open(netdev); > > out: > - /* > - * If open fails due to a pending failover, set device state and > - * return. Device operation will be handled by reset routine. > + /* If open failed and there is a pending failover or in-progress reset, > + * set device state and return. Device operation will be handled by > + * reset routine. See also comments above regarding rtnl. > */ > - if (rc && adapter->failover_pending) { > + if (rc && > + (adapter->failover_pending || (test_bit(0, &adapter->resetting)))) { > adapter->state = VNIC_OPEN; > rc = 0; > } > @@ -1939,6 +1953,14 @@ static int do_change_param_reset(struct ibmvnic_adapter *adapter, > netdev_dbg(adapter->netdev, "Change param resetting driver (%d)\n", > rwi->reset_reason); > > + /* read the state and check (again) after getting rtnl */ > + reset_state = adapter->state; > + > + if (reset_state == VNIC_REMOVING || reset_state == VNIC_REMOVED) { > + rc = -EBUSY; > + goto out; > + } > + The changes in do_change_param_reset will cause merge conflict with net-next tree since do_change_param_reset is no longer there in net-next tree. I suggest sending this patch after net-next merges into mainline, which should be soon. > netif_carrier_off(netdev); > adapter->reset_reason = rwi->reset_reason; > > @@ -2037,6 +2059,14 @@ static int do_reset(struct ibmvnic_adapter *adapter, > if (rwi->reset_reason == VNIC_RESET_FAILOVER) > adapter->failover_pending = false; > > + /* read the state and check (again) after getting rtnl */ > + reset_state = adapter->state; > + > + if (reset_state == VNIC_REMOVING || reset_state == VNIC_REMOVED) { > + rc = -EBUSY; > + goto out; > + } > + > netif_carrier_off(netdev); > adapter->reset_reason = rwi->reset_reason; > > @@ -2063,7 +2093,24 @@ static int do_reset(struct ibmvnic_adapter *adapter, > if (rc) > goto out; > > + if (adapter->state == VNIC_OPEN) { > + /* When we dropped rtnl, ibmvnic_open() got > + * it and noticed that we are resetting and > + * set the adapter state to OPEN. Update our > + * new "target" state, and resume the reset > + * from VNIC_CLOSING state. > + */ > + netdev_dbg(netdev, > + "Open changed state from %d, updating.\n", > + reset_state); > + reset_state = VNIC_OPEN; > + adapter->state = VNIC_CLOSING; > + } > + > if (adapter->state != VNIC_CLOSING) { > + /* If someone else changed the adapter state > + * when we dropped the rtnl, fail the reset > + */ > rc = -1; > goto out; > } > @@ -2197,6 +2244,14 @@ static int do_hard_reset(struct ibmvnic_adapter *adapter, > netdev_dbg(adapter->netdev, "Hard resetting driver (%d)\n", > rwi->reset_reason); > > + /* read the state and check (again) after getting rtnl */ > + reset_state = adapter->state; > + > + if (reset_state == VNIC_REMOVING || reset_state == VNIC_REMOVED) { > + rc = -EBUSY; > + goto out; > + } > + > netif_carrier_off(netdev); > adapter->reset_reason = rwi->reset_reason; > > -- > 2.26.2 >
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index a536fdbf05e1..96c2b0985484 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -1197,12 +1197,25 @@ static int ibmvnic_open(struct net_device *netdev) struct ibmvnic_adapter *adapter = netdev_priv(netdev); int rc; - /* If device failover is pending, just set device state and return. - * Device operation will be handled by reset routine. + ASSERT_RTNL(); + + /* If device failover is pending or we are about to reset, just set + * device state and return. Device operation will be handled by reset + * routine. + * + * It should be safe to overwrite the adapter->state here. Since + * we hold the rtnl, either the reset has not actually started or + * the rtnl got dropped during the set_link_state() in do_reset(). + * In the former case, no one else is changing the state (again we + * have the rtnl) and in the latter case, do_reset() will detect and + * honor our setting below. */ - if (adapter->failover_pending) { + if (adapter->failover_pending || (test_bit(0, &adapter->resetting))) { + netdev_dbg(netdev, "[S:%d FOP:%d] Resetting, deferring open\n", + adapter->state, adapter->failover_pending); adapter->state = VNIC_OPEN; - return 0; + rc = 0; + goto out; } if (adapter->state != VNIC_CLOSED) { @@ -1221,11 +1234,12 @@ static int ibmvnic_open(struct net_device *netdev) rc = __ibmvnic_open(netdev); out: - /* - * If open fails due to a pending failover, set device state and - * return. Device operation will be handled by reset routine. + /* If open failed and there is a pending failover or in-progress reset, + * set device state and return. Device operation will be handled by + * reset routine. See also comments above regarding rtnl. */ - if (rc && adapter->failover_pending) { + if (rc && + (adapter->failover_pending || (test_bit(0, &adapter->resetting)))) { adapter->state = VNIC_OPEN; rc = 0; } @@ -1939,6 +1953,14 @@ static int do_change_param_reset(struct ibmvnic_adapter *adapter, netdev_dbg(adapter->netdev, "Change param resetting driver (%d)\n", rwi->reset_reason); + /* read the state and check (again) after getting rtnl */ + reset_state = adapter->state; + + if (reset_state == VNIC_REMOVING || reset_state == VNIC_REMOVED) { + rc = -EBUSY; + goto out; + } + netif_carrier_off(netdev); adapter->reset_reason = rwi->reset_reason; @@ -2037,6 +2059,14 @@ static int do_reset(struct ibmvnic_adapter *adapter, if (rwi->reset_reason == VNIC_RESET_FAILOVER) adapter->failover_pending = false; + /* read the state and check (again) after getting rtnl */ + reset_state = adapter->state; + + if (reset_state == VNIC_REMOVING || reset_state == VNIC_REMOVED) { + rc = -EBUSY; + goto out; + } + netif_carrier_off(netdev); adapter->reset_reason = rwi->reset_reason; @@ -2063,7 +2093,24 @@ static int do_reset(struct ibmvnic_adapter *adapter, if (rc) goto out; + if (adapter->state == VNIC_OPEN) { + /* When we dropped rtnl, ibmvnic_open() got + * it and noticed that we are resetting and + * set the adapter state to OPEN. Update our + * new "target" state, and resume the reset + * from VNIC_CLOSING state. + */ + netdev_dbg(netdev, + "Open changed state from %d, updating.\n", + reset_state); + reset_state = VNIC_OPEN; + adapter->state = VNIC_CLOSING; + } + if (adapter->state != VNIC_CLOSING) { + /* If someone else changed the adapter state + * when we dropped the rtnl, fail the reset + */ rc = -1; goto out; } @@ -2197,6 +2244,14 @@ static int do_hard_reset(struct ibmvnic_adapter *adapter, netdev_dbg(adapter->netdev, "Hard resetting driver (%d)\n", rwi->reset_reason); + /* read the state and check (again) after getting rtnl */ + reset_state = adapter->state; + + if (reset_state == VNIC_REMOVING || reset_state == VNIC_REMOVED) { + rc = -EBUSY; + goto out; + } + netif_carrier_off(netdev); adapter->reset_reason = rwi->reset_reason;