mbox series

[net-next,v2,0/7] ibmvnic: Use more consistent locking

Message ID 20210112181441.206545-1-sukadev@linux.ibm.com (mailing list archive)
Headers show
Series ibmvnic: Use more consistent locking | expand

Message

Sukadev Bhattiprolu Jan. 12, 2021, 6:14 p.m. UTC
Use more consistent locking when reading/writing the adapter->state
field. This patch set fixes a race condition during ibmvnic_open()
where the adapter could be left in the PROBED state if a reset occurs
at the wrong time. This can cause networking to not come up during
boot and potentially require manual intervention in bringing up
applications that depend on the network.

Changelog[v2] [Address comments from Jakub Kicinski]
	- Fix up commit log for patch 5/7 and drop unnecessary variable
	- Format Fixes line properly (no wrapping, no blank lines)

Sukadev Bhattiprolu (7):
  ibmvnic: restore state in change-param reset
  ibmvnic: update reset function prototypes
  ibmvnic: avoid allocating rwi entries
  ibmvnic: switch order of checks in ibmvnic_reset
  ibmvnic: serialize access to work queue
  ibmvnic: check adapter->state under state_lock
  ibmvnic: add comments about state_lock

 drivers/net/ethernet/ibm/ibmvnic.c | 347 ++++++++++++++++++++---------
 drivers/net/ethernet/ibm/ibmvnic.h |  70 +++++-
 2 files changed, 306 insertions(+), 111 deletions(-)

Comments

Saeed Mahameed Jan. 12, 2021, 8 p.m. UTC | #1
On Tue, 2021-01-12 at 10:14 -0800, Sukadev Bhattiprolu wrote:
> Use more consistent locking when reading/writing the adapter->state
> field. This patch set fixes a race condition during ibmvnic_open()
> where the adapter could be left in the PROBED state if a reset occurs
> at the wrong time. This can cause networking to not come up during
> boot and potentially require manual intervention in bringing up
> applications that depend on the network.
> 
> Changelog[v2] [Address comments from Jakub Kicinski]
> 	- Fix up commit log for patch 5/7 and drop unnecessary variable
> 	- Format Fixes line properly (no wrapping, no blank lines)
> 
> Sukadev Bhattiprolu (7):
>   ibmvnic: restore state in change-param reset
>   ibmvnic: update reset function prototypes
>   ibmvnic: avoid allocating rwi entries
>   ibmvnic: switch order of checks in ibmvnic_reset
>   ibmvnic: serialize access to work queue
>   ibmvnic: check adapter->state under state_lock
>   ibmvnic: add comments about state_lock
> 
> 
Other than the two minor comments, 
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Jakub Kicinski Jan. 13, 2021, 2 a.m. UTC | #2
On Tue, 12 Jan 2021 10:14:34 -0800 Sukadev Bhattiprolu wrote:
> Use more consistent locking when reading/writing the adapter->state
> field. This patch set fixes a race condition during ibmvnic_open()
> where the adapter could be left in the PROBED state if a reset occurs
> at the wrong time. This can cause networking to not come up during
> boot and potentially require manual intervention in bringing up
> applications that depend on the network.

Apologies for not having enough time to suggest details, but let me
state this again - the patches which fix bugs need to go into net with
Fixes tags, the refactoring needs to go to net-next without Fixes tags.
If there are dependencies, patches go to net first, then within a week
or so the reset can be posted for net-next, after net -> net-next merge.
Sukadev Bhattiprolu Jan. 13, 2021, 4:42 a.m. UTC | #3
Jakub Kicinski [kuba@kernel.org] wrote:
> On Tue, 12 Jan 2021 10:14:34 -0800 Sukadev Bhattiprolu wrote:
> > Use more consistent locking when reading/writing the adapter->state
> > field. This patch set fixes a race condition during ibmvnic_open()
> > where the adapter could be left in the PROBED state if a reset occurs
> > at the wrong time. This can cause networking to not come up during
> > boot and potentially require manual intervention in bringing up
> > applications that depend on the network.
> 
> Apologies for not having enough time to suggest details, but let me
> state this again - the patches which fix bugs need to go into net with
> Fixes tags, the refactoring needs to go to net-next without Fixes tags.
> If there are dependencies, patches go to net first, then within a week
> or so the reset can be posted for net-next, after net -> net-next merge.

Well, the patch set fixes a set of bugs - main one is a locking bug fixed
in patch 6. Other bugs are more minor or corner cases. Fixing the locking
bug requires some refactoring/simplifying/reordering checks so lock can be
properly acquired.

Because of the size/cleanup, should we treat it as "next" material? i.e
should I just drop the Fixes tag and resend to net-next?

Or can we ignore the size of patchset and treat it all as bug fixes?

Appreciate your input.

Thanks,

Sukadev
Jakub Kicinski Jan. 13, 2021, 6:57 p.m. UTC | #4
On Tue, 12 Jan 2021 20:42:47 -0800 Sukadev Bhattiprolu wrote:
> Jakub Kicinski [kuba@kernel.org] wrote:
> > On Tue, 12 Jan 2021 10:14:34 -0800 Sukadev Bhattiprolu wrote:  
> > > Use more consistent locking when reading/writing the adapter->state
> > > field. This patch set fixes a race condition during ibmvnic_open()
> > > where the adapter could be left in the PROBED state if a reset occurs
> > > at the wrong time. This can cause networking to not come up during
> > > boot and potentially require manual intervention in bringing up
> > > applications that depend on the network.  
> > 
> > Apologies for not having enough time to suggest details, but let me
> > state this again - the patches which fix bugs need to go into net with
> > Fixes tags, the refactoring needs to go to net-next without Fixes tags.
> > If there are dependencies, patches go to net first, then within a week
> > or so the reset can be posted for net-next, after net -> net-next merge.  
> 
> Well, the patch set fixes a set of bugs - main one is a locking bug fixed
> in patch 6. Other bugs are more minor or corner cases. Fixing the locking
> bug requires some refactoring/simplifying/reordering checks so lock can be
> properly acquired.
> 
> Because of the size/cleanup, should we treat it as "next" material? i.e
> should I just drop the Fixes tag and resend to net-next?
> 
> Or can we ignore the size of patchset and treat it all as bug fixes?

No, focus on doing this right rather than trying to justify why your
patches deserve special treatment.

Throw this entire series out and start over with the goal of fixing 
the bugs with minimal patches.
Sukadev Bhattiprolu Jan. 13, 2021, 7:55 p.m. UTC | #5
Jakub Kicinski [kuba@kernel.org] wrote:
> On Tue, 12 Jan 2021 20:42:47 -0800 Sukadev Bhattiprolu wrote:
> > Jakub Kicinski [kuba@kernel.org] wrote:
> > > On Tue, 12 Jan 2021 10:14:34 -0800 Sukadev Bhattiprolu wrote:  
> > > > Use more consistent locking when reading/writing the adapter->state
> > > > field. This patch set fixes a race condition during ibmvnic_open()
> > > > where the adapter could be left in the PROBED state if a reset occurs
> > > > at the wrong time. This can cause networking to not come up during
> > > > boot and potentially require manual intervention in bringing up
> > > > applications that depend on the network.  
> > > 
> > > Apologies for not having enough time to suggest details, but let me
> > > state this again - the patches which fix bugs need to go into net with
> > > Fixes tags, the refactoring needs to go to net-next without Fixes tags.
> > > If there are dependencies, patches go to net first, then within a week
> > > or so the reset can be posted for net-next, after net -> net-next merge.  
> > 
> > Well, the patch set fixes a set of bugs - main one is a locking bug fixed
> > in patch 6. Other bugs are more minor or corner cases. Fixing the locking
> > bug requires some refactoring/simplifying/reordering checks so lock can be
> > properly acquired.
> > 
> > Because of the size/cleanup, should we treat it as "next" material? i.e
> > should I just drop the Fixes tag and resend to net-next?
> > 
> > Or can we ignore the size of patchset and treat it all as bug fixes?
> 
> No, focus on doing this right rather than trying to justify why your
> patches deserve special treatment.

I am not asking for special treatment. I just don't get the rationale
for saying that a bug fix cannot have some amount of refactoring.
Specially a bug fix like this locking one which clearly touches various
parts of the code. To take the lock properly we do have to move code
around.
> 
> Throw this entire series out and start over with the goal of fixing 
> the bugs with minimal patches.

Fixing corner case bugs that have been around for a while in code that
we are going to throw away feels like spinning wheels just to comply
with the "process".

Other than the optimization for the spin lock, there have been no
reported technical issues with the patch set. Throwing away the
patch set and starting over - I would be focusing not on the bugs
or making the driver better but somehow complying with the process.

The tiny memory leak issues I mention for completeness are just rare
corner cases and a distraction. The main issue that people actually
hit is the locking one. Fixing the locking issue touches lot of code.

I to throw away the list implementation and add a couple of simple
interfaces because if the allocation fails we call ibmvnic_close() -
that messes with the locking I am trying to fix.

Sukadev
Sukadev Bhattiprolu Jan. 14, 2021, 2:35 a.m. UTC | #6
Jakub Kicinski [kuba@kernel.org] wrote:
> On Tue, 12 Jan 2021 10:14:34 -0800 Sukadev Bhattiprolu wrote:
> > Use more consistent locking when reading/writing the adapter->state
> > field. This patch set fixes a race condition during ibmvnic_open()
> > where the adapter could be left in the PROBED state if a reset occurs
> > at the wrong time. This can cause networking to not come up during
> > boot and potentially require manual intervention in bringing up
> > applications that depend on the network.
> 
> Apologies for not having enough time to suggest details, but let me
> state this again - the patches which fix bugs need to go into net with
> Fixes tags, the refactoring needs to go to net-next without Fixes tags.
> If there are dependencies, patches go to net first, then within a week
> or so the reset can be posted for net-next, after net -> net-next merge.

I think the locking bug fixes need the refactoring. So would it be ok to
send the refactoring (patches 2 through 4) first to net-next and when
they are merged send the the bug fixes (1, 5 and 6)? Patch 7 can be
sent to net-next later after that.

Thanks,

Sukadev