diff mbox

[v2,2/2] mwifiex: check hw_status in suspend and resume handlers

Message ID 1475600905-2997-2-git-send-email-akarwar@marvell.com (mailing list archive)
State Changes Requested
Delegated to: Kalle Valo
Headers show

Commit Message

Amitkumar Karwar Oct. 4, 2016, 5:08 p.m. UTC
From: Xinming Hu <huxm@marvell.com>

We have observed a kernel crash when system immediately suspends
after booting. There is a race between suspend and driver
initialization paths.
This patch adds hw_status checks to fix the problem

Signed-off-by: Xinming Hu <huxm@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
---
v2: Return failure in suspend/resume handler in this scenario.
---
 drivers/net/wireless/marvell/mwifiex/pcie.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Comments

Brian Norris Oct. 4, 2016, 9:04 p.m. UTC | #1
Hi,

On Tue, Oct 04, 2016 at 10:38:25PM +0530, Amitkumar Karwar wrote:
> From: Xinming Hu <huxm@marvell.com>
> 
> We have observed a kernel crash when system immediately suspends
> after booting. There is a race between suspend and driver
> initialization paths.
> This patch adds hw_status checks to fix the problem
> 
> Signed-off-by: Xinming Hu <huxm@marvell.com>
> Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
> ---
> v2: Return failure in suspend/resume handler in this scenario.
> ---
>  drivers/net/wireless/marvell/mwifiex/pcie.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
> index ba9e068..fa6bf85 100644
> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> @@ -122,9 +122,10 @@ static int mwifiex_pcie_suspend(struct device *dev)
>  
>  	if (pdev) {
>  		card = pci_get_drvdata(pdev);
> -		if (!card || !card->adapter) {
> +		if (!card || !card->adapter ||
> +		    card->adapter->hw_status != MWIFIEX_HW_STATUS_READY) {

Wait, is there no locking on the 'hw_status' field? That is inherently
an unsafe race all on its own; you're not guaranteed that this will be
read/written atomically. And you also aren't guaranteed that writes to
this happen in the order they appear in the code -- in other words,
reading this flag doesn't necessarily guarantee that initialization is
actually complete (even if that's very likely to be true, given that
it's probably just a single-instruction word-access, and any prior HW
polling or interrupts likely have done some synchronization and can't be
reordered).

This is probably better than nothing, but it's not very good.

>  			pr_err("Card or adapter structure is not valid\n");
> -			return 0;
> +			return -EBUSY;

So the above cases all mean that the driver hasn't finished loading,
right?

!card => can't happen (PCIe probe() would have failed
mwifiex_add_card()), but fine to check

!card->adapter => only happens after patch 1; i.e., when tearing down
the device and detaching it from the driver

card->adapter->hw_status != MWIFIEX_HW_STATUS_READY => FW is not loaded
(i.e., in the process of starting or stopping FW?)

I guess all of those cases make sense to be -EBUSY.

>  		}
>  	} else {
>  		pr_err("PCIE device is not specified\n");

I was going to complain about this branch (!pdev) returning 0, since
that looked asymmetric. But this case will never happen, actually, since
to_pci_dev() is just doing struct offset arithmetic on struct device,
and struct device is never going to be NULL. It probably makes more
sense to just remove this branch entirely, but that's for another patch.

> @@ -166,9 +167,10 @@ static int mwifiex_pcie_resume(struct device *dev)
>  
>  	if (pdev) {
>  		card = pci_get_drvdata(pdev);
> -		if (!card || !card->adapter) {
> +		if (!card || !card->adapter ||
> +		    card->adapter->hw_status != MWIFIEX_HW_STATUS_READY) {

Same complaint, except if you've screwed up here, you probably already
screwed up in suspend().

Brian

>  			pr_err("Card or adapter structure is not valid\n");
> -			return 0;
> +			return -EBUSY;
>  		}
>  	} else {
>  		pr_err("PCIE device is not specified\n");
> -- 
> 1.9.1
>
Amitkumar Karwar Oct. 5, 2016, 12:26 p.m. UTC | #2
Hi Brian,

Thanks for your thorough review.

> From: Brian Norris [mailto:briannorris@chromium.org]
> Sent: Wednesday, October 05, 2016 2:35 AM
> To: Amitkumar Karwar
> Cc: linux-wireless@vger.kernel.org; Cathy Luo; Nishant Sarmukadam;
> rajatja@google.com; briannorris@google.com; Xinming Hu
> Subject: Re: [PATCH v2 2/2] mwifiex: check hw_status in suspend and
> resume handlers
> 
> Hi,
> 
> On Tue, Oct 04, 2016 at 10:38:25PM +0530, Amitkumar Karwar wrote:
> > From: Xinming Hu <huxm@marvell.com>
> >
> > We have observed a kernel crash when system immediately suspends after
> > booting. There is a race between suspend and driver initialization
> > paths.
> > This patch adds hw_status checks to fix the problem
> >
> > Signed-off-by: Xinming Hu <huxm@marvell.com>
> > Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
> > ---
> > v2: Return failure in suspend/resume handler in this scenario.
> > ---
> >  drivers/net/wireless/marvell/mwifiex/pcie.c | 10 ++++++----
> >  1 file changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > index ba9e068..fa6bf85 100644
> > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > @@ -122,9 +122,10 @@ static int mwifiex_pcie_suspend(struct device
> > *dev)
> >
> >  	if (pdev) {
> >  		card = pci_get_drvdata(pdev);
> > -		if (!card || !card->adapter) {
> > +		if (!card || !card->adapter ||
> > +		    card->adapter->hw_status != MWIFIEX_HW_STATUS_READY) {
> 
> Wait, is there no locking on the 'hw_status' field? That is inherently
> an unsafe race all on its own; you're not guaranteed that this will be
> read/written atomically. And you also aren't guaranteed that writes to
> this happen in the order they appear in the code -- in other words,
> reading this flag doesn't necessarily guarantee that initialization is
> actually complete (even if that's very likely to be true, given that
> it's probably just a single-instruction word-access, and any prior HW
> polling or interrupts likely have done some synchronization and can't be
> reordered).
> actually complete

Here is the brief info on how "hw_status" flag is updated.
1) It gets changed incrementally during initialization.
    MWIFIEX_HW_STATUS_INITIALIZING -> MWIFIEX_HW_STATUS_INIT_DONE -> MWIFIEX_HW_STATUS_READY

2) Status will remain READY once driver+firmware is up and running.

3) Below is status during teardown
MWIFIEX_HW_STATUS_READY -> MWIFIEX_HW_STATUS_RESET -> MWIFIEX_HW_STATUS_CLOSING -> MWIFIEX_HW_STATUS_NOT_READY

As the events occur one after another, we don't expect a race and don't need locking here for the flag. Flag status MWIFIEX_HW_STATUS_READY guarantees that initialization is completed.

In worst case scenario, only first system suspend attempt issued immediately after system boot will be aborted with BUSY error. I think, that should be fine.

Let me know if you have any concerns.

> 
> This is probably better than nothing, but it's not very good.
> 
> >  			pr_err("Card or adapter structure is not valid\n");
> > -			return 0;
> > +			return -EBUSY;
> 
> So the above cases all mean that the driver hasn't finished loading,
> right?
> 
> !card => can't happen (PCIe probe() would have failed
> mwifiex_add_card()), but fine to check

NULL "card" or "card->adapter" pointers are expected in below cases
1) Driver initialization failed after downloading the firmware. Driver is performing cleanup in init failure path. Now system suspends.
2) Race of teardown + suspend
 
> 
> !card->adapter => only happens after patch 1; i.e., when tearing down
> the device and detaching it from the driver
> 
> card->adapter->hw_status != MWIFIEX_HW_STATUS_READY => FW is not loaded
> (i.e., in the process of starting or stopping FW?)
> 
> I guess all of those cases make sense to be -EBUSY.

Yes. we can keep -EBUSY for all of the cases.

> 
> >  		}
> >  	} else {
> >  		pr_err("PCIE device is not specified\n");
> 
> I was going to complain about this branch (!pdev) returning 0, since
> that looked asymmetric. But this case will never happen, actually, since
> to_pci_dev() is just doing struct offset arithmetic on struct device,
> and struct device is never going to be NULL. It probably makes more
> sense to just remove this branch entirely, but that's for another patch.

Thanks for pointing this out. I will create separate patch for this.

> 
> > @@ -166,9 +167,10 @@ static int mwifiex_pcie_resume(struct device
> > *dev)
> >
> >  	if (pdev) {
> >  		card = pci_get_drvdata(pdev);
> > -		if (!card || !card->adapter) {
> > +		if (!card || !card->adapter ||
> > +		    card->adapter->hw_status != MWIFIEX_HW_STATUS_READY) {
> 
> Same complaint, except if you've screwed up here, you probably already
> screwed up in suspend().
> 
> Brian
> 
> >  			pr_err("Card or adapter structure is not valid\n");
> > -			return 0;
> > +			return -EBUSY;
> >  		}
> >  	} else {
> >  		pr_err("PCIE device is not specified\n");
> > --
> > 1.9.1
> >

Regards,
Amitkumar
Brian Norris Oct. 5, 2016, 4:41 p.m. UTC | #3
Hi Amit,

On Wed, Oct 05, 2016 at 12:26:36PM +0000, Amitkumar Karwar wrote:
> > From: Brian Norris [mailto:briannorris@chromium.org]
> > Sent: Wednesday, October 05, 2016 2:35 AM
> > To: Amitkumar Karwar
> > Cc: linux-wireless@vger.kernel.org; Cathy Luo; Nishant Sarmukadam;
> > rajatja@google.com; briannorris@google.com; Xinming Hu
> > Subject: Re: [PATCH v2 2/2] mwifiex: check hw_status in suspend and
> > resume handlers
> > 
> > On Tue, Oct 04, 2016 at 10:38:25PM +0530, Amitkumar Karwar wrote:
> > > From: Xinming Hu <huxm@marvell.com>
> > >
> > > We have observed a kernel crash when system immediately suspends after
> > > booting. There is a race between suspend and driver initialization
> > > paths.
> > > This patch adds hw_status checks to fix the problem
> > >
> > > Signed-off-by: Xinming Hu <huxm@marvell.com>
> > > Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
> > > ---
> > > v2: Return failure in suspend/resume handler in this scenario.
> > > ---
> > >  drivers/net/wireless/marvell/mwifiex/pcie.c | 10 ++++++----
> > >  1 file changed, 6 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > index ba9e068..fa6bf85 100644
> > > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > @@ -122,9 +122,10 @@ static int mwifiex_pcie_suspend(struct device
> > > *dev)
> > >
> > >  	if (pdev) {
> > >  		card = pci_get_drvdata(pdev);
> > > -		if (!card || !card->adapter) {
> > > +		if (!card || !card->adapter ||
> > > +		    card->adapter->hw_status != MWIFIEX_HW_STATUS_READY) {
> > 
> > Wait, is there no locking on the 'hw_status' field? That is inherently
> > an unsafe race all on its own; you're not guaranteed that this will be
> > read/written atomically. And you also aren't guaranteed that writes to
> > this happen in the order they appear in the code -- in other words,
> > reading this flag doesn't necessarily guarantee that initialization is
> > actually complete (even if that's very likely to be true, given that
> > it's probably just a single-instruction word-access, and any prior HW
> > polling or interrupts likely have done some synchronization and can't be
> > reordered).
> > actually complete
> 
> Here is the brief info on how "hw_status" flag is updated.
> 1) It gets changed incrementally during initialization.
>     MWIFIEX_HW_STATUS_INITIALIZING -> MWIFIEX_HW_STATUS_INIT_DONE -> MWIFIEX_HW_STATUS_READY
> 
> 2) Status will remain READY once driver+firmware is up and running.
> 
> 3) Below is status during teardown
> MWIFIEX_HW_STATUS_READY -> MWIFIEX_HW_STATUS_RESET -> MWIFIEX_HW_STATUS_CLOSING -> MWIFIEX_HW_STATUS_NOT_READY
> 
> As the events occur one after another, we don't expect a race and
> don't need locking here for the flag. Flag status
> MWIFIEX_HW_STATUS_READY guarantees that initialization is completed.

It seems like, as with patch 1, you're mostly arguing about the writes
to this variable. But writes race with reads as well; how do you
guarantee that you're not seeing incorrect values of 'hw_status' here in
suspend() -- e.g., either old or new values of it, or even
partially-written values, if for some reason the compiler decides it
can't read/write this all in one go?

> In worst case scenario, only first system suspend attempt issued
> immediately after system boot will be aborted with BUSY error. I
> think, that should be fine.

(For the record, my concern about -EBUSY is separate from my concern
about the potential race condition.)

> Let me know if you have any concerns.

Sorry, I probably didn't completely flesh out my thought here.

I think I was concerned about a failed initialization causing the system
to never enter suspend again. So specifically: what happens if (e.g.)
the firmware fails to load? AFAICT, the device doesn't actually unbind
itself from the driver, so instead, you have a device in limbo that will
always return -EBUSY in suspend(), and your system can never again enter
suspend. Am I correct? If so, that doesn't sound great.

> > This is probably better than nothing, but it's not very good.
> > 
> > >  			pr_err("Card or adapter structure is not valid\n");
> > > -			return 0;
> > > +			return -EBUSY;
> > 
> > So the above cases all mean that the driver hasn't finished loading,
> > right?
> > 
> > !card => can't happen (PCIe probe() would have failed
> > mwifiex_add_card()), but fine to check
> 
> NULL "card" or "card->adapter" pointers are expected in below cases
> 1) Driver initialization failed after downloading the firmware. Driver is performing cleanup in init failure path. Now system suspends.
> 2) Race of teardown + suspend
>  
> > 
> > !card->adapter => only happens after patch 1; i.e., when tearing down
> > the device and detaching it from the driver
> > 
> > card->adapter->hw_status != MWIFIEX_HW_STATUS_READY => FW is not loaded
> > (i.e., in the process of starting or stopping FW?)
> > 
> > I guess all of those cases make sense to be -EBUSY.

^^ I'm second-guessing my claim here then.

> Yes. we can keep -EBUSY for all of the cases.

Brian
Cathy Luo Oct. 5, 2016, 5:19 p.m. UTC | #4
Hi Brian


I think your concern is right, we will update patch to handle the case you mentioned below. 

If our firmware is dead or we init failure, we should allow the system to suspend. 

We have following hardware status supported in driver. We should return -EBUSY only when hardware status is 
MWIFIEX_HW_STATUS_INITIALIZING/MWIFIEX_HW_STATUS_INIT_DONE/ MWIFIEX_HW_STATUS_CLOSING. 

enum MWIFIEX_HARDWARE_STATUS {
	MWIFIEX_HW_STATUS_READY,
	MWIFIEX_HW_STATUS_INITIALIZING,
	MWIFIEX_HW_STATUS_INIT_DONE,
	MWIFIEX_HW_STATUS_RESET,
	MWIFIEX_HW_STATUS_CLOSING,
	MWIFIEX_HW_STATUS_NOT_READY
};

Amit will help prepare the new patch to handle this. 

Regards

Cathy

-----Original Message-----
From: Brian Norris [mailto:briannorris@chromium.org] 
Sent: Wednesday, October 05, 2016 9:42 AM
To: Amitkumar Karwar
Cc: linux-wireless@vger.kernel.org; Cathy Luo; Nishant Sarmukadam; rajatja@google.com; Xinming Hu
Subject: Re: [PATCH v2 2/2] mwifiex: check hw_status in suspend and resume handlers

Hi Amit,

On Wed, Oct 05, 2016 at 12:26:36PM +0000, Amitkumar Karwar wrote:
> > From: Brian Norris [mailto:briannorris@chromium.org]
> > Sent: Wednesday, October 05, 2016 2:35 AM
> > To: Amitkumar Karwar
> > Cc: linux-wireless@vger.kernel.org; Cathy Luo; Nishant Sarmukadam; 
> > rajatja@google.com; briannorris@google.com; Xinming Hu
> > Subject: Re: [PATCH v2 2/2] mwifiex: check hw_status in suspend and 
> > resume handlers
> > 
> > On Tue, Oct 04, 2016 at 10:38:25PM +0530, Amitkumar Karwar wrote:
> > > From: Xinming Hu <huxm@marvell.com>
> > >
> > > We have observed a kernel crash when system immediately suspends 
> > > after booting. There is a race between suspend and driver 
> > > initialization paths.
> > > This patch adds hw_status checks to fix the problem
> > >
> > > Signed-off-by: Xinming Hu <huxm@marvell.com>
> > > Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
> > > ---
> > > v2: Return failure in suspend/resume handler in this scenario.
> > > ---
> > >  drivers/net/wireless/marvell/mwifiex/pcie.c | 10 ++++++----
> > >  1 file changed, 6 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > index ba9e068..fa6bf85 100644
> > > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > @@ -122,9 +122,10 @@ static int mwifiex_pcie_suspend(struct device
> > > *dev)
> > >
> > >  	if (pdev) {
> > >  		card = pci_get_drvdata(pdev);
> > > -		if (!card || !card->adapter) {
> > > +		if (!card || !card->adapter ||
> > > +		    card->adapter->hw_status != MWIFIEX_HW_STATUS_READY) {
> > 
> > Wait, is there no locking on the 'hw_status' field? That is 
> > inherently an unsafe race all on its own; you're not guaranteed that 
> > this will be read/written atomically. And you also aren't guaranteed 
> > that writes to this happen in the order they appear in the code -- 
> > in other words, reading this flag doesn't necessarily guarantee that 
> > initialization is actually complete (even if that's very likely to 
> > be true, given that it's probably just a single-instruction 
> > word-access, and any prior HW polling or interrupts likely have done 
> > some synchronization and can't be reordered).
> > actually complete
> 
> Here is the brief info on how "hw_status" flag is updated.
> 1) It gets changed incrementally during initialization.
>     MWIFIEX_HW_STATUS_INITIALIZING -> MWIFIEX_HW_STATUS_INIT_DONE -> 
> MWIFIEX_HW_STATUS_READY
> 
> 2) Status will remain READY once driver+firmware is up and running.
> 
> 3) Below is status during teardown
> MWIFIEX_HW_STATUS_READY -> MWIFIEX_HW_STATUS_RESET -> 
> MWIFIEX_HW_STATUS_CLOSING -> MWIFIEX_HW_STATUS_NOT_READY
> 
> As the events occur one after another, we don't expect a race and 
> don't need locking here for the flag. Flag status 
> MWIFIEX_HW_STATUS_READY guarantees that initialization is completed.

It seems like, as with patch 1, you're mostly arguing about the writes to this variable. But writes race with reads as well; how do you guarantee that you're not seeing incorrect values of 'hw_status' here in
suspend() -- e.g., either old or new values of it, or even partially-written values, if for some reason the compiler decides it can't read/write this all in one go?

> In worst case scenario, only first system suspend attempt issued 
> immediately after system boot will be aborted with BUSY error. I 
> think, that should be fine.

(For the record, my concern about -EBUSY is separate from my concern about the potential race condition.)

> Let me know if you have any concerns.

Sorry, I probably didn't completely flesh out my thought here.

I think I was concerned about a failed initialization causing the system to never enter suspend again. So specifically: what happens if (e.g.) the firmware fails to load? AFAICT, the device doesn't actually unbind itself from the driver, so instead, you have a device in limbo that will always return -EBUSY in suspend(), and your system can never again enter suspend. Am I correct? If so, that doesn't sound great.

> > This is probably better than nothing, but it's not very good.
> > 
> > >  			pr_err("Card or adapter structure is not valid\n");
> > > -			return 0;
> > > +			return -EBUSY;
> > 
> > So the above cases all mean that the driver hasn't finished loading, 
> > right?
> > 
> > !card => can't happen (PCIe probe() would have failed 
> > mwifiex_add_card()), but fine to check
> 
> NULL "card" or "card->adapter" pointers are expected in below cases
> 1) Driver initialization failed after downloading the firmware. Driver is performing cleanup in init failure path. Now system suspends.
> 2) Race of teardown + suspend
>  
> > 
> > !card->adapter => only happens after patch 1; i.e., when tearing 
> > down the device and detaching it from the driver
> > 
> > card->adapter->hw_status != MWIFIEX_HW_STATUS_READY => FW is not 
> > card->adapter->loaded
> > (i.e., in the process of starting or stopping FW?)
> > 
> > I guess all of those cases make sense to be -EBUSY.

^^ I'm second-guessing my claim here then.

> Yes. we can keep -EBUSY for all of the cases.

Brian
Amitkumar Karwar Oct. 6, 2016, 5:28 p.m. UTC | #5
Hi Brain,

> From: Brian Norris [mailto:briannorris@chromium.org]
> Sent: Wednesday, October 05, 2016 10:12 PM
> To: Amitkumar Karwar
> Cc: linux-wireless@vger.kernel.org; Cathy Luo; Nishant Sarmukadam;
> rajatja@google.com; Xinming Hu
> Subject: Re: [PATCH v2 2/2] mwifiex: check hw_status in suspend and
> resume handlers
> 
> Hi Amit,
> 
> On Wed, Oct 05, 2016 at 12:26:36PM +0000, Amitkumar Karwar wrote:
> > > From: Brian Norris [mailto:briannorris@chromium.org]
> > > Sent: Wednesday, October 05, 2016 2:35 AM
> > > To: Amitkumar Karwar
> > > Cc: linux-wireless@vger.kernel.org; Cathy Luo; Nishant Sarmukadam;
> > > rajatja@google.com; briannorris@google.com; Xinming Hu
> > > Subject: Re: [PATCH v2 2/2] mwifiex: check hw_status in suspend and
> > > resume handlers
> > >
> > > On Tue, Oct 04, 2016 at 10:38:25PM +0530, Amitkumar Karwar wrote:
> > > > From: Xinming Hu <huxm@marvell.com>
> > > >
> > > > We have observed a kernel crash when system immediately suspends
> > > > after booting. There is a race between suspend and driver
> > > > initialization paths.
> > > > This patch adds hw_status checks to fix the problem
> > > >
> > > > Signed-off-by: Xinming Hu <huxm@marvell.com>
> > > > Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
> > > > ---
> > > > v2: Return failure in suspend/resume handler in this scenario.
> > > > ---
> > > >  drivers/net/wireless/marvell/mwifiex/pcie.c | 10 ++++++----
> > > >  1 file changed, 6 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > > b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > > index ba9e068..fa6bf85 100644
> > > > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > > @@ -122,9 +122,10 @@ static int mwifiex_pcie_suspend(struct device
> > > > *dev)
> > > >
> > > >  	if (pdev) {
> > > >  		card = pci_get_drvdata(pdev);
> > > > -		if (!card || !card->adapter) {
> > > > +		if (!card || !card->adapter ||
> > > > +		    card->adapter->hw_status !=
> MWIFIEX_HW_STATUS_READY) {
> > >
> > > Wait, is there no locking on the 'hw_status' field? That is
> > > inherently an unsafe race all on its own; you're not guaranteed that
> > > this will be read/written atomically. And you also aren't guaranteed
> > > that writes to this happen in the order they appear in the code --
> > > in other words, reading this flag doesn't necessarily guarantee that
> > > initialization is actually complete (even if that's very likely to
> > > be true, given that it's probably just a single-instruction
> > > word-access, and any prior HW polling or interrupts likely have done
> > > some synchronization and can't be reordered).
> > > actually complete
> >
> > Here is the brief info on how "hw_status" flag is updated.
> > 1) It gets changed incrementally during initialization.
> >     MWIFIEX_HW_STATUS_INITIALIZING -> MWIFIEX_HW_STATUS_INIT_DONE ->
> > MWIFIEX_HW_STATUS_READY
> >
> > 2) Status will remain READY once driver+firmware is up and running.
> >
> > 3) Below is status during teardown
> > MWIFIEX_HW_STATUS_READY -> MWIFIEX_HW_STATUS_RESET ->
> > MWIFIEX_HW_STATUS_CLOSING -> MWIFIEX_HW_STATUS_NOT_READY
> >
> > As the events occur one after another, we don't expect a race and
> > don't need locking here for the flag. Flag status
> > MWIFIEX_HW_STATUS_READY guarantees that initialization is completed.
> 
> It seems like, as with patch 1, you're mostly arguing about the writes
> to this variable. But writes race with reads as well; how do you
> guarantee that you're not seeing incorrect values of 'hw_status' here in
> suspend() -- e.g., either old or new values of it, or even partially-
> written values, if for some reason the compiler decides it can't
> read/write this all in one go?

Got it. I think, we need to define "hw_status" as atomic variable to resolve the issue. I will create separate patch for this.

> 
> > In worst case scenario, only first system suspend attempt issued
> > immediately after system boot will be aborted with BUSY error. I
> > think, that should be fine.
> 
> (For the record, my concern about -EBUSY is separate from my concern
> about the potential race condition.)
> 
> > Let me know if you have any concerns.
> 
> Sorry, I probably didn't completely flesh out my thought here.
> 
> I think I was concerned about a failed initialization causing the system
> to never enter suspend again. So specifically: what happens if (e.g.)
> the firmware fails to load? AFAICT, the device doesn't actually unbind
> itself from the driver, so instead, you have a device in limbo that will
> always return -EBUSY in suspend(), and your system can never again enter
> suspend. Am I correct? If so, that doesn't sound great.

You are right. I will add an extra check for this so that both the cases would be handled
Case 1:
	Firmware has gone bad or has failed to initialize. We will return zero here, so that it won't block subsequent suspend attempts.
Case 2:
	Init or teardown is in process. We will return -EBUSY here. Next suspend attempt would be successful.

I will submit v3 with these changes shortly.

Regards,
Amitkumar
diff mbox

Patch

diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index ba9e068..fa6bf85 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -122,9 +122,10 @@  static int mwifiex_pcie_suspend(struct device *dev)
 
 	if (pdev) {
 		card = pci_get_drvdata(pdev);
-		if (!card || !card->adapter) {
+		if (!card || !card->adapter ||
+		    card->adapter->hw_status != MWIFIEX_HW_STATUS_READY) {
 			pr_err("Card or adapter structure is not valid\n");
-			return 0;
+			return -EBUSY;
 		}
 	} else {
 		pr_err("PCIE device is not specified\n");
@@ -166,9 +167,10 @@  static int mwifiex_pcie_resume(struct device *dev)
 
 	if (pdev) {
 		card = pci_get_drvdata(pdev);
-		if (!card || !card->adapter) {
+		if (!card || !card->adapter ||
+		    card->adapter->hw_status != MWIFIEX_HW_STATUS_READY) {
 			pr_err("Card or adapter structure is not valid\n");
-			return 0;
+			return -EBUSY;
 		}
 	} else {
 		pr_err("PCIE device is not specified\n");