diff mbox

scsi: lpfc: Add shutdown method for kexec

Message ID 20170212214920.28866-1-anton@ozlabs.org (mailing list archive)
State Changes Requested, archived
Headers show

Commit Message

Anton Blanchard Feb. 12, 2017, 9:49 p.m. UTC
From: Anton Blanchard <anton@samba.org>

We see lpfc devices regularly fail during kexec. Fix this by adding
a shutdown method which mirrors the remove method.

Signed-off-by: Anton Blanchard <anton@samba.org>
---
 drivers/scsi/lpfc/lpfc_init.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Benjamin Herrenschmidt Feb. 12, 2017, 11:14 p.m. UTC | #1
On Mon, 2017-02-13 at 08:49 +1100, Anton Blanchard wrote:
> From: Anton Blanchard <anton@samba.org>
> 
> We see lpfc devices regularly fail during kexec. Fix this by adding
> a shutdown method which mirrors the remove method.

Or instead finally do what I've been advocating for years (and even
sent patches for) which is to have kexec call remove instead of
shutdown.

Shutdown is and has *always* been the wrong thing to do.

> Signed-off-by: Anton Blanchard <anton@samba.org>
> ---
>  drivers/scsi/lpfc/lpfc_init.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/scsi/lpfc/lpfc_init.c
> b/drivers/scsi/lpfc/lpfc_init.c
> index 4776fd8..10f75ad 100644
> --- a/drivers/scsi/lpfc/lpfc_init.c
> +++ b/drivers/scsi/lpfc/lpfc_init.c
> @@ -11447,6 +11447,7 @@ static struct pci_driver lpfc_driver = {
>  	.id_table	= lpfc_id_table,
>  	.probe		= lpfc_pci_probe_one,
>  	.remove		= lpfc_pci_remove_one,
> +	.shutdown	= lpfc_pci_remove_one,
>  	.suspend        = lpfc_pci_suspend_one,
>  	.resume		= lpfc_pci_resume_one,
>  	.err_handler    = &lpfc_err_handler,
Anton Blanchard Feb. 12, 2017, 11:47 p.m. UTC | #2
Hi Ben,

On Mon, 13 Feb 2017 10:14:54 +1100
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Mon, 2017-02-13 at 08:49 +1100, Anton Blanchard wrote:
> > From: Anton Blanchard <anton@samba.org>
> > 
> > We see lpfc devices regularly fail during kexec. Fix this by adding
> > a shutdown method which mirrors the remove method.  
> 
> Or instead finally do what I've been advocating for years (and even
> sent patches for) which is to have kexec call remove instead of
> shutdown.
>
> Shutdown is and has *always* been the wrong thing to do.

Good point, at the very least we should call remove if shutdown doesn't
exist. Eric: could we make the changes Ben suggests?

Anton

> > Signed-off-by: Anton Blanchard <anton@samba.org>
> > ---
> >  drivers/scsi/lpfc/lpfc_init.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/scsi/lpfc/lpfc_init.c
> > b/drivers/scsi/lpfc/lpfc_init.c
> > index 4776fd8..10f75ad 100644
> > --- a/drivers/scsi/lpfc/lpfc_init.c
> > +++ b/drivers/scsi/lpfc/lpfc_init.c
> > @@ -11447,6 +11447,7 @@ static struct pci_driver lpfc_driver = {
> >  	.id_table	= lpfc_id_table,
> >  	.probe		= lpfc_pci_probe_one,
> >  	.remove		= lpfc_pci_remove_one,
> > +	.shutdown	= lpfc_pci_remove_one,
> >  	.suspend        = lpfc_pci_suspend_one,
> >  	.resume		= lpfc_pci_resume_one,
> >  	.err_handler    = &lpfc_err_handler,  
>
Eric W. Biederman Feb. 13, 2017, 12:21 a.m. UTC | #3
Anton Blanchard <anton@samba.org> writes:

> Hi Ben,
>
> On Mon, 13 Feb 2017 10:14:54 +1100
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>
>> On Mon, 2017-02-13 at 08:49 +1100, Anton Blanchard wrote:
>> > From: Anton Blanchard <anton@samba.org>
>> > 
>> > We see lpfc devices regularly fail during kexec. Fix this by adding
>> > a shutdown method which mirrors the remove method.  
>> 
>> Or instead finally do what I've been advocating for years (and even
>> sent patches for) which is to have kexec call remove instead of
>> shutdown.

Ben what happened to those patches.  I don't remember them?

>> Shutdown is and has *always* been the wrong thing to do.

I won't argue that.

> Good point, at the very least we should call remove if shutdown doesn't
> exist. Eric: could we make the changes Ben suggests?

Definitely.  That was the original design of the kexec interface
but people were worried about calling remove during reboot might
introduce regressions on the reboot functionality.  So shutdown was
added to be remove without the cleaning up the kernel data structures.

I am all for changing the core to call remove.  That seems to be a more
tested code path (because remove tends to be part of the development
path for modules) and thus much more likely to work in practice.

The challenge with changes in this area is that when the infrastructure
is changed for everyone someone needs to baby sit it until all of the
unexpected issues are resolved.  I was hoping a couple of years ago that
Ben could be that person.

Eric
Benjamin Herrenschmidt Feb. 13, 2017, 1:01 a.m. UTC | #4
On Mon, 2017-02-13 at 13:21 +1300, Eric W. Biederman wrote:
> > Good point, at the very least we should call remove if shutdown doesn't
> > exist. Eric: could we make the changes Ben suggests?
> 
> Definitely.  That was the original design of the kexec interface
> but people were worried about calling remove during reboot might
> introduce regressions on the reboot functionality.  So shutdown was
> added to be remove without the cleaning up the kernel data structures.

Right. Part of the problem was also that remove was dependent on CONFIG_HOTPLUG
though that's no longer the case anymore.

The problem is that a bunch of drivers either don't have a shutdown or
worse, have one that actually shuts the HW down rather than "idle" it
which puts it in a state that the new kernel can't deal with.

> I am all for changing the core to call remove.  That seems to be a more
> tested code path (because remove tends to be part of the development
> path for modules) and thus much more likely to work in practice.
> 
> The challenge with changes in this area is that when the infrastructure
> is changed for everyone someone needs to baby sit it until all of the
> unexpected issues are resolved.  I was hoping a couple of years ago that
> Ben could be that person.

Correct. And I lack time. But since we are a platform that uses kexec
daily as our main booting mechanism we should probably be the ones
driving that.

I had a patch at least to fallback to remove in absence of shutdown
which I can try to dig out.

We can add a config option to make it do the other way around that
we should start testing internally. I'm sure we will find 1 or 2
regressions as drivers do weird things.

Cheers,
Ben.
Brian King Feb. 13, 2017, 9:57 p.m. UTC | #5
On 02/12/2017 07:01 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2017-02-13 at 13:21 +1300, Eric W. Biederman wrote:
>>> Good point, at the very least we should call remove if shutdown doesn't
>>> exist. Eric: could we make the changes Ben suggests?
>>
>> Definitely.  That was the original design of the kexec interface
>> but people were worried about calling remove during reboot might
>> introduce regressions on the reboot functionality.  So shutdown was
>> added to be remove without the cleaning up the kernel data structures.
> 
> Right. Part of the problem was also that remove was dependent on CONFIG_HOTPLUG
> though that's no longer the case anymore.
> 
> The problem is that a bunch of drivers either don't have a shutdown or
> worse, have one that actually shuts the HW down rather than "idle" it
> which puts it in a state that the new kernel can't deal with.

If we do transition to use remove rather than shutdown, I think we want
some way for a device driver to know whether we are doing kexec or not.
A RAID adapter with a write cache is going to want to flush its write
cache on a PCI hotplug remove, but for a kexec, its going to want to skip
that so the kexec is faster. Today, since kexec looks like a reboot,
rather than a shutdown, we can skip the flush on a reboot, since its
technically not needed there either.

-Brian
Benjamin Herrenschmidt Feb. 14, 2017, 2:04 a.m. UTC | #6
On Mon, 2017-02-13 at 15:57 -0600, Brian King wrote:
> If we do transition to use remove rather than shutdown, I think we
> want
> some way for a device driver to know whether we are doing kexec or
> not.
> A RAID adapter with a write cache is going to want to flush its write
> cache on a PCI hotplug remove, but for a kexec, its going to want to
> skip
> that so the kexec is faster. Today, since kexec looks like a reboot,
> rather than a shutdown, we can skip the flush on a reboot, since its
> technically not needed there either.

What happens if a non-flushed adapter gets a PERST ?

I wouldn't trust that 'don't have to flush' magic ...

Cheers,
Ben.
Eric W. Biederman Feb. 14, 2017, 2:45 a.m. UTC | #7
Brian King <brking@linux.vnet.ibm.com> writes:

> On 02/12/2017 07:01 PM, Benjamin Herrenschmidt wrote:
>> On Mon, 2017-02-13 at 13:21 +1300, Eric W. Biederman wrote:
>>>> Good point, at the very least we should call remove if shutdown doesn't
>>>> exist. Eric: could we make the changes Ben suggests?
>>>
>>> Definitely.  That was the original design of the kexec interface
>>> but people were worried about calling remove during reboot might
>>> introduce regressions on the reboot functionality.  So shutdown was
>>> added to be remove without the cleaning up the kernel data structures.
>> 
>> Right. Part of the problem was also that remove was dependent on CONFIG_HOTPLUG
>> though that's no longer the case anymore.
>> 
>> The problem is that a bunch of drivers either don't have a shutdown or
>> worse, have one that actually shuts the HW down rather than "idle" it
>> which puts it in a state that the new kernel can't deal with.
>
> If we do transition to use remove rather than shutdown, I think we want
> some way for a device driver to know whether we are doing kexec or not.
> A RAID adapter with a write cache is going to want to flush its write
> cache on a PCI hotplug remove, but for a kexec, its going to want to skip
> that so the kexec is faster. Today, since kexec looks like a reboot,
> rather than a shutdown, we can skip the flush on a reboot, since its
> technically not needed there either.

This doesn't make any sense.

With a PCI hotplug remove you can't do anything because by the time you
get the event the hardware is gone.  I think there are optional
interlocks but nothing as good as the old 3.5" floppy disk button, or
the button on CD drives.

The only difference ever that should exist between shutdown and remove
is do you clean up kernel data structures.  The shutdown method is
allowed to skip the cleanup up kernel data structures that the remove
method needs to make.

Assuming the kernel does not have corrupted data structures I don't see
any practical reason for distinguishing between the two.  There may be
some real world gotchas we have to deal with but semantically I don't
see any concerns.

Eric
Benjamin Herrenschmidt Feb. 14, 2017, 3:39 a.m. UTC | #8
On Tue, 2017-02-14 at 15:45 +1300, Eric W. Biederman wrote:
> The only difference ever that should exist between shutdown and remove
> is do you clean up kernel data structures.  The shutdown method is
> allowed to skip the cleanup up kernel data structures that the remove
> method needs to make.
> 
> Assuming the kernel does not have corrupted data structures I don't see
> any practical reason for distinguishing between the two.  There may be
> some real world gotchas we have to deal with but semantically I don't
> see any concerns.

We had historical additions to shutdown actually shutting things down,
like spinning platters down to park disk heads, switching devices
into D3 on some systems, etc... that remove never had (and we don't
want for kexec).

Cheers,
Ben.
Brian King Feb. 14, 2017, 2:56 p.m. UTC | #9
On 02/13/2017 08:04 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2017-02-13 at 15:57 -0600, Brian King wrote:
>> If we do transition to use remove rather than shutdown, I think we
>> want
>> some way for a device driver to know whether we are doing kexec or
>> not.
>> A RAID adapter with a write cache is going to want to flush its write
>> cache on a PCI hotplug remove, but for a kexec, its going to want to
>> skip
>> that so the kexec is faster. Today, since kexec looks like a reboot,
>> rather than a shutdown, we can skip the flush on a reboot, since its
>> technically not needed there either.
> 
> What happens if a non-flushed adapter gets a PERST ?

It depends on the adapter, so it really needs to be a policy in the device
driver. For any adapter that has a non volatile write cache, data must preserved,
otherwise the write cache is not truly non volatile. For adapters with a volatile
write cache, then its subject to the adapter hardware. For ipr adapters
that have a volatile cache, they do guarantee the data is preserved across
a PERST.

> I wouldn't trust that 'don't have to flush' magic ...

I really don't think we want to force RAID adapters that have gigabytes of
*non volatile* write cache to flush their cache when we are merely performing
a kexec. This can take several minutes in the worst case scenarios.

-Brian
Eric W. Biederman Feb. 15, 2017, 2:44 a.m. UTC | #10
Brian King <brking@linux.vnet.ibm.com> writes:

> On 02/13/2017 08:04 PM, Benjamin Herrenschmidt wrote:
>> On Mon, 2017-02-13 at 15:57 -0600, Brian King wrote:
>>> If we do transition to use remove rather than shutdown, I think we
>>> want
>>> some way for a device driver to know whether we are doing kexec or
>>> not.
>>> A RAID adapter with a write cache is going to want to flush its write
>>> cache on a PCI hotplug remove, but for a kexec, its going to want to
>>> skip
>>> that so the kexec is faster. Today, since kexec looks like a reboot,
>>> rather than a shutdown, we can skip the flush on a reboot, since its
>>> technically not needed there either.
>> 
>> What happens if a non-flushed adapter gets a PERST ?
>
> It depends on the adapter, so it really needs to be a policy in the device
> driver. For any adapter that has a non volatile write cache, data must preserved,
> otherwise the write cache is not truly non volatile. For adapters with a volatile
> write cache, then its subject to the adapter hardware. For ipr adapters
> that have a volatile cache, they do guarantee the data is preserved across
> a PERST.
>
>> I wouldn't trust that 'don't have to flush' magic ...
>
> I really don't think we want to force RAID adapters that have gigabytes of
> *non volatile* write cache to flush their cache when we are merely performing
> a kexec. This can take several minutes in the worst case scenarios.


It is very very simple.  kexec does not guarantee what comes after it.

An optimization that skips flushing because the hardware write cache can
be trusted is fine.  An optimization that skips flushing because some
later kernel will take care of it is not.

I really don't see any room in there for distinguishing between which
kind of shutdown and/or kexec we are doing.  Either the hardware handles
it or it doesn't.

If the hardware doesn't handle everything than we need to handle it in
software.

Eric
Mauricio Faria de Oliveira March 6, 2017, 2:52 p.m. UTC | #11
On 02/12/2017 07:49 PM, Anton Blanchard wrote:
> We see lpfc devices regularly fail during kexec. Fix this by adding
> a shutdown method which mirrors the remove method.

Reviewed-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Tested-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>

@mkp, @jejb: may you please flag this patch for stable?  Thank you.

This patch fixes a problem of some adapter ports going offline after
reboot on the following adapter, at least:

     [10df:f100] Saturn-X: LightPulse Fibre Channel Host Adapter

     # dmesg | grep lpfc
     ...
     [   61.676728] lpfc 0003:12:00.0: 6:0442 Adapter failed to init,
     mbxCmd x88 CONFIG_PORT, mbxStatus x0 Data: x0
     [   92.766717] lpfc 0003:12:00.0: 6:0442 Adapter failed to init,
     mbxCmd x88 CONFIG_PORT, mbxStatus x0 Data: x0
     [  123.856717] lpfc 0003:12:00.0: 6:0442 Adapter failed to init,
     mbxCmd x88 CONFIG_PORT, mbxStatus x0 Data: x0
     ...
Martin K. Petersen March 7, 2017, 3:46 a.m. UTC | #12
>>>>> "Mauricio" == Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> writes:

Mauricio> On 02/12/2017 07:49 PM, Anton Blanchard wrote:
>> We see lpfc devices regularly fail during kexec. Fix this by adding a
>> shutdown method which mirrors the remove method.

Mauricio> @mkp, @jejb: may you please flag this patch for stable?  Thank
Mauricio> you.

I don't recall a consensus being reached on this patch.
Benjamin Herrenschmidt March 7, 2017, 5:24 a.m. UTC | #13
On Mon, 2017-03-06 at 22:46 -0500, Martin K. Petersen wrote:
> > > > > > "Mauricio" == Mauricio Faria de Oliveira <mauricfo@linux.vn
> > > > > > et.ibm.com> writes:
> 
> Mauricio> On 02/12/2017 07:49 PM, Anton Blanchard wrote:
> > > We see lpfc devices regularly fail during kexec. Fix this by
> > > adding a
> > > shutdown method which mirrors the remove method.
> 
> Mauricio> @mkp, @jejb: may you please flag this patch for
> stable?  Thank
> Mauricio> you.
> 
> I don't recall a consensus being reached on this patch.

What would be the opposition ? Without it kexec breaks. With it, it
works ...

Now we all seem to agree that kexec should be overhauled to not use
shutdown but instead unbind drivers, but that's a more long term
project that nobody yet had a chance to tackle.

In the meantime, some systems need a functioning kexec to boot.

Cheers,
Ben.
Mauricio Faria de Oliveira March 7, 2017, 12:51 p.m. UTC | #14
Martin,

On 03/07/2017 02:24 AM, Benjamin Herrenschmidt wrote:

> On Mon, 2017-03-06 at 22:46 -0500, Martin K. Petersen wrote:
>> I don't recall a consensus being reached on this patch.

> What would be the opposition ? Without it kexec breaks. With it, it
> works ...

That is the argument I'd present/ask for consideration too.

I think I should have included this in the tested-by tag email, for
documentation/evidence: no regression observed in system shutdown path.

Thanks,
Martin K. Petersen March 8, 2017, 1:29 a.m. UTC | #15
>>>>> "Mauricio" == Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> writes:

Mauricio> I think I should have included this in the tested-by tag
Mauricio> email, for documentation/evidence: no regression observed in
Mauricio> system shutdown path.

Applied to 4.11/scsi-fixes.
diff mbox

Patch

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 4776fd8..10f75ad 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -11447,6 +11447,7 @@  static struct pci_driver lpfc_driver = {
 	.id_table	= lpfc_id_table,
 	.probe		= lpfc_pci_probe_one,
 	.remove		= lpfc_pci_remove_one,
+	.shutdown	= lpfc_pci_remove_one,
 	.suspend        = lpfc_pci_suspend_one,
 	.resume		= lpfc_pci_resume_one,
 	.err_handler    = &lpfc_err_handler,