Message ID | 20170212214920.28866-1-anton@ozlabs.org (mailing list archive) |
---|---|
State | Changes Requested, archived |
Headers | show |
On Mon, 2017-02-13 at 08:49 +1100, Anton Blanchard wrote: > From: Anton Blanchard <anton@samba.org> > > We see lpfc devices regularly fail during kexec. Fix this by adding > a shutdown method which mirrors the remove method. Or instead finally do what I've been advocating for years (and even sent patches for) which is to have kexec call remove instead of shutdown. Shutdown is and has *always* been the wrong thing to do. > Signed-off-by: Anton Blanchard <anton@samba.org> > --- > drivers/scsi/lpfc/lpfc_init.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/scsi/lpfc/lpfc_init.c > b/drivers/scsi/lpfc/lpfc_init.c > index 4776fd8..10f75ad 100644 > --- a/drivers/scsi/lpfc/lpfc_init.c > +++ b/drivers/scsi/lpfc/lpfc_init.c > @@ -11447,6 +11447,7 @@ static struct pci_driver lpfc_driver = { > .id_table = lpfc_id_table, > .probe = lpfc_pci_probe_one, > .remove = lpfc_pci_remove_one, > + .shutdown = lpfc_pci_remove_one, > .suspend = lpfc_pci_suspend_one, > .resume = lpfc_pci_resume_one, > .err_handler = &lpfc_err_handler,
Hi Ben, On Mon, 13 Feb 2017 10:14:54 +1100 Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > On Mon, 2017-02-13 at 08:49 +1100, Anton Blanchard wrote: > > From: Anton Blanchard <anton@samba.org> > > > > We see lpfc devices regularly fail during kexec. Fix this by adding > > a shutdown method which mirrors the remove method. > > Or instead finally do what I've been advocating for years (and even > sent patches for) which is to have kexec call remove instead of > shutdown. > > Shutdown is and has *always* been the wrong thing to do. Good point, at the very least we should call remove if shutdown doesn't exist. Eric: could we make the changes Ben suggests? Anton > > Signed-off-by: Anton Blanchard <anton@samba.org> > > --- > > drivers/scsi/lpfc/lpfc_init.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/drivers/scsi/lpfc/lpfc_init.c > > b/drivers/scsi/lpfc/lpfc_init.c > > index 4776fd8..10f75ad 100644 > > --- a/drivers/scsi/lpfc/lpfc_init.c > > +++ b/drivers/scsi/lpfc/lpfc_init.c > > @@ -11447,6 +11447,7 @@ static struct pci_driver lpfc_driver = { > > .id_table = lpfc_id_table, > > .probe = lpfc_pci_probe_one, > > .remove = lpfc_pci_remove_one, > > + .shutdown = lpfc_pci_remove_one, > > .suspend = lpfc_pci_suspend_one, > > .resume = lpfc_pci_resume_one, > > .err_handler = &lpfc_err_handler, >
Anton Blanchard <anton@samba.org> writes: > Hi Ben, > > On Mon, 13 Feb 2017 10:14:54 +1100 > Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > >> On Mon, 2017-02-13 at 08:49 +1100, Anton Blanchard wrote: >> > From: Anton Blanchard <anton@samba.org> >> > >> > We see lpfc devices regularly fail during kexec. Fix this by adding >> > a shutdown method which mirrors the remove method. >> >> Or instead finally do what I've been advocating for years (and even >> sent patches for) which is to have kexec call remove instead of >> shutdown. Ben what happened to those patches. I don't remember them? >> Shutdown is and has *always* been the wrong thing to do. I won't argue that. > Good point, at the very least we should call remove if shutdown doesn't > exist. Eric: could we make the changes Ben suggests? Definitely. That was the original design of the kexec interface but people were worried about calling remove during reboot might introduce regressions on the reboot functionality. So shutdown was added to be remove without the cleaning up the kernel data structures. I am all for changing the core to call remove. That seems to be a more tested code path (because remove tends to be part of the development path for modules) and thus much more likely to work in practice. The challenge with changes in this area is that when the infrastructure is changed for everyone someone needs to baby sit it until all of the unexpected issues are resolved. I was hoping a couple of years ago that Ben could be that person. Eric
On Mon, 2017-02-13 at 13:21 +1300, Eric W. Biederman wrote: > > Good point, at the very least we should call remove if shutdown doesn't > > exist. Eric: could we make the changes Ben suggests? > > Definitely. That was the original design of the kexec interface > but people were worried about calling remove during reboot might > introduce regressions on the reboot functionality. So shutdown was > added to be remove without the cleaning up the kernel data structures. Right. Part of the problem was also that remove was dependent on CONFIG_HOTPLUG though that's no longer the case anymore. The problem is that a bunch of drivers either don't have a shutdown or worse, have one that actually shuts the HW down rather than "idle" it which puts it in a state that the new kernel can't deal with. > I am all for changing the core to call remove. That seems to be a more > tested code path (because remove tends to be part of the development > path for modules) and thus much more likely to work in practice. > > The challenge with changes in this area is that when the infrastructure > is changed for everyone someone needs to baby sit it until all of the > unexpected issues are resolved. I was hoping a couple of years ago that > Ben could be that person. Correct. And I lack time. But since we are a platform that uses kexec daily as our main booting mechanism we should probably be the ones driving that. I had a patch at least to fallback to remove in absence of shutdown which I can try to dig out. We can add a config option to make it do the other way around that we should start testing internally. I'm sure we will find 1 or 2 regressions as drivers do weird things. Cheers, Ben.
On 02/12/2017 07:01 PM, Benjamin Herrenschmidt wrote: > On Mon, 2017-02-13 at 13:21 +1300, Eric W. Biederman wrote: >>> Good point, at the very least we should call remove if shutdown doesn't >>> exist. Eric: could we make the changes Ben suggests? >> >> Definitely. That was the original design of the kexec interface >> but people were worried about calling remove during reboot might >> introduce regressions on the reboot functionality. So shutdown was >> added to be remove without the cleaning up the kernel data structures. > > Right. Part of the problem was also that remove was dependent on CONFIG_HOTPLUG > though that's no longer the case anymore. > > The problem is that a bunch of drivers either don't have a shutdown or > worse, have one that actually shuts the HW down rather than "idle" it > which puts it in a state that the new kernel can't deal with. If we do transition to use remove rather than shutdown, I think we want some way for a device driver to know whether we are doing kexec or not. A RAID adapter with a write cache is going to want to flush its write cache on a PCI hotplug remove, but for a kexec, its going to want to skip that so the kexec is faster. Today, since kexec looks like a reboot, rather than a shutdown, we can skip the flush on a reboot, since its technically not needed there either. -Brian
On Mon, 2017-02-13 at 15:57 -0600, Brian King wrote: > If we do transition to use remove rather than shutdown, I think we > want > some way for a device driver to know whether we are doing kexec or > not. > A RAID adapter with a write cache is going to want to flush its write > cache on a PCI hotplug remove, but for a kexec, its going to want to > skip > that so the kexec is faster. Today, since kexec looks like a reboot, > rather than a shutdown, we can skip the flush on a reboot, since its > technically not needed there either. What happens if a non-flushed adapter gets a PERST ? I wouldn't trust that 'don't have to flush' magic ... Cheers, Ben.
Brian King <brking@linux.vnet.ibm.com> writes: > On 02/12/2017 07:01 PM, Benjamin Herrenschmidt wrote: >> On Mon, 2017-02-13 at 13:21 +1300, Eric W. Biederman wrote: >>>> Good point, at the very least we should call remove if shutdown doesn't >>>> exist. Eric: could we make the changes Ben suggests? >>> >>> Definitely. That was the original design of the kexec interface >>> but people were worried about calling remove during reboot might >>> introduce regressions on the reboot functionality. So shutdown was >>> added to be remove without the cleaning up the kernel data structures. >> >> Right. Part of the problem was also that remove was dependent on CONFIG_HOTPLUG >> though that's no longer the case anymore. >> >> The problem is that a bunch of drivers either don't have a shutdown or >> worse, have one that actually shuts the HW down rather than "idle" it >> which puts it in a state that the new kernel can't deal with. > > If we do transition to use remove rather than shutdown, I think we want > some way for a device driver to know whether we are doing kexec or not. > A RAID adapter with a write cache is going to want to flush its write > cache on a PCI hotplug remove, but for a kexec, its going to want to skip > that so the kexec is faster. Today, since kexec looks like a reboot, > rather than a shutdown, we can skip the flush on a reboot, since its > technically not needed there either. This doesn't make any sense. With a PCI hotplug remove you can't do anything because by the time you get the event the hardware is gone. I think there are optional interlocks but nothing as good as the old 3.5" floppy disk button, or the button on CD drives. The only difference ever that should exist between shutdown and remove is do you clean up kernel data structures. The shutdown method is allowed to skip the cleanup up kernel data structures that the remove method needs to make. Assuming the kernel does not have corrupted data structures I don't see any practical reason for distinguishing between the two. There may be some real world gotchas we have to deal with but semantically I don't see any concerns. Eric
On Tue, 2017-02-14 at 15:45 +1300, Eric W. Biederman wrote: > The only difference ever that should exist between shutdown and remove > is do you clean up kernel data structures. The shutdown method is > allowed to skip the cleanup up kernel data structures that the remove > method needs to make. > > Assuming the kernel does not have corrupted data structures I don't see > any practical reason for distinguishing between the two. There may be > some real world gotchas we have to deal with but semantically I don't > see any concerns. We had historical additions to shutdown actually shutting things down, like spinning platters down to park disk heads, switching devices into D3 on some systems, etc... that remove never had (and we don't want for kexec). Cheers, Ben.
On 02/13/2017 08:04 PM, Benjamin Herrenschmidt wrote: > On Mon, 2017-02-13 at 15:57 -0600, Brian King wrote: >> If we do transition to use remove rather than shutdown, I think we >> want >> some way for a device driver to know whether we are doing kexec or >> not. >> A RAID adapter with a write cache is going to want to flush its write >> cache on a PCI hotplug remove, but for a kexec, its going to want to >> skip >> that so the kexec is faster. Today, since kexec looks like a reboot, >> rather than a shutdown, we can skip the flush on a reboot, since its >> technically not needed there either. > > What happens if a non-flushed adapter gets a PERST ? It depends on the adapter, so it really needs to be a policy in the device driver. For any adapter that has a non volatile write cache, data must preserved, otherwise the write cache is not truly non volatile. For adapters with a volatile write cache, then its subject to the adapter hardware. For ipr adapters that have a volatile cache, they do guarantee the data is preserved across a PERST. > I wouldn't trust that 'don't have to flush' magic ... I really don't think we want to force RAID adapters that have gigabytes of *non volatile* write cache to flush their cache when we are merely performing a kexec. This can take several minutes in the worst case scenarios. -Brian
Brian King <brking@linux.vnet.ibm.com> writes: > On 02/13/2017 08:04 PM, Benjamin Herrenschmidt wrote: >> On Mon, 2017-02-13 at 15:57 -0600, Brian King wrote: >>> If we do transition to use remove rather than shutdown, I think we >>> want >>> some way for a device driver to know whether we are doing kexec or >>> not. >>> A RAID adapter with a write cache is going to want to flush its write >>> cache on a PCI hotplug remove, but for a kexec, its going to want to >>> skip >>> that so the kexec is faster. Today, since kexec looks like a reboot, >>> rather than a shutdown, we can skip the flush on a reboot, since its >>> technically not needed there either. >> >> What happens if a non-flushed adapter gets a PERST ? > > It depends on the adapter, so it really needs to be a policy in the device > driver. For any adapter that has a non volatile write cache, data must preserved, > otherwise the write cache is not truly non volatile. For adapters with a volatile > write cache, then its subject to the adapter hardware. For ipr adapters > that have a volatile cache, they do guarantee the data is preserved across > a PERST. > >> I wouldn't trust that 'don't have to flush' magic ... > > I really don't think we want to force RAID adapters that have gigabytes of > *non volatile* write cache to flush their cache when we are merely performing > a kexec. This can take several minutes in the worst case scenarios. It is very very simple. kexec does not guarantee what comes after it. An optimization that skips flushing because the hardware write cache can be trusted is fine. An optimization that skips flushing because some later kernel will take care of it is not. I really don't see any room in there for distinguishing between which kind of shutdown and/or kexec we are doing. Either the hardware handles it or it doesn't. If the hardware doesn't handle everything than we need to handle it in software. Eric
On 02/12/2017 07:49 PM, Anton Blanchard wrote: > We see lpfc devices regularly fail during kexec. Fix this by adding > a shutdown method which mirrors the remove method. Reviewed-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Tested-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> @mkp, @jejb: may you please flag this patch for stable? Thank you. This patch fixes a problem of some adapter ports going offline after reboot on the following adapter, at least: [10df:f100] Saturn-X: LightPulse Fibre Channel Host Adapter # dmesg | grep lpfc ... [ 61.676728] lpfc 0003:12:00.0: 6:0442 Adapter failed to init, mbxCmd x88 CONFIG_PORT, mbxStatus x0 Data: x0 [ 92.766717] lpfc 0003:12:00.0: 6:0442 Adapter failed to init, mbxCmd x88 CONFIG_PORT, mbxStatus x0 Data: x0 [ 123.856717] lpfc 0003:12:00.0: 6:0442 Adapter failed to init, mbxCmd x88 CONFIG_PORT, mbxStatus x0 Data: x0 ...
>>>>> "Mauricio" == Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> writes: Mauricio> On 02/12/2017 07:49 PM, Anton Blanchard wrote: >> We see lpfc devices regularly fail during kexec. Fix this by adding a >> shutdown method which mirrors the remove method. Mauricio> @mkp, @jejb: may you please flag this patch for stable? Thank Mauricio> you. I don't recall a consensus being reached on this patch.
On Mon, 2017-03-06 at 22:46 -0500, Martin K. Petersen wrote: > > > > > > "Mauricio" == Mauricio Faria de Oliveira <mauricfo@linux.vn > > > > > > et.ibm.com> writes: > > Mauricio> On 02/12/2017 07:49 PM, Anton Blanchard wrote: > > > We see lpfc devices regularly fail during kexec. Fix this by > > > adding a > > > shutdown method which mirrors the remove method. > > Mauricio> @mkp, @jejb: may you please flag this patch for > stable? Thank > Mauricio> you. > > I don't recall a consensus being reached on this patch. What would be the opposition ? Without it kexec breaks. With it, it works ... Now we all seem to agree that kexec should be overhauled to not use shutdown but instead unbind drivers, but that's a more long term project that nobody yet had a chance to tackle. In the meantime, some systems need a functioning kexec to boot. Cheers, Ben.
Martin, On 03/07/2017 02:24 AM, Benjamin Herrenschmidt wrote: > On Mon, 2017-03-06 at 22:46 -0500, Martin K. Petersen wrote: >> I don't recall a consensus being reached on this patch. > What would be the opposition ? Without it kexec breaks. With it, it > works ... That is the argument I'd present/ask for consideration too. I think I should have included this in the tested-by tag email, for documentation/evidence: no regression observed in system shutdown path. Thanks,
>>>>> "Mauricio" == Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> writes:
Mauricio> I think I should have included this in the tested-by tag
Mauricio> email, for documentation/evidence: no regression observed in
Mauricio> system shutdown path.
Applied to 4.11/scsi-fixes.
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c index 4776fd8..10f75ad 100644 --- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -11447,6 +11447,7 @@ static struct pci_driver lpfc_driver = { .id_table = lpfc_id_table, .probe = lpfc_pci_probe_one, .remove = lpfc_pci_remove_one, + .shutdown = lpfc_pci_remove_one, .suspend = lpfc_pci_suspend_one, .resume = lpfc_pci_resume_one, .err_handler = &lpfc_err_handler,