Message ID | 1385579908-24608-1-git-send-email-khalid.aziz@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
Khalid Aziz <khalid.aziz@oracle.com> writes: > Add a flag to tell the PCI subsystem that kernel is shutting down > in prepapration to kexec a kernel. Add code in PCI subsystem to use > this flag to clear Bus Master bit on PCI devices only in case of > kexec reboot. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=63861 > and avoids any other issues caused by clearing Bus Master bit on PCI > devices in normal shutdown path. This patch is based on discussion at > http://marc.info/?l=linux-pci&m=138425645204355&w=2 Scratches head. Given that most devices already call pci_disable_device which clears the bus master bit how does this change anything meaningful? Is is the problem here that most drivers are lazy and have a noop shutdown method? Eric > Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com> > Acked-by: Konstantin Khlebnikov <koct9i@gmail.com> > Cc: stable@vger.kernel.org > --- > drivers/pci/pci-driver.c | 9 ++++++--- > drivers/pci/pci.h | 3 +++ > kernel/kexec.c | 4 ++++ > 3 files changed, 13 insertions(+), 3 deletions(-) > > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c > index 9042fdb..e920195 100644 > --- a/drivers/pci/pci-driver.c > +++ b/drivers/pci/pci-driver.c > @@ -400,10 +400,13 @@ static void pci_device_shutdown(struct device *dev) > pci_msix_shutdown(pci_dev); > > /* > - * Turn off Bus Master bit on the device to tell it to not > - * continue to do DMA. Don't touch devices in D3cold or unknown states. > + * If this is a kexec reboot, turn off Bus Master bit on the > + * device to tell it to not continue to do DMA. Don't touch > + * devices in D3cold or unknown states. > + * If it is not a kexec reboot, firmware will hit the PCI > + * devices with big hammer and stop their DMA any way. > */ > - if (pci_dev->current_state <= PCI_D3hot) > + if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot)) > pci_clear_master(pci_dev); > } > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index 9c91ecc..7d85733 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -9,6 +9,9 @@ > extern const unsigned char pcix_bus_speed[]; > extern const unsigned char pcie_link_speed[]; > > +/* flag to track if kexec reboot is in progress */ > +extern unsigned long kexec_in_progress; > + > /* Functions internal to the PCI core code */ > > int pci_create_sysfs_dev_files(struct pci_dev *pdev); > diff --git a/kernel/kexec.c b/kernel/kexec.c > index 490afc0..fd2d63e 100644 > --- a/kernel/kexec.c > +++ b/kernel/kexec.c > @@ -47,6 +47,9 @@ u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4]; > size_t vmcoreinfo_size; > size_t vmcoreinfo_max_size = sizeof(vmcoreinfo_data); > > +/* Flag to indicate we are going to kexec a new kernel */ > +unsigned long kexec_in_progress = 0; > + > /* Location of the reserved area for the crash kernel */ > struct resource crashk_res = { > .name = "Crash kernel", > @@ -1675,6 +1678,7 @@ int kernel_kexec(void) > } else > #endif > { > + kexec_in_progress = 1; > kernel_restart_prepare(NULL); > printk(KERN_EMERG "Starting new kernel\n"); > machine_shutdown(); -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Nov 27, 2013 at 12:18:28PM -0700, Khalid Aziz wrote: > +/* flag to track if kexec reboot is in progress */ > +extern unsigned long kexec_in_progress; "unsigned long" for a "flag"? -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11/27/2013 12:24 PM, Matthew Garrett wrote: > On Wed, Nov 27, 2013 at 12:18:28PM -0700, Khalid Aziz wrote: > >> +/* flag to track if kexec reboot is in progress */ >> +extern unsigned long kexec_in_progress; > > Adding this to pci.h seems a little odd. We may want to use it somewhere > else at some point. Add it to kexec.h instead? > I debated between pci.h and kexec.h but pci-driver.c does not include kexec.h and I didn't want to include a whole new file. Now I see another problem with adding that extern declaration to pci.h - if CONFIG_KEXEC is not set, build will fail. I should add #ifdef CONFIG_KEXEC to the code in pci-driver.c as well. Time for v2. -- Khalid -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Nov 27, 2013 at 12:48:42PM -0700, Khalid Aziz wrote: > I debated between pci.h and kexec.h but pci-driver.c does not > include kexec.h and I didn't want to include a whole new file. Now I > see another problem with adding that extern declaration to pci.h - > if CONFIG_KEXEC is not set, build will fail. I should add #ifdef > CONFIG_KEXEC to the code in pci-driver.c as well. Time for v2. You're making the behaviour of the pci code conditional on whether we're in kexec or not, so I think adding kexec.h is perfectly reasonable.
On 11/27/2013 12:38 PM, ebiederm@xmission.com wrote: > Khalid Aziz <khalid.aziz@oracle.com> writes: > >> Add a flag to tell the PCI subsystem that kernel is shutting down >> in prepapration to kexec a kernel. Add code in PCI subsystem to use >> this flag to clear Bus Master bit on PCI devices only in case of >> kexec reboot. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=63861 >> and avoids any other issues caused by clearing Bus Master bit on PCI >> devices in normal shutdown path. This patch is based on discussion at >> http://marc.info/?l=linux-pci&m=138425645204355&w=2 > > Scratches head. > > Given that most devices already call pci_disable_device which clears the > bus master bit how does this change anything meaningful? > > Is is the problem here that most drivers are lazy and have a noop > shutdown method? Yes, that is exactly the problem. -- Khalid > > Eric > > >> Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com> >> Acked-by: Konstantin Khlebnikov <koct9i@gmail.com> >> Cc: stable@vger.kernel.org >> --- >> drivers/pci/pci-driver.c | 9 ++++++--- >> drivers/pci/pci.h | 3 +++ >> kernel/kexec.c | 4 ++++ >> 3 files changed, 13 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c >> index 9042fdb..e920195 100644 >> --- a/drivers/pci/pci-driver.c >> +++ b/drivers/pci/pci-driver.c >> @@ -400,10 +400,13 @@ static void pci_device_shutdown(struct device *dev) >> pci_msix_shutdown(pci_dev); >> >> /* >> - * Turn off Bus Master bit on the device to tell it to not >> - * continue to do DMA. Don't touch devices in D3cold or unknown states. >> + * If this is a kexec reboot, turn off Bus Master bit on the >> + * device to tell it to not continue to do DMA. Don't touch >> + * devices in D3cold or unknown states. >> + * If it is not a kexec reboot, firmware will hit the PCI >> + * devices with big hammer and stop their DMA any way. >> */ >> - if (pci_dev->current_state <= PCI_D3hot) >> + if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot)) >> pci_clear_master(pci_dev); >> } >> >> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h >> index 9c91ecc..7d85733 100644 >> --- a/drivers/pci/pci.h >> +++ b/drivers/pci/pci.h >> @@ -9,6 +9,9 @@ >> extern const unsigned char pcix_bus_speed[]; >> extern const unsigned char pcie_link_speed[]; >> >> +/* flag to track if kexec reboot is in progress */ >> +extern unsigned long kexec_in_progress; >> + >> /* Functions internal to the PCI core code */ >> >> int pci_create_sysfs_dev_files(struct pci_dev *pdev); >> diff --git a/kernel/kexec.c b/kernel/kexec.c >> index 490afc0..fd2d63e 100644 >> --- a/kernel/kexec.c >> +++ b/kernel/kexec.c >> @@ -47,6 +47,9 @@ u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4]; >> size_t vmcoreinfo_size; >> size_t vmcoreinfo_max_size = sizeof(vmcoreinfo_data); >> >> +/* Flag to indicate we are going to kexec a new kernel */ >> +unsigned long kexec_in_progress = 0; >> + >> /* Location of the reserved area for the crash kernel */ >> struct resource crashk_res = { >> .name = "Crash kernel", >> @@ -1675,6 +1678,7 @@ int kernel_kexec(void) >> } else >> #endif >> { >> + kexec_in_progress = 1; >> kernel_restart_prepare(NULL); >> printk(KERN_EMERG "Starting new kernel\n"); >> machine_shutdown(); -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Nov 27, 2013 at 12:59:40PM -0700, Khalid Aziz wrote: > On 11/27/2013 12:38 PM, ebiederm@xmission.com wrote: > >Khalid Aziz <khalid.aziz@oracle.com> writes: > > > >>Add a flag to tell the PCI subsystem that kernel is shutting down > >>in prepapration to kexec a kernel. Add code in PCI subsystem to use > >>this flag to clear Bus Master bit on PCI devices only in case of > >>kexec reboot. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=63861 > >>and avoids any other issues caused by clearing Bus Master bit on PCI > >>devices in normal shutdown path. This patch is based on discussion at > >>http://marc.info/?l=linux-pci&m=138425645204355&w=2 > > > >Scratches head. > > > >Given that most devices already call pci_disable_device which clears the > >bus master bit how does this change anything meaningful? > > > >Is is the problem here that most drivers are lazy and have a noop > >shutdown method? > > Yes, that is exactly the problem. Then fix the drivers please. It's not as if you don't have access to the source for them all... greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Nov 27, 2013 at 01:22:27PM -0800, Greg KH wrote: > Then fix the drivers please. It's not as if you don't have access to > the source for them all... Define "fix". It's clearly wrong to disable busmastering at shutdown on some devices, otherwise we wouldn't be having this discussion at all.
On Wed, Nov 27, 2013 at 09:53:09PM +0000, Matthew Garrett wrote: > On Wed, Nov 27, 2013 at 01:22:27PM -0800, Greg KH wrote: > > > Then fix the drivers please. It's not as if you don't have access to > > the source for them all... > > Define "fix". It's clearly wrong to disable busmastering at shutdown on > some devices, otherwise we wouldn't be having this discussion at all. I thought it was only "wrong" to disable this on multi-function devices, which is why some drivers didn't do it. Otherwise, how would it be any different to have the global setting? Anyway, I really don't care either way, but this seems like something that the drivers should be doing. What suddenly changed that caused this problem to occur that hasn't happened in the years prior to now that drives this to be a stable-kernel issue? greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Nov 27, 2013 at 02:01:06PM -0800, Greg KH wrote: > On Wed, Nov 27, 2013 at 09:53:09PM +0000, Matthew Garrett wrote: > > On Wed, Nov 27, 2013 at 01:22:27PM -0800, Greg KH wrote: > > > > > Then fix the drivers please. It's not as if you don't have access to > > > the source for them all... > > > > Define "fix". It's clearly wrong to disable busmastering at shutdown on > > some devices, otherwise we wouldn't be having this discussion at all. > > I thought it was only "wrong" to disable this on multi-function devices, > which is why some drivers didn't do it. Otherwise, how would it be any > different to have the global setting? kexec doesn't jump into the firmware, so firmware assumptions about the state of the busmaster bit don't matter. > Anyway, I really don't care either way, but this seems like something > that the drivers should be doing. What suddenly changed that caused > this problem to occur that hasn't happened in the years prior to now > that drives this to be a stable-kernel issue? We started clearing the busmaster bit on all devices on shutdown in 3.something in order to ensure that DMA wasn't occuring while we were in the process of performing a kexec. Some machines freeze on shutdown as a result. This patch reverts back to the original behaviour on real shutdown, while still avoiding the "This PCI device scribbled over my new kernel" kexec case.
On 11/27/2013 03:07 PM, Matthew Garrett wrote: > On Wed, Nov 27, 2013 at 02:01:06PM -0800, Greg KH wrote: >> Anyway, I really don't care either way, but this seems like something >> that the drivers should be doing. What suddenly changed that caused >> this problem to occur that hasn't happened in the years prior to now >> that drives this to be a stable-kernel issue? > > We started clearing the busmaster bit on all devices on shutdown in > 3.something in order to ensure that DMA wasn't occuring while we were > in the process of performing a kexec. Some machines freeze on shutdown > as a result. This patch reverts back to the original behaviour on real > shutdown, while still avoiding the "This PCI device scribbled over my > new kernel" kexec case. > Thanks for explaining this, Matthew. That was my reasoning exactly for why this patch should apply to stable. It fixes a real problem some users are experiencing. Commit log contains the URL to bugzilla entry for the problem. -- Khalid -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> Anyway, I really don't care either way, but this seems like something > that the drivers should be doing. What suddenly changed that caused > this problem to occur that hasn't happened in the years prior to now > that drives this to be a stable-kernel issue? When this first went in I pointed out it was an utterly stupid idea. Since it went in lots of machines haven't rebooted properly or powered off right. There are two problems 1. Clearing the busmaster bit is not well defined behaviour. It even freezes some hardware. 2. Lots of PC class hardware has firmware which believes that it can access the hardware as it goes to reboot or poweroff and that someone won't have shut it down. Like it or not the firmware expected behaviour for such things is "what Windows did". The expected PC behaviour is subtle and magic - eg the fact D3 on some IDE disk controllers is terminal until power cycled because Windows didn't do that or that the BIOS goes off and chats to the disks in reboot without assuming the disk controller is completely uninitialized. kexec is a special cornercase and handling is as such (knowing it will bother to re-init appropriate devices) is very different to the current broken behaviour for PC systems. Alan -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 9042fdb..e920195 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -400,10 +400,13 @@ static void pci_device_shutdown(struct device *dev) pci_msix_shutdown(pci_dev); /* - * Turn off Bus Master bit on the device to tell it to not - * continue to do DMA. Don't touch devices in D3cold or unknown states. + * If this is a kexec reboot, turn off Bus Master bit on the + * device to tell it to not continue to do DMA. Don't touch + * devices in D3cold or unknown states. + * If it is not a kexec reboot, firmware will hit the PCI + * devices with big hammer and stop their DMA any way. */ - if (pci_dev->current_state <= PCI_D3hot) + if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot)) pci_clear_master(pci_dev); } diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 9c91ecc..7d85733 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -9,6 +9,9 @@ extern const unsigned char pcix_bus_speed[]; extern const unsigned char pcie_link_speed[]; +/* flag to track if kexec reboot is in progress */ +extern unsigned long kexec_in_progress; + /* Functions internal to the PCI core code */ int pci_create_sysfs_dev_files(struct pci_dev *pdev); diff --git a/kernel/kexec.c b/kernel/kexec.c index 490afc0..fd2d63e 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -47,6 +47,9 @@ u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4]; size_t vmcoreinfo_size; size_t vmcoreinfo_max_size = sizeof(vmcoreinfo_data); +/* Flag to indicate we are going to kexec a new kernel */ +unsigned long kexec_in_progress = 0; + /* Location of the reserved area for the crash kernel */ struct resource crashk_res = { .name = "Crash kernel", @@ -1675,6 +1678,7 @@ int kernel_kexec(void) } else #endif { + kexec_in_progress = 1; kernel_restart_prepare(NULL); printk(KERN_EMERG "Starting new kernel\n"); machine_shutdown();