Message ID | 1438133951.1856.23.camel@rzhang1-mobl4 (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
Hi Zhang, On Wed, Jul 29, 2015 at 09:39:11AM +0800, Zhang Rui wrote: > From c6b063b2874b62466362a10b53097216b16e400d Mon Sep 17 00:00:00 2001 > From: Zhang Rui <rui.zhang@intel.com> > Date: Sun, 26 Jul 2015 14:15:36 +0800 > Subject: [PATCH] PCI: Disable async suspend/resume for Jmicron chip > > In https://bugzilla.kernel.org/show_bug.cgi?id=81551, > we found that Jmicron chip 361/363 is broken after resume if async noirq > (76569faa62 (PM / sleep: Asynchronous threads for resume_noirq)) is supported, > thus commit e6b7e41cdd (ata: Disabling the async PM for JMicron chip 363/361) > is introduced to fix this problem. > But then, we found that Jmicron chip 368 also has this problem, and it is decided > to disable the pm async feature for all the Jmicron chips. > > But how to fix this was discussed in the mailing list for some time. > After some investigation, we believe that a proper fix is to disable > the async PM in PCI instead of ata driver, because, on this platform, > pci_resume_noirq() of IDE controller must happen after pci_resume_noirq() > of AHCI controller. But as .resume_noirq() of the pata_jmicron driver is > no-op, this suggests that it is the PCI common actions, aka, > pci_pm_default_resume_early(), have the dependency. > To fix this, using device_pm_wait_for_dev() in pata_jmicron driver can not > solve the dependency because pci_pm_default_resume_early() is invoked before > driver callback being invoked, plus, as it is the PCI common actions that > have the dependency, it is reasonable to fix it in PCI bus code, > rather than driver code. > > This patch is made based on the patch from Liu Chuansheng at > https://lkml.org/lkml/2014/12/5/74 > it reverts commit e6b7e41cdd ("ata: Disabling the async PM for JMicron > chip 363/361"), and introduces a PCI quirk to disable async PM for Jmicron > chips. This changelog has a lot of text, but doesn't tell me what I really want to know. We need to know what the problem is from the *device* point of view, not in terms of kernel function names. Function names are only meaningful to a handful of experts, but a concrete problem description may be useful to hardware designers and firmware writers who can potentially fix the root issue. In this case, I think the problem is something like: "on these multi-function JMicron devices, the PATA controller at function 1 doesn't work if it is powered on before the SATA controller at function 0." It's also helpful to include a symptom to help people connect a problem with the solution. For example, the 81551 bug reports says these are symptoms: pata_jmicron 0000:02:00.1: Refused to change power state, currently in D3 irq 17: nobody cared (try booting with the "irqpoll" option) This patch doesn't change the workaround: we still use device_disable_async_suspend() on these devices. This patch merely: 1) Changes the workaround so instead of doing this only for 361 and 363 devices, we turn off async suspend on *all* JMicron devices, and 2) Moves the workaround from the AHCI and PATA drivers to the PCI core. For 1), I think it is probably overkill to penalize all JMicron devices. There's no reason to think NICs and SD/MMC/etc controllers have the same issue. Or if there *is* reason to think that, you should give evidence for it. For 2), you have not made a convincing argument for why the workaround needs to be in the PCI core. Such an argument would be of the form "we need this quirk even if the driver hasn't been loaded yet because ..." Since you didn't say anything like that, I assume the patch in comment #72 of bug #81551 (https://bugzilla.kernel.org/attachment.cgi?id=156301) would work as well. That has the advantage that it wouldn't penalize non-storage controllers. Other minor nits: - The function "pci_resume_noirq()" does not exist. I assume you meant pci_pm_resume_noirq(). And as I mentioned earlier, I don't think the name is relevant in the changelog anyway. - You have a mix of SHA1 lengths and summary quoting above. Please use the conventional commit reference style (with 12-char SHA-1) consistently, e.g., 76569faa62c4 ("PM / sleep: Asynchronous threads for resume_noirq") - Please use http://lkml.kernel.org/r/... references when possible, not https://lkml.org/.... > Reference: https://bugzilla.kernel.org/show_bug.cgi?id=81511 > Reference: https://bugzilla.kernel.org/show_bug.cgi?id=81551 > Reviewed-by: Aaron Lu <aaron.lu@intel.com> > Acked-by: Chuansheng Liu <chuansheng.liu@intel.com> > Acked-by: Tejun Heo <tj@kernel.org> > Signed-off-by: Zhang Rui <rui.zhang@intel.com> > --- > drivers/ata/ahci.c | 12 ------------ > drivers/ata/pata_jmicron.c | 12 ------------ > drivers/pci/quirks.c | 17 +++++++++++++++++ > 3 files changed, 17 insertions(+), 24 deletions(-) > > diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c > index 7e62751..26bb40d 100644 > --- a/drivers/ata/ahci.c > +++ b/drivers/ata/ahci.c > @@ -1451,18 +1451,6 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) > else if (pdev->vendor == 0x177d && pdev->device == 0xa01c) > ahci_pci_bar = AHCI_PCI_BAR_CAVIUM; > > - /* > - * The JMicron chip 361/363 contains one SATA controller and one > - * PATA controller,for powering on these both controllers, we must > - * follow the sequence one by one, otherwise one of them can not be > - * powered on successfully, so here we disable the async suspend > - * method for these chips. > - */ > - if (pdev->vendor == PCI_VENDOR_ID_JMICRON && > - (pdev->device == PCI_DEVICE_ID_JMICRON_JMB363 || > - pdev->device == PCI_DEVICE_ID_JMICRON_JMB361)) > - device_disable_async_suspend(&pdev->dev); > - > /* acquire resources */ > rc = pcim_enable_device(pdev); > if (rc) > diff --git a/drivers/ata/pata_jmicron.c b/drivers/ata/pata_jmicron.c > index 47e418b..4d1a5d2 100644 > --- a/drivers/ata/pata_jmicron.c > +++ b/drivers/ata/pata_jmicron.c > @@ -143,18 +143,6 @@ static int jmicron_init_one (struct pci_dev *pdev, const struct pci_device_id *i > }; > const struct ata_port_info *ppi[] = { &info, NULL }; > > - /* > - * The JMicron chip 361/363 contains one SATA controller and one > - * PATA controller,for powering on these both controllers, we must > - * follow the sequence one by one, otherwise one of them can not be > - * powered on successfully, so here we disable the async suspend > - * method for these chips. > - */ > - if (pdev->vendor == PCI_VENDOR_ID_JMICRON && > - (pdev->device == PCI_DEVICE_ID_JMICRON_JMB363 || > - pdev->device == PCI_DEVICE_ID_JMICRON_JMB361)) > - device_disable_async_suspend(&pdev->dev); > - > return ata_pci_bmdma_init_one(pdev, ppi, &jmicron_sht, NULL, 0); > } > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > index e9fd0e9..02803f8 100644 > --- a/drivers/pci/quirks.c > +++ b/drivers/pci/quirks.c > @@ -29,6 +29,23 @@ > #include "pci.h" > > /* > + * For JMicron chips, we need to disable the async_suspend method, otherwise > + * they will hit the power-on issue when doing device resume, add one quick > + * solution to disable the async_suspend method. > + * > + * https://bugzilla.kernel.org/show_bug.cgi?id=81551 > + */ > +static void pci_async_suspend_fixup(struct pci_dev *pdev) > +{ > + /* > + * disabling the async_suspend method for JMicron chips to > + * avoid device resuming issue. > + */ > + device_disable_async_suspend(&pdev->dev); > +} > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_JMICRON, PCI_ANY_ID, pci_async_suspend_fixup); > + > +/* > * Decoding should be disabled for a PCI device during BAR sizing to avoid > * conflict. But doing so may cause problems on host bridge and perhaps other > * key system devices. For devices that need to have mmio decoding always-on, > -- > 1.9.1 > > > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Jay, On Sat, Aug 15, 2015 at 7:49 AM, Jay <MyMailClone@t-online.de> wrote: > Am Freitag, 14. August 2015, 15:17:52 schrieb Bjorn Helgaas: > ... >> Since you didn't say anything like that, I assume the patch in comment >> #72 of bug #81551 (https://bugzilla.kernel.org/attachment.cgi?id=156301) >> would work as well. That has the advantage that it wouldn't penalize >> non-storage controllers. > > The problem was introduced with 3.15 and the kernel is almost at 4.2 now. > > A general solution for the JMicron-AHCI/PATA-controllers is still missing > although available since late September 2014. It was tested by Barto. > > Seems what I wrote in comment #76 wasn't completely wrong: > https://bugzilla.kernel.org/show_bug.cgi?id=81551#c76 > > So this may be an opportunity to reconsider the approach to solving > things like this. I don't understand your point, except to acknowledge that we are imperfect, have limited resources, and make many mistakes in diagnosing problems and communicating solutions. Do you have a proposal for a better approach? The patch that Barto tested in late September 2014 (https://bugzilla.kernel.org/show_bug.cgi?id=81551#c66) is exactly the patch from comment #72 that I mentioned as possibly being a better solution. That patch wouldn't involve me at all, since it doesn't touch PCI. Zhang is proposing a PCI change, so I'm asking for a clear changelog. A changelog is a "write-once, read-many" situation. It's very important that it be concise and clear, and it's worth having the expert spend extra time writing the log to make it easier for the many novices that will read it in the future. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, Bjorn, On Fri, 2015-08-14 at 15:17 -0500, Bjorn Helgaas wrote: > Hi Zhang, > > On Wed, Jul 29, 2015 at 09:39:11AM +0800, Zhang Rui wrote: > > From c6b063b2874b62466362a10b53097216b16e400d Mon Sep 17 00:00:00 2001 > > From: Zhang Rui <rui.zhang@intel.com> > > Date: Sun, 26 Jul 2015 14:15:36 +0800 > > Subject: [PATCH] PCI: Disable async suspend/resume for Jmicron chip > > > > In https://bugzilla.kernel.org/show_bug.cgi?id=81551, > > we found that Jmicron chip 361/363 is broken after resume if async noirq > > (76569faa62 (PM / sleep: Asynchronous threads for resume_noirq)) is supported, > > thus commit e6b7e41cdd (ata: Disabling the async PM for JMicron chip 363/361) > > is introduced to fix this problem. > > But then, we found that Jmicron chip 368 also has this problem, and it is decided > > to disable the pm async feature for all the Jmicron chips. > > > > But how to fix this was discussed in the mailing list for some time. > > After some investigation, we believe that a proper fix is to disable > > the async PM in PCI instead of ata driver, because, on this platform, > > pci_resume_noirq() of IDE controller must happen after pci_resume_noirq() > > of AHCI controller. But as .resume_noirq() of the pata_jmicron driver is > > no-op, this suggests that it is the PCI common actions, aka, > > pci_pm_default_resume_early(), have the dependency. > > To fix this, using device_pm_wait_for_dev() in pata_jmicron driver can not > > solve the dependency because pci_pm_default_resume_early() is invoked before > > driver callback being invoked, plus, as it is the PCI common actions that > > have the dependency, it is reasonable to fix it in PCI bus code, > > rather than driver code. > > > > This patch is made based on the patch from Liu Chuansheng at > > https://lkml.org/lkml/2014/12/5/74 > > it reverts commit e6b7e41cdd ("ata: Disabling the async PM for JMicron > > chip 363/361"), and introduces a PCI quirk to disable async PM for Jmicron > > chips. > > This changelog has a lot of text, but doesn't tell me what I really > want to know. > > We need to know what the problem is from the *device* point of view, > not in terms of kernel function names. Function names are only > meaningful to a handful of experts, but a concrete problem description > may be useful to hardware designers and firmware writers who can > potentially fix the root issue. > Well, I sent this patch just because there is a regression since 3.15, and we already have two patches that have been verified to fix the problem, but none of them are pushed for upstream. I don't have many PCI background, thus I chose to use function names to make myself precise enough but apparently I failed :p Maybe I should send this patch as a RFC patch to get a perfect fix. > In this case, I think the problem is something like: "on these > multi-function JMicron devices, the PATA controller at function 1 > doesn't work if it is powered on before the SATA controller at > function 0." > exactly. > It's also helpful to include a symptom to help people connect a > problem with the solution. For example, the 81551 bug reports says > these are symptoms: > > pata_jmicron 0000:02:00.1: Refused to change power state, currently in D3 > irq 17: nobody cared (try booting with the "irqpoll" option) > okay. > This patch doesn't change the workaround: we still use > device_disable_async_suspend() on these devices. This patch merely: > > 1) Changes the workaround so instead of doing this only for 361 and > 363 devices, we turn off async suspend on *all* JMicron devices, and > > 2) Moves the workaround from the AHCI and PATA drivers to the PCI > core. > > For 1), I think it is probably overkill to penalize all JMicron > devices. There's no reason to think NICs and SD/MMC/etc controllers > have the same issue. Or if there *is* reason to think that, you > should give evidence for it. No, I don't have any evidence. It's just because the previous fix becomes insufficient, which makes people wondering if we should disable all of them. > > For 2), you have not made a convincing argument for why the workaround > needs to be in the PCI core. Such an argument would be of the form > "we need this quirk even if the driver hasn't been loaded yet > because ..." Good point, Jay and Barto, can you please verify this? > Since you didn't say anything like that, I assume the patch in comment > #72 of bug #81551 (https://bugzilla.kernel.org/attachment.cgi?id=156301) > would work as well. That has the advantage that it wouldn't penalize > non-storage controllers. > Yes, the fix in driver code also works. But let's wait for the test results from Jay and Barto because this problem really sounds like a dependency in PCI code. > Other minor nits: > > - The function "pci_resume_noirq()" does not exist. I assume you meant > pci_pm_resume_noirq(). And as I mentioned earlier, I don't think the > name is relevant in the changelog anyway. > > - You have a mix of SHA1 lengths and summary quoting above. Please use > the conventional commit reference style (with 12-char SHA-1) > consistently, e.g., > > 76569faa62c4 ("PM / sleep: Asynchronous threads for resume_noirq") > > - Please use http://lkml.kernel.org/r/... references when possible, > not https://lkml.org/.... > Thanks for pointing these out. I was not aware of such mistakes before. Thank you. -rui > > Reference: https://bugzilla.kernel.org/show_bug.cgi?id=81511 > > Reference: https://bugzilla.kernel.org/show_bug.cgi?id=81551 > > Reviewed-by: Aaron Lu <aaron.lu@intel.com> > > Acked-by: Chuansheng Liu <chuansheng.liu@intel.com> > > Acked-by: Tejun Heo <tj@kernel.org> > > Signed-off-by: Zhang Rui <rui.zhang@intel.com> > > --- > > drivers/ata/ahci.c | 12 ------------ > > drivers/ata/pata_jmicron.c | 12 ------------ > > drivers/pci/quirks.c | 17 +++++++++++++++++ > > 3 files changed, 17 insertions(+), 24 deletions(-) > > > > diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c > > index 7e62751..26bb40d 100644 > > --- a/drivers/ata/ahci.c > > +++ b/drivers/ata/ahci.c > > @@ -1451,18 +1451,6 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) > > else if (pdev->vendor == 0x177d && pdev->device == 0xa01c) > > ahci_pci_bar = AHCI_PCI_BAR_CAVIUM; > > > > - /* > > - * The JMicron chip 361/363 contains one SATA controller and one > > - * PATA controller,for powering on these both controllers, we must > > - * follow the sequence one by one, otherwise one of them can not be > > - * powered on successfully, so here we disable the async suspend > > - * method for these chips. > > - */ > > - if (pdev->vendor == PCI_VENDOR_ID_JMICRON && > > - (pdev->device == PCI_DEVICE_ID_JMICRON_JMB363 || > > - pdev->device == PCI_DEVICE_ID_JMICRON_JMB361)) > > - device_disable_async_suspend(&pdev->dev); > > - > > /* acquire resources */ > > rc = pcim_enable_device(pdev); > > if (rc) > > diff --git a/drivers/ata/pata_jmicron.c b/drivers/ata/pata_jmicron.c > > index 47e418b..4d1a5d2 100644 > > --- a/drivers/ata/pata_jmicron.c > > +++ b/drivers/ata/pata_jmicron.c > > @@ -143,18 +143,6 @@ static int jmicron_init_one (struct pci_dev *pdev, const struct pci_device_id *i > > }; > > const struct ata_port_info *ppi[] = { &info, NULL }; > > > > - /* > > - * The JMicron chip 361/363 contains one SATA controller and one > > - * PATA controller,for powering on these both controllers, we must > > - * follow the sequence one by one, otherwise one of them can not be > > - * powered on successfully, so here we disable the async suspend > > - * method for these chips. > > - */ > > - if (pdev->vendor == PCI_VENDOR_ID_JMICRON && > > - (pdev->device == PCI_DEVICE_ID_JMICRON_JMB363 || > > - pdev->device == PCI_DEVICE_ID_JMICRON_JMB361)) > > - device_disable_async_suspend(&pdev->dev); > > - > > return ata_pci_bmdma_init_one(pdev, ppi, &jmicron_sht, NULL, 0); > > } > > > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > > index e9fd0e9..02803f8 100644 > > --- a/drivers/pci/quirks.c > > +++ b/drivers/pci/quirks.c > > @@ -29,6 +29,23 @@ > > #include "pci.h" > > > > /* > > + * For JMicron chips, we need to disable the async_suspend method, otherwise > > + * they will hit the power-on issue when doing device resume, add one quick > > + * solution to disable the async_suspend method. > > + * > > + * https://bugzilla.kernel.org/show_bug.cgi?id=81551 > > + */ > > +static void pci_async_suspend_fixup(struct pci_dev *pdev) > > +{ > > + /* > > + * disabling the async_suspend method for JMicron chips to > > + * avoid device resuming issue. > > + */ > > + device_disable_async_suspend(&pdev->dev); > > +} > > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_JMICRON, PCI_ANY_ID, pci_async_suspend_fixup); > > + > > +/* > > * Decoding should be disabled for a PCI device during BAR sizing to avoid > > * conflict. But doing so may cause problems on host bridge and perhaps other > > * key system devices. For devices that need to have mmio decoding always-on, > > -- > > 1.9.1 > > > > > > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Aug 15, 2015 at 3:33 PM, Jay <MyMailClone@t-online.de> wrote: > Am Samstag, 15. August 2015, 09:39:24 schrieb Bjorn Helgaas: >> Hi Jay, >> >> On Sat, Aug 15, 2015 at 7:49 AM, Jay <MyMailClone@t-online.de> wrote: >> > Am Freitag, 14. August 2015, 15:17:52 schrieb Bjorn Helgaas: >> > ... >> > >> >> Since you didn't say anything like that, I assume the patch in comment >> >> #72 of bug #81551 (https://bugzilla.kernel.org/attachment.cgi?id=156301) >> >> would work as well. That has the advantage that it wouldn't penalize >> >> non-storage controllers. >> > >> > The problem was introduced with 3.15 and the kernel is almost at 4.2 now. >> > >> > A general solution for the JMicron-AHCI/PATA-controllers is still missing >> > although available since late September 2014. It was tested by Barto. >> > >> > Seems what I wrote in comment #76 wasn't completely wrong: >> > https://bugzilla.kernel.org/show_bug.cgi?id=81551#c76 >> > >> > So this may be an opportunity to reconsider the approach to solving >> > things like this. >> >> I don't understand your point, except to acknowledge that we are >> imperfect, have limited resources, and make many mistakes in >> diagnosing problems and communicating solutions. Do you have a >> proposal for a better approach? >> >> The patch that Barto tested in late September 2014 >> (https://bugzilla.kernel.org/show_bug.cgi?id=81551#c66) is exactly the >> patch from comment #72 that I mentioned as possibly being a better >> solution. >> >> That patch wouldn't involve me at all, since it doesn't touch PCI. >> Zhang is proposing a PCI change, so I'm asking for a clear changelog. >> >> A changelog is a "write-once, read-many" situation. It's very >> important that it be concise and clear, and it's worth having the >> expert spend extra time writing the log to make it easier for the many >> novices that will read it in the future. > > please don't feel offended. Nobody should. No need to. And I like what you > wrote, it's definitely constructive. > > My point is that instead of looking for a "perfect" solution, a more pragmatic > approach may be more effective in solving things like this. > > And this was exactly what I suggested in comment #76: "Let's be pragmatic, > take this patch and solve the problem now. And then you (developers) may take > your time to look for a better or even the "perfect" solution." > > That way the case would have been closed a year ago. The only problem with that approach is that the owner of the issue now considers the issue closed, so the better solution never happens. Even if it *does* happen, we now have a fix for a fix, which makes it that much more difficult to follow the history. This is just a generic issue with the way Linux works: maintainers have zero leverage after accepting a patch, so anything they really care about has to happen before that. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Aug 15, 2015 at 11:57 AM, Zhang Rui <rui.zhang@intel.com> wrote: > Hi, Bjorn, > > On Fri, 2015-08-14 at 15:17 -0500, Bjorn Helgaas wrote: >> Hi Zhang, >> >> On Wed, Jul 29, 2015 at 09:39:11AM +0800, Zhang Rui wrote: >> > From c6b063b2874b62466362a10b53097216b16e400d Mon Sep 17 00:00:00 2001 >> > From: Zhang Rui <rui.zhang@intel.com> >> > Date: Sun, 26 Jul 2015 14:15:36 +0800 >> > Subject: [PATCH] PCI: Disable async suspend/resume for Jmicron chip >> > >> > In https://bugzilla.kernel.org/show_bug.cgi?id=81551, >> > we found that Jmicron chip 361/363 is broken after resume if async noirq >> > (76569faa62 (PM / sleep: Asynchronous threads for resume_noirq)) is supported, >> > thus commit e6b7e41cdd (ata: Disabling the async PM for JMicron chip 363/361) >> > is introduced to fix this problem. >> > But then, we found that Jmicron chip 368 also has this problem, and it is decided >> > to disable the pm async feature for all the Jmicron chips. >> > >> > But how to fix this was discussed in the mailing list for some time. >> > After some investigation, we believe that a proper fix is to disable >> > the async PM in PCI instead of ata driver, because, on this platform, >> > pci_resume_noirq() of IDE controller must happen after pci_resume_noirq() >> > of AHCI controller. But as .resume_noirq() of the pata_jmicron driver is >> > no-op, this suggests that it is the PCI common actions, aka, >> > pci_pm_default_resume_early(), have the dependency. >> > To fix this, using device_pm_wait_for_dev() in pata_jmicron driver can not >> > solve the dependency because pci_pm_default_resume_early() is invoked before >> > driver callback being invoked, plus, as it is the PCI common actions that >> > have the dependency, it is reasonable to fix it in PCI bus code, >> > rather than driver code. >> > >> > This patch is made based on the patch from Liu Chuansheng at >> > https://lkml.org/lkml/2014/12/5/74 >> > it reverts commit e6b7e41cdd ("ata: Disabling the async PM for JMicron >> > chip 363/361"), and introduces a PCI quirk to disable async PM for Jmicron >> > chips. >> >> This changelog has a lot of text, but doesn't tell me what I really >> want to know. >> >> We need to know what the problem is from the *device* point of view, >> not in terms of kernel function names. Function names are only >> meaningful to a handful of experts, but a concrete problem description >> may be useful to hardware designers and firmware writers who can >> potentially fix the root issue. >> > Well, I sent this patch just because there is a regression since 3.15, > and we already have two patches that have been verified to fix the > problem, but none of them are pushed for upstream. I don't have many PCI > background, thus I chose to use function names to make myself precise > enough but apparently I failed :p Yep. If you use function names, they have to be real function names, not "almost" function names. > Maybe I should send this patch as a RFC patch to get a perfect fix. RFC has nothing to do with it. The problem is that the changelog doesn't say what the problem is or how we're fixing it (except in such abstract terms as to be useless). >> This patch doesn't change the workaround: we still use >> device_disable_async_suspend() on these devices. This patch merely: >> >> 1) Changes the workaround so instead of doing this only for 361 and >> 363 devices, we turn off async suspend on *all* JMicron devices, and >> >> 2) Moves the workaround from the AHCI and PATA drivers to the PCI >> core. >> >> For 1), I think it is probably overkill to penalize all JMicron >> devices. There's no reason to think NICs and SD/MMC/etc controllers >> have the same issue. Or if there *is* reason to think that, you >> should give evidence for it. > > No, I don't have any evidence. It's just because the previous fix > becomes insufficient, which makes people wondering if we should disable > all of them. It's reasonable to say "JMicron multi-function PATA controllers X, Y, and Z have this problem; let's assume all JMicron multi-function PATA controllers have it." All those controllers are likely to share some silicon design. Saying "all JMicron adapters, even single-function and NIC and SD/etc. adapters, have this problem" is not nearly such an obvious conclusion. >> For 2), you have not made a convincing argument for why the workaround >> needs to be in the PCI core. Such an argument would be of the form >> "we need this quirk even if the driver hasn't been loaded yet >> because ..." > > Good point, Jay and Barto, can you please verify this? I don't know Jay or Barto, but I don't even know what you're asking them to verify, so I think you are asking too much of them. What exactly would you like them to do? The scenario where I can imagine the quirk would need to be in the core is the following, and if I were you, this is the scenario I would be asking them to test: - cold boot system with comment #72 patch (quirks in ahci & pata_jmicron) - load ahci driver to claim JMicron SATA device - suspend system, which powers down both SATA and PATA devices - resume system, which powers up PATA (function 1), then SATA (function 0) - verify that SATA works fine - load pata_jmicron driver to claim PATA - see whether PATA works Now, the question is whether PATA works after resuming and loading pata_jmicron. If it does, then it should be fine to keep the quirks in ahci and pata_jmicron. If it doesn't work (and I think it's actually likely that it *won't* work), then we probably need to put the quirks in the PCI core so they will happen even before ahci and pata_jmicron are loaded. >> Since you didn't say anything like that, I assume the patch in comment >> #72 of bug #81551 (https://bugzilla.kernel.org/attachment.cgi?id=156301) >> would work as well. That has the advantage that it wouldn't penalize >> non-storage controllers. >> > Yes, the fix in driver code also works. But let's wait for the test > results from Jay and Barto because this problem really sounds like a > dependency in PCI code. It might "sound like" a dependency in PCI code, but that sort of hand-waving makes me think you don't really understand the problem. There might be something in the PCI specs that says you have to power up function 0 before function 1. I don't think that's the case, because we don't see this problem with other multi-function devices. But if it were the case, you should write a PCI patch to enforce that ordering and include a spec citation in the changelog. That would fix this device as well as potentially others. If the spec doesn't require that ordering, then it's probably a hardware defect. It's possible there's a way to describe the ordering via ACPI; if there is, you could argue that the lack of that description is a BIOS defect. Either way, it's not a PCI core problem. We might still want a workaround in the PCI core, but we need to be clear about what the problem is. >> Other minor nits: >> >> - The function "pci_resume_noirq()" does not exist. I assume you meant >> pci_pm_resume_noirq(). And as I mentioned earlier, I don't think the >> name is relevant in the changelog anyway. >> >> - You have a mix of SHA1 lengths and summary quoting above. Please use >> the conventional commit reference style (with 12-char SHA-1) >> consistently, e.g., >> >> 76569faa62c4 ("PM / sleep: Asynchronous threads for resume_noirq") >> >> - Please use http://lkml.kernel.org/r/... references when possible, >> not https://lkml.org/.... >> > Thanks for pointing these out. I was not aware of such mistakes before. I forgot to mention that your changelog has random line lengths. New paragraphs start after a blank line. A new sentence does not start a new line. Fill the lines so they fit in 75 columns so they fit in 80 columns after git indents them. Usually I fix things like that myself. But there were other issues, and your SHA-1 citations weren't even consistent with each other, which I think is sloppy, and I get grumpy when people ask me to merge sloppy work. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le 14/08/2015 22:17, Bjorn Helgaas a écrit : > For 1), I think it is probably overkill to penalize all JMicron > devices. There's no reason to think NICs and SD/MMC/etc controllers > have the same issue. Or if there *is* reason to think that, you > should give evidence for it. if you don't want to penalize all JMicron models then a workaround would be to create a new feature : the ability for the user to disable async for a specific device, for example a kernel parameter ( no-async-for-jmicron ), or a more configurable mechanism, like a configuration file where the user can add the name of the device, a kind of blacklist in order to disable async only for a list of devices, it's only a workaround, the real solution would to be find the real root cause ( a bug in async method ? a design flaw in JMicron hardware ? a bios/acpi bug ? ) -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index 7e62751..26bb40d 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -1451,18 +1451,6 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) else if (pdev->vendor == 0x177d && pdev->device == 0xa01c) ahci_pci_bar = AHCI_PCI_BAR_CAVIUM; - /* - * The JMicron chip 361/363 contains one SATA controller and one - * PATA controller,for powering on these both controllers, we must - * follow the sequence one by one, otherwise one of them can not be - * powered on successfully, so here we disable the async suspend - * method for these chips. - */ - if (pdev->vendor == PCI_VENDOR_ID_JMICRON && - (pdev->device == PCI_DEVICE_ID_JMICRON_JMB363 || - pdev->device == PCI_DEVICE_ID_JMICRON_JMB361)) - device_disable_async_suspend(&pdev->dev); - /* acquire resources */ rc = pcim_enable_device(pdev); if (rc) diff --git a/drivers/ata/pata_jmicron.c b/drivers/ata/pata_jmicron.c index 47e418b..4d1a5d2 100644 --- a/drivers/ata/pata_jmicron.c +++ b/drivers/ata/pata_jmicron.c @@ -143,18 +143,6 @@ static int jmicron_init_one (struct pci_dev *pdev, const struct pci_device_id *i }; const struct ata_port_info *ppi[] = { &info, NULL }; - /* - * The JMicron chip 361/363 contains one SATA controller and one - * PATA controller,for powering on these both controllers, we must - * follow the sequence one by one, otherwise one of them can not be - * powered on successfully, so here we disable the async suspend - * method for these chips. - */ - if (pdev->vendor == PCI_VENDOR_ID_JMICRON && - (pdev->device == PCI_DEVICE_ID_JMICRON_JMB363 || - pdev->device == PCI_DEVICE_ID_JMICRON_JMB361)) - device_disable_async_suspend(&pdev->dev); - return ata_pci_bmdma_init_one(pdev, ppi, &jmicron_sht, NULL, 0); } diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index e9fd0e9..02803f8 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -29,6 +29,23 @@ #include "pci.h" /* + * For JMicron chips, we need to disable the async_suspend method, otherwise + * they will hit the power-on issue when doing device resume, add one quick + * solution to disable the async_suspend method. + * + * https://bugzilla.kernel.org/show_bug.cgi?id=81551 + */ +static void pci_async_suspend_fixup(struct pci_dev *pdev) +{ + /* + * disabling the async_suspend method for JMicron chips to + * avoid device resuming issue. + */ + device_disable_async_suspend(&pdev->dev); +} +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_JMICRON, PCI_ANY_ID, pci_async_suspend_fixup); + +/* * Decoding should be disabled for a PCI device during BAR sizing to avoid * conflict. But doing so may cause problems on host bridge and perhaps other * key system devices. For devices that need to have mmio decoding always-on,