Message ID | 4A42AFAC.6000300@kernel.org (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Should we not just revert 9e9f46c44e487af0a82eb61b624553e2f7118f5b? The thing says: "At this point, it seems to solve more problems than it causes, so let's try using it by default. It's an easy revert if it ends up causing trouble." and it clearly does _not_ solve more problems than it causes, and the whole message in that commit implies we should revert it. I'm happy to apply various patches to fix it up, but regardless, I thinkwe should revert that commit as bogus. We can try making it the default again next round, when maybe it will be true that it doesn't cause issues. What did it even ever help with? Linus On Wed, 24 Jun 2009, Yinghai Lu wrote: > > for AMD system, when only one PCI root, just set PCI_NO_ROOT_CRS for it > > Signed-off-by: Yinghai Lu <yinghai@kernel.org> > > --- > arch/x86/pci/amd_bus.c | 4 ++++ > 1 file changed, 4 insertions(+) > > Index: linux-2.6/arch/x86/pci/amd_bus.c > =================================================================== > --- linux-2.6.orig/arch/x86/pci/amd_bus.c > +++ linux-2.6/arch/x86/pci/amd_bus.c > @@ -561,6 +561,10 @@ static int __init early_fill_mp_bus_info > } > } > > + /* don't use _CRS if we only have one root */ > + if (pci_root_num <= 1) > + pci_probe |= PCI_NO_ROOT_CRS; > + > return 0; > } > > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 24 Jun 2009, Linus Torvalds wrote: > > I'm happy to apply various patches to fix it up, but regardless, I thinkwe > should revert that commit as bogus. We can try making it the default again > next round, when maybe it will be true that it doesn't cause issues. Btw, I really think our _CRS handling sucks. There's two things that you can do with _CRS: - use the _existence_ of it as an indicator of a root bus - try to use it to populate the resource tree. And quite frankly, I think #2 is broken. There's no way in hell that ACPI tables are ever going to be better than just asking the hardware. We've gone through this before. Trusting ACPI over the hardware is just FUNDAMENTALLY WRONG. So I'm just going to do that revert. I'm not sure if it ever makes sense to make that insane _CRS code the default. It seems like a fundamentally flawed idea. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 24 Jun 2009 16:21:09 -0700 (PDT) Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Wed, 24 Jun 2009, Linus Torvalds wrote: > > > > I'm happy to apply various patches to fix it up, but regardless, I > > thinkwe should revert that commit as bogus. We can try making it > > the default again next round, when maybe it will be true that it > > doesn't cause issues. > > Btw, I really think our _CRS handling sucks. > > There's two things that you can do with _CRS: > > - use the _existence_ of it as an indicator of a root bus > > - try to use it to populate the resource tree. > > And quite frankly, I think #2 is broken. There's no way in hell that > ACPI tables are ever going to be better than just asking the > hardware. We've gone through this before. Trusting ACPI over the > hardware is just FUNDAMENTALLY WRONG. > > So I'm just going to do that revert. I'm not sure if it ever makes > sense to make that insane _CRS code the default. It seems like a > fundamentally flawed idea. Yeah, I think it's reasonable to revert, especially given how we do _CRS handling currently. I'm hoping at some point we can use the _CRS data to at least augment the configuration we get from hardware, since on some machines it seems to be necessary.
On Wed, 24 Jun 2009, Jesse Barnes wrote: > > Yeah, I think it's reasonable to revert, especially given how we do > _CRS handling currently. I'm hoping at some point we can use the _CRS > data to at least augment the configuration we get from hardware, since > on some machines it seems to be necessary. Agreed. I do think we should take _CRS into account - possibly just as a minimal hint to determine which root buses to try to scan (maybe we do this already, I really didn't check). Or maybe we could use it to extend on our scan information. But when it seems to have things like "this bus can forward VGA cycles" kind of "resources" (which seems to be the main reason Larry Finger has so many of them), then that's just worthless knowledge that we're much better off just determining on our own. Anyway, I may feel pretty strongly about things like this, but I'm also open to being convinced otherwise for 2.6.32. I wanted to do -rc1 today (it's been more than two weeks), and while I don't expect -rc1 to be flawless, I also hate to release it with _known_ bugs. So partly due to timing, I'd rather revert it, and we can revisit it for the next release - whatever the actual end result then will be. [ There's a difference between "we're supposed to find and fix bugs in the -rc series", and "I release known-buggy -rc1's since we're supposed to fix it later". For similar reasons, I hate pulling known-buggy stuff during the merge window - it's ok if it shows itself to be buggy _later_, but if people send me stuff that they know is buggy as they send it to me, then that's a problem. ] Linus -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jun 24, 2009 at 04:09:36PM -0700, Linus Torvalds wrote: > > Should we not just revert 9e9f46c44e487af0a82eb61b624553e2f7118f5b? > > The thing says: > > "At this point, it seems to solve more problems than it causes, so let's > try using it by default. It's an easy revert if it ends up causing > trouble." > > and it clearly does _not_ solve more problems than it causes, and the > whole message in that commit implies we should revert it. > > I'm happy to apply various patches to fix it up, but regardless, I thinkwe > should revert that commit as bogus. We can try making it the default again > next round, when maybe it will be true that it doesn't cause issues. > > What did it even ever help with? In our case it is needed on some of our systems for PCI hotplug to avoid MCKs due to a device under one root bus getting a resource during hotplug that can only be used by devices under a different root bus. If the user does not intend to use PCI hotplug, the function isn't needed (i.e. 'pci=use_crs' can be omitted) because resources are properly directed to the installed cards via BIOS pre-assignment. Gary
On Wed, 24 Jun 2009 16:54:08 -0700 (PDT) Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Wed, 24 Jun 2009, Jesse Barnes wrote: > > > > Yeah, I think it's reasonable to revert, especially given how we do > > _CRS handling currently. I'm hoping at some point we can use the > > _CRS data to at least augment the configuration we get from > > hardware, since on some machines it seems to be necessary. > > Agreed. I do think we should take _CRS into account - possibly just > as a minimal hint to determine which root buses to try to scan (maybe > we do this already, I really didn't check). Or maybe we could use it > to extend on our scan information. > > But when it seems to have things like "this bus can forward VGA > cycles" kind of "resources" (which seems to be the main reason Larry > Finger has so many of them), then that's just worthless knowledge > that we're much better off just determining on our own. Yeah, some of those bits don't seem useful; but OTOH if a given bridge decodes a certain range in a non-configurable way how else will we get that info w/o having huge tables of per-chip info? Those are the sorts of things I worry about with our current resource handling. Maybe not directly _CRS related, but still... > Anyway, I may feel pretty strongly about things like this, but I'm > also open to being convinced otherwise for 2.6.32. I wanted to do > -rc1 today (it's been more than two weeks), and while I don't expect > -rc1 to be flawless, I also hate to release it with _known_ bugs. Sure, we'll keep at it and see if we can get something better in place for .32. > So partly due to timing, I'd rather revert it, and we can revisit it > for the next release - whatever the actual end result then will be. > > [ There's a difference between "we're supposed to find and fix bugs > in the -rc series", and "I release known-buggy -rc1's since we're > supposed to fix it later". For similar reasons, I hate pulling > known-buggy stuff during the merge window - it's ok if it shows > itself to be buggy _later_, but if people send me stuff that they > know is buggy as they send it to me, then that's a problem. ] Yeah, 100% agreed. I didn't hear any reports until after people started using your tree, so I think this case was handled correctly: push something that *seems* ok upstream, but with eyes wide open for the possibility we'd need to revert. Thanks,
* Jesse Barnes <jbarnes@virtuousgeek.org> wrote: > > [ There's a difference between "we're supposed to find and fix bugs > > in the -rc series", and "I release known-buggy -rc1's since we're > > supposed to fix it later". For similar reasons, I hate pulling > > known-buggy stuff during the merge window - it's ok if it shows > > itself to be buggy _later_, but if people send me stuff that they > > know is buggy as they send it to me, then that's a problem. ] > > Yeah, 100% agreed. I didn't hear any reports until after people > started using your tree, so I think this case was handled > correctly: push something that *seems* ok upstream, but with eyes > wide open for the possibility we'd need to revert. There's only one small gripe i have with the handling of it: the timing. "9e9f46c: PCI: use ACPI _CRS data by default" was written and committed on June 11th, two days _after_ the merge window opened. That's way too late for maybe-broken changes to x86 lowlevel details (especially if it touches hw-environmental interaction - which is very hard to test with meaningful coverage), and it's also pretty much the worst moment to solicit testing from people who are busy getting their stuff to Linus and who are busy testing out any of the unexpected interactions and bugs. So this was, to a certain degree, a predictable outcome. Trees in the Linux "critical path" of testing (core kernel, x86, core networking, very common drivers, PCI, driver core, VFS, etc.) should generally try to cool down 1-2 weeks before the merge window - because breakage there can do a lot of knock-on cascading damage. Two weeks is not a lot of time and the effects of showstopper bugs get magnified disproportionately. Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2009-06-25 at 09:03 +0200, Ingo Molnar wrote: > * Jesse Barnes <jbarnes@virtuousgeek.org> wrote: > > > > [ There's a difference between "we're supposed to find and fix bugs > > > in the -rc series", and "I release known-buggy -rc1's since we're > > > supposed to fix it later". For similar reasons, I hate pulling > > > known-buggy stuff during the merge window - it's ok if it shows > > > itself to be buggy _later_, but if people send me stuff that they > > > know is buggy as they send it to me, then that's a problem. ] > > > > Yeah, 100% agreed. I didn't hear any reports until after people > > started using your tree, so I think this case was handled > > correctly: push something that *seems* ok upstream, but with eyes > > wide open for the possibility we'd need to revert. > > There's only one small gripe i have with the handling of it: the > timing. "9e9f46c: PCI: use ACPI _CRS data by default" was written > and committed on June 11th, two days _after_ the merge window > opened. > > That's way too late for maybe-broken changes to x86 lowlevel details > (especially if it touches hw-environmental interaction - which is > very hard to test with meaningful coverage), and it's also pretty > much the worst moment to solicit testing from people who are busy > getting their stuff to Linus and who are busy testing out any of the > unexpected interactions and bugs. > > So this was, to a certain degree, a predictable outcome. Trees in > the Linux "critical path" of testing (core kernel, x86, core > networking, very common drivers, PCI, driver core, VFS, etc.) should > generally try to cool down 1-2 weeks before the merge window - > because breakage there can do a lot of knock-on cascading damage. > Two weeks is not a lot of time and the effects of showstopper bugs > get magnified disproportionately. > Yes, I was also thinking about this when I checked the commit date. And totally agree with Ingo's suggestions. Thanks, -- JSR -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 25 Jun 2009 09:03:47 +0200 Ingo Molnar <mingo@elte.hu> wrote: > > * Jesse Barnes <jbarnes@virtuousgeek.org> wrote: > > > > [ There's a difference between "we're supposed to find and fix > > > bugs in the -rc series", and "I release known-buggy -rc1's since > > > we're supposed to fix it later". For similar reasons, I hate > > > pulling known-buggy stuff during the merge window - it's ok if it > > > shows itself to be buggy _later_, but if people send me stuff > > > that they know is buggy as they send it to me, then that's a > > > problem. ] > > > > Yeah, 100% agreed. I didn't hear any reports until after people > > started using your tree, so I think this case was handled > > correctly: push something that *seems* ok upstream, but with eyes > > wide open for the possibility we'd need to revert. > > There's only one small gripe i have with the handling of it: the > timing. "9e9f46c: PCI: use ACPI _CRS data by default" was written > and committed on June 11th, two days _after_ the merge window > opened. > > That's way too late for maybe-broken changes to x86 lowlevel details > (especially if it touches hw-environmental interaction - which is > very hard to test with meaningful coverage), and it's also pretty > much the worst moment to solicit testing from people who are busy > getting their stuff to Linus and who are busy testing out any of the > unexpected interactions and bugs. True, the patch had been around long before that and I definitely should have committed it sooner.
Index: linux-2.6/arch/x86/pci/amd_bus.c =================================================================== --- linux-2.6.orig/arch/x86/pci/amd_bus.c +++ linux-2.6/arch/x86/pci/amd_bus.c @@ -561,6 +561,10 @@ static int __init early_fill_mp_bus_info } } + /* don't use _CRS if we only have one root */ + if (pci_root_num <= 1) + pci_probe |= PCI_NO_ROOT_CRS; + return 0; }
for AMD system, when only one PCI root, just set PCI_NO_ROOT_CRS for it Signed-off-by: Yinghai Lu <yinghai@kernel.org> --- arch/x86/pci/amd_bus.c | 4 ++++ 1 file changed, 4 insertions(+) -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html