Message ID | 1466028904-28328-1-git-send-email-pjones@redhat.com (mailing list archive) |
---|---|
State | Rejected, archived |
Headers | show |
On Thu, Jun 16, 2016 at 12:15 AM, Peter Jones <pjones@redhat.com> wrote: > Right now when booting, on many laptops the firmware manages the PCIe > bus. As a result, when we call the _OSC ACPI method, it returns an > error code. Unfortunately the errors are not very articulate. What exactly do you mean here? > As a result, we show: > > ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe]) > acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI] > \_SB_.PCI0 (33DB4D5B-1FF7-401C-9657-7441C03DD766): _OSC invalid UUID > _OSC request data: 1 1f 0 So _OSC told us that the UUID was invalid, didn't it? According to the spec: "Bit [2] – Unrecognized UUID. This bit is set to indicate that the platform firmware does not recognize the UUID passed in via Arg0. Capabilities bits are preserved." > acpi PNP0A08:00: _OSC failed (AE_ERROR); disabling ASPM > > But we did get the capabilities mask back; Right. As per the spec quoted above. > the firmware is merely managing this itself. How can we know that? > So we really should not be showing the user a > message that looks like the firmware is broken, since it is working just > fine. Or is it? > This patch supresses the error message when we're calling _OSC with the > PCIe host bridge UUID, and replaces it with a relatively innocuous > looking message that you can find if you're looking for it. So what happens is that the firmware manages the PCIe stuff itself, but instead of telling that to us upfront by clearing the mask, it simply says that the UUID is unknown, because ASL writers can't be bothered with handling that case correctly, right? Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jun 16, 2016 at 2:12 AM, Rafael J. Wysocki <rafael@kernel.org> wrote: > On Thu, Jun 16, 2016 at 12:15 AM, Peter Jones <pjones@redhat.com> wrote: >> Right now when booting, on many laptops the firmware manages the PCIe >> bus. As a result, when we call the _OSC ACPI method, it returns an >> error code. Unfortunately the errors are not very articulate. > > What exactly do you mean here? > >> As a result, we show: >> >> ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe]) >> acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI] >> \_SB_.PCI0 (33DB4D5B-1FF7-401C-9657-7441C03DD766): _OSC invalid UUID >> _OSC request data: 1 1f 0 > > So _OSC told us that the UUID was invalid, didn't it? BTW, the above messages are KERN_DEBUG, so at least in theory they shouldn't be visible in production runs. Maybe the bug to fix is that they show up when they aren't supposed to? -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
(Sorry for the slow response - it's deadline time over here.) On Thu, Jun 16, 2016 at 04:56:57PM +0200, Rafael J. Wysocki wrote: > On Thu, Jun 16, 2016 at 2:12 AM, Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Thu, Jun 16, 2016 at 12:15 AM, Peter Jones <pjones@redhat.com> wrote: > >> Right now when booting, on many laptops the firmware manages the PCIe > >> bus. As a result, when we call the _OSC ACPI method, it returns an > >> error code. Unfortunately the errors are not very articulate. > > > > What exactly do you mean here? > > > >> As a result, we show: > >> > >> ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe]) > >> acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI] > >> \_SB_.PCI0 (33DB4D5B-1FF7-401C-9657-7441C03DD766): _OSC invalid UUID > >> _OSC request data: 1 1f 0 > > > > So _OSC told us that the UUID was invalid, didn't it? > > BTW, the above messages are KERN_DEBUG, so at least in theory they > shouldn't be visible in production runs. > > Maybe the bug to fix is that they show up when they aren't supposed to? No - the workflow that I am really trying to remedy is this: 1) S3 resume sometimes isn't working on some laptop you've got. 2) start looking at debug messages 3) this shows an error, so it looks like it's probably the problem 4) go fishing for red herring 5) if you happen to know who maintains the DSDT for the platform in question, eventually work out that this is working as intended and the bug is someplace else. 5b) if you don't know that person, eventually work out that it /might/ be someplace else... So the idea was to make it look more like an indication of status, and less like an error that's causing unrelated problems. When I talked to Mario at Dell (Cc'd), it wasn't clear to us that there's a way to distinguish the between the UUID being invalid/malformed, being merely unsupported, or being supported in some configurations but not the current one. In this particular DSDT, the machine doesn't support the OS controlling any of this if USB-C / thunderbolt are enabled. The DSDT is clearly written with the belief that you have to completely disable the handling for that UUID in this case, and googling for this looks like it's not the only one written with that belief. Reading the spec (v6.1, sections 6.2.11.3 and 6.2.11.4), it seems plausible that you can express this instead by handling the UUID but choosing each individual query/status bit in the way that accomplishes the OS doing nothing with the response. So it may well be that that's just more code that vendors have thought wasn't necessary (or wasn't correct for some reason.) Mario, want to jump in on your thinking here?
> -----Original Message----- > From: Peter Jones [mailto:pjones@redhat.com] > Sent: Tuesday, June 21, 2016 10:19 AM > To: Rafael J. Wysocki <rafael@kernel.org> > Cc: ACPI Devel Maling List <linux-acpi@vger.kernel.org>; Limonciello, Mario > <Mario_Limonciello@Dell.com>; Linux Kernel Mailing List <linux- > kernel@vger.kernel.org>; Len Brown <lenb@kernel.org>; Rafael J . Wysocki > <rjw@rjwysocki.net>; Andy Lutomirski <luto@amacapital.net> > Subject: Re: [PATCH] ACPI: don't show an error when we're not in charge of > PCIe hotplug. > > (Sorry for the slow response - it's deadline time over here.) > > On Thu, Jun 16, 2016 at 04:56:57PM +0200, Rafael J. Wysocki wrote: > > On Thu, Jun 16, 2016 at 2:12 AM, Rafael J. Wysocki <rafael@kernel.org> > wrote: > > > On Thu, Jun 16, 2016 at 12:15 AM, Peter Jones <pjones@redhat.com> > wrote: > > >> Right now when booting, on many laptops the firmware manages the > PCIe > > >> bus. As a result, when we call the _OSC ACPI method, it returns an > > >> error code. Unfortunately the errors are not very articulate. > > > > > > What exactly do you mean here? > > > > > >> As a result, we show: > > >> > > >> ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe]) > > >> acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM > Segments MSI] > > >> \_SB_.PCI0 (33DB4D5B-1FF7-401C-9657-7441C03DD766): _OSC invalid > UUID > > >> _OSC request data: 1 1f 0 > > > > > > So _OSC told us that the UUID was invalid, didn't it? > > > > BTW, the above messages are KERN_DEBUG, so at least in theory they > > shouldn't be visible in production runs. > > > > Maybe the bug to fix is that they show up when they aren't supposed to? > > No - the workflow that I am really trying to remedy is this: > > 1) S3 resume sometimes isn't working on some laptop you've got. > 2) start looking at debug messages > 3) this shows an error, so it looks like it's probably the problem > 4) go fishing for red herring > 5) if you happen to know who maintains the DSDT for the platform in > question, eventually work out that this is working as intended and > the bug is someplace else. > 5b) if you don't know that person, eventually work out that it /might/ > be someplace else... > > So the idea was to make it look more like an indication of status, and > less like an error that's causing unrelated problems. > > When I talked to Mario at Dell (Cc'd), it wasn't clear to us that > there's a way to distinguish the between the UUID being > invalid/malformed, being merely unsupported, or being supported in some > configurations but not the current one. In this particular DSDT, the > machine doesn't support the OS controlling any of this if USB-C / > thunderbolt are enabled. The DSDT is clearly written with the belief > that you have to completely disable the handling for that UUID in this > case, and googling for this looks like it's not the only one written > with that belief. > > Reading the spec (v6.1, sections 6.2.11.3 and 6.2.11.4), it seems > plausible that you can express this instead by handling the UUID but > choosing each individual query/status bit in the way that accomplishes > the OS doing nothing with the response. So it may well be that that's > just more code that vendors have thought wasn't necessary (or wasn't > correct for some reason.) > > Mario, want to jump in on your thinking here? > > -- > Peter After talking to the team, I was told this particular implementation to not let OS take control when acting on that specific UUID based upon a variable (NEXP in this case) came from Intel RC code. That's probably why this is all across a lot of platforms, including non-Dell. At least in the context of the laptop Peter noticed this on (Dell XPS 13 9350) NEXP is set in GNVS based upon Thunderbolt capability. As for why they return unrecognized UUID instead of just masking all the capabilities bits? It's the same net functional result. If the vendor provided RC code doesn't caused WCHK problems or functional problems it's hard to make a case for why it needs to be changed by the OEM. I think that Peter's patch is appropriate to message this is specifically what's going on.
On Tue, Jun 21, 2016 at 11:01 AM, <Mario_Limonciello@dell.com> wrote: >> -----Original Message----- >> From: Peter Jones [mailto:pjones@redhat.com] >> Sent: Tuesday, June 21, 2016 10:19 AM >> To: Rafael J. Wysocki <rafael@kernel.org> >> Cc: ACPI Devel Maling List <linux-acpi@vger.kernel.org>; Limonciello, Mario >> <Mario_Limonciello@Dell.com>; Linux Kernel Mailing List <linux- >> kernel@vger.kernel.org>; Len Brown <lenb@kernel.org>; Rafael J . Wysocki >> <rjw@rjwysocki.net>; Andy Lutomirski <luto@amacapital.net> >> Subject: Re: [PATCH] ACPI: don't show an error when we're not in charge of >> PCIe hotplug. >> >> (Sorry for the slow response - it's deadline time over here.) >> >> On Thu, Jun 16, 2016 at 04:56:57PM +0200, Rafael J. Wysocki wrote: >> > On Thu, Jun 16, 2016 at 2:12 AM, Rafael J. Wysocki <rafael@kernel.org> >> wrote: >> > > On Thu, Jun 16, 2016 at 12:15 AM, Peter Jones <pjones@redhat.com> >> wrote: >> > >> Right now when booting, on many laptops the firmware manages the >> PCIe >> > >> bus. As a result, when we call the _OSC ACPI method, it returns an >> > >> error code. Unfortunately the errors are not very articulate. >> > > >> > > What exactly do you mean here? >> > > >> > >> As a result, we show: >> > >> >> > >> ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe]) >> > >> acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM >> Segments MSI] >> > >> \_SB_.PCI0 (33DB4D5B-1FF7-401C-9657-7441C03DD766): _OSC invalid >> UUID >> > >> _OSC request data: 1 1f 0 >> > > >> > > So _OSC told us that the UUID was invalid, didn't it? >> > >> > BTW, the above messages are KERN_DEBUG, so at least in theory they >> > shouldn't be visible in production runs. >> > >> > Maybe the bug to fix is that they show up when they aren't supposed to? >> >> No - the workflow that I am really trying to remedy is this: >> >> 1) S3 resume sometimes isn't working on some laptop you've got. >> 2) start looking at debug messages >> 3) this shows an error, so it looks like it's probably the problem >> 4) go fishing for red herring >> 5) if you happen to know who maintains the DSDT for the platform in >> question, eventually work out that this is working as intended and >> the bug is someplace else. >> 5b) if you don't know that person, eventually work out that it /might/ >> be someplace else... >> >> So the idea was to make it look more like an indication of status, and >> less like an error that's causing unrelated problems. >> >> When I talked to Mario at Dell (Cc'd), it wasn't clear to us that >> there's a way to distinguish the between the UUID being >> invalid/malformed, being merely unsupported, or being supported in some >> configurations but not the current one. In this particular DSDT, the >> machine doesn't support the OS controlling any of this if USB-C / >> thunderbolt are enabled. The DSDT is clearly written with the belief >> that you have to completely disable the handling for that UUID in this >> case, and googling for this looks like it's not the only one written >> with that belief. >> >> Reading the spec (v6.1, sections 6.2.11.3 and 6.2.11.4), it seems >> plausible that you can express this instead by handling the UUID but >> choosing each individual query/status bit in the way that accomplishes >> the OS doing nothing with the response. So it may well be that that's >> just more code that vendors have thought wasn't necessary (or wasn't >> correct for some reason.) >> >> Mario, want to jump in on your thinking here? >> >> -- >> Peter > > After talking to the team, I was told this particular implementation to not let > OS take control when acting on that specific UUID based upon a variable > (NEXP in this case) came from Intel RC code. > > That's probably why this is all across a lot of platforms, including non-Dell. > > At least in the context of the laptop Peter noticed this on (Dell XPS 13 9350) > NEXP is set in GNVS based upon Thunderbolt capability. > > As for why they return unrecognized UUID instead of just masking all the > capabilities bits? It's the same net functional result. If the vendor provided > RC code doesn't caused WCHK problems or functional problems it's hard to > make a case for why it needs to be changed by the OEM. > > I think that Peter's patch is appropriate to message this is specifically > what's going on. > In general, it seems that Windows is considerably less inclined to throw out errors when the DSDT contains spec violations. It would be great if Microsoft were to tighten up their requirements :) I wonder if Intel could be persuaded to add an ACPI mechanism to grant PCIe native control per device or per bus instead of more-or-less globally as it is now. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jun 21, 2016 at 8:01 PM, <Mario_Limonciello@dell.com> wrote: >> -----Original Message----- >> From: Peter Jones [mailto:pjones@redhat.com] >> Sent: Tuesday, June 21, 2016 10:19 AM >> To: Rafael J. Wysocki <rafael@kernel.org> >> Cc: ACPI Devel Maling List <linux-acpi@vger.kernel.org>; Limonciello, Mario >> <Mario_Limonciello@Dell.com>; Linux Kernel Mailing List <linux- >> kernel@vger.kernel.org>; Len Brown <lenb@kernel.org>; Rafael J . Wysocki >> <rjw@rjwysocki.net>; Andy Lutomirski <luto@amacapital.net> >> Subject: Re: [PATCH] ACPI: don't show an error when we're not in charge of >> PCIe hotplug. >> >> (Sorry for the slow response - it's deadline time over here.) >> >> On Thu, Jun 16, 2016 at 04:56:57PM +0200, Rafael J. Wysocki wrote: >> > On Thu, Jun 16, 2016 at 2:12 AM, Rafael J. Wysocki <rafael@kernel.org> >> wrote: >> > > On Thu, Jun 16, 2016 at 12:15 AM, Peter Jones <pjones@redhat.com> >> wrote: >> > >> Right now when booting, on many laptops the firmware manages the >> PCIe >> > >> bus. As a result, when we call the _OSC ACPI method, it returns an >> > >> error code. Unfortunately the errors are not very articulate. >> > > >> > > What exactly do you mean here? >> > > >> > >> As a result, we show: >> > >> >> > >> ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe]) >> > >> acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM >> Segments MSI] >> > >> \_SB_.PCI0 (33DB4D5B-1FF7-401C-9657-7441C03DD766): _OSC invalid >> UUID >> > >> _OSC request data: 1 1f 0 >> > > >> > > So _OSC told us that the UUID was invalid, didn't it? >> > >> > BTW, the above messages are KERN_DEBUG, so at least in theory they >> > shouldn't be visible in production runs. >> > >> > Maybe the bug to fix is that they show up when they aren't supposed to? >> >> No - the workflow that I am really trying to remedy is this: >> >> 1) S3 resume sometimes isn't working on some laptop you've got. >> 2) start looking at debug messages >> 3) this shows an error, so it looks like it's probably the problem >> 4) go fishing for red herring >> 5) if you happen to know who maintains the DSDT for the platform in >> question, eventually work out that this is working as intended and >> the bug is someplace else. >> 5b) if you don't know that person, eventually work out that it /might/ >> be someplace else... >> >> So the idea was to make it look more like an indication of status, and >> less like an error that's causing unrelated problems. >> >> When I talked to Mario at Dell (Cc'd), it wasn't clear to us that >> there's a way to distinguish the between the UUID being >> invalid/malformed, being merely unsupported, or being supported in some >> configurations but not the current one. In this particular DSDT, the >> machine doesn't support the OS controlling any of this if USB-C / >> thunderbolt are enabled. The DSDT is clearly written with the belief >> that you have to completely disable the handling for that UUID in this >> case, and googling for this looks like it's not the only one written >> with that belief. >> >> Reading the spec (v6.1, sections 6.2.11.3 and 6.2.11.4), it seems >> plausible that you can express this instead by handling the UUID but >> choosing each individual query/status bit in the way that accomplishes >> the OS doing nothing with the response. So it may well be that that's >> just more code that vendors have thought wasn't necessary (or wasn't >> correct for some reason.) >> >> Mario, want to jump in on your thinking here? >> >> -- >> Peter > > After talking to the team, I was told this particular implementation to not let > OS take control when acting on that specific UUID based upon a variable > (NEXP in this case) came from Intel RC code. > > That's probably why this is all across a lot of platforms, including non-Dell. > > At least in the context of the laptop Peter noticed this on (Dell XPS 13 9350) > NEXP is set in GNVS based upon Thunderbolt capability. > > As for why they return unrecognized UUID instead of just masking all the > capabilities bits? It's the same net functional result. If the vendor provided > RC code doesn't caused WCHK problems or functional problems it's hard to > make a case for why it needs to be changed by the OEM. > > I think that Peter's patch is appropriate to message this is specifically > what's going on. No, it may hide real (ie. non-intentional) bugs in _OSC, so it is not appropriate. Debug-level messages really should not hurt anyone (and should never show up in production anyway). We can slightly tone down the "_OSC failed (%s); disabling ASPM\n" message in negotiate_os_control() in drivers/acpi/pci_root.c if you think it's too strong and that's it. Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jun 21, 2016 at 8:07 PM, Andy Lutomirski <luto@amacapital.net> wrote: > On Tue, Jun 21, 2016 at 11:01 AM, <Mario_Limonciello@dell.com> wrote: >>> -----Original Message----- >>> From: Peter Jones [mailto:pjones@redhat.com] >>> Sent: Tuesday, June 21, 2016 10:19 AM >>> To: Rafael J. Wysocki <rafael@kernel.org> >>> Cc: ACPI Devel Maling List <linux-acpi@vger.kernel.org>; Limonciello, Mario >>> <Mario_Limonciello@Dell.com>; Linux Kernel Mailing List <linux- >>> kernel@vger.kernel.org>; Len Brown <lenb@kernel.org>; Rafael J . Wysocki >>> <rjw@rjwysocki.net>; Andy Lutomirski <luto@amacapital.net> >>> Subject: Re: [PATCH] ACPI: don't show an error when we're not in charge of >>> PCIe hotplug. >>> >>> (Sorry for the slow response - it's deadline time over here.) >>> >>> On Thu, Jun 16, 2016 at 04:56:57PM +0200, Rafael J. Wysocki wrote: >>> > On Thu, Jun 16, 2016 at 2:12 AM, Rafael J. Wysocki <rafael@kernel.org> >>> wrote: >>> > > On Thu, Jun 16, 2016 at 12:15 AM, Peter Jones <pjones@redhat.com> >>> wrote: >>> > >> Right now when booting, on many laptops the firmware manages the >>> PCIe >>> > >> bus. As a result, when we call the _OSC ACPI method, it returns an >>> > >> error code. Unfortunately the errors are not very articulate. >>> > > >>> > > What exactly do you mean here? >>> > > >>> > >> As a result, we show: >>> > >> >>> > >> ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe]) >>> > >> acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM >>> Segments MSI] >>> > >> \_SB_.PCI0 (33DB4D5B-1FF7-401C-9657-7441C03DD766): _OSC invalid >>> UUID >>> > >> _OSC request data: 1 1f 0 >>> > > >>> > > So _OSC told us that the UUID was invalid, didn't it? >>> > >>> > BTW, the above messages are KERN_DEBUG, so at least in theory they >>> > shouldn't be visible in production runs. >>> > >>> > Maybe the bug to fix is that they show up when they aren't supposed to? >>> >>> No - the workflow that I am really trying to remedy is this: >>> >>> 1) S3 resume sometimes isn't working on some laptop you've got. >>> 2) start looking at debug messages >>> 3) this shows an error, so it looks like it's probably the problem >>> 4) go fishing for red herring >>> 5) if you happen to know who maintains the DSDT for the platform in >>> question, eventually work out that this is working as intended and >>> the bug is someplace else. >>> 5b) if you don't know that person, eventually work out that it /might/ >>> be someplace else... >>> >>> So the idea was to make it look more like an indication of status, and >>> less like an error that's causing unrelated problems. >>> >>> When I talked to Mario at Dell (Cc'd), it wasn't clear to us that >>> there's a way to distinguish the between the UUID being >>> invalid/malformed, being merely unsupported, or being supported in some >>> configurations but not the current one. In this particular DSDT, the >>> machine doesn't support the OS controlling any of this if USB-C / >>> thunderbolt are enabled. The DSDT is clearly written with the belief >>> that you have to completely disable the handling for that UUID in this >>> case, and googling for this looks like it's not the only one written >>> with that belief. >>> >>> Reading the spec (v6.1, sections 6.2.11.3 and 6.2.11.4), it seems >>> plausible that you can express this instead by handling the UUID but >>> choosing each individual query/status bit in the way that accomplishes >>> the OS doing nothing with the response. So it may well be that that's >>> just more code that vendors have thought wasn't necessary (or wasn't >>> correct for some reason.) >>> >>> Mario, want to jump in on your thinking here? >>> >>> -- >>> Peter >> >> After talking to the team, I was told this particular implementation to not let >> OS take control when acting on that specific UUID based upon a variable >> (NEXP in this case) came from Intel RC code. >> >> That's probably why this is all across a lot of platforms, including non-Dell. >> >> At least in the context of the laptop Peter noticed this on (Dell XPS 13 9350) >> NEXP is set in GNVS based upon Thunderbolt capability. >> >> As for why they return unrecognized UUID instead of just masking all the >> capabilities bits? It's the same net functional result. If the vendor provided >> RC code doesn't caused WCHK problems or functional problems it's hard to >> make a case for why it needs to be changed by the OEM. >> >> I think that Peter's patch is appropriate to message this is specifically >> what's going on. >> > > In general, it seems that Windows is considerably less inclined to > throw out errors when the DSDT contains spec violations. It would be > great if Microsoft were to tighten up their requirements :) Agreed (but I speak for myself only as usual). > I wonder if Intel could be persuaded to add an ACPI mechanism to grant > PCIe native control per device or per bus instead of more-or-less > globally as it is now. I guess at least in some cases it just has to be done that way. Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: rjwysocki@gmail.com [mailto:rjwysocki@gmail.com] On Behalf Of > Rafael J. Wysocki > Sent: Tuesday, June 21, 2016 5:51 PM > To: Limonciello, Mario <Mario_Limonciello@Dell.com> > Cc: Peter Jones <pjones@redhat.com>; Rafael J. Wysocki > <rafael@kernel.org>; ACPI Devel Maling List <linux-acpi@vger.kernel.org>; > Linux Kernel Mailing List <linux-kernel@vger.kernel.org>; Len Brown > <lenb@kernel.org>; Rafael J. Wysocki <rjw@rjwysocki.net>; Andy Lutomirski > <luto@amacapital.net> > Subject: Re: [PATCH] ACPI: don't show an error when we're not in charge of > PCIe hotplug. > > On Tue, Jun 21, 2016 at 8:01 PM, <Mario_Limonciello@dell.com> wrote: > >> -----Original Message----- > >> From: Peter Jones [mailto:pjones@redhat.com] > >> Sent: Tuesday, June 21, 2016 10:19 AM > >> To: Rafael J. Wysocki <rafael@kernel.org> > >> Cc: ACPI Devel Maling List <linux-acpi@vger.kernel.org>; Limonciello, > Mario > >> <Mario_Limonciello@Dell.com>; Linux Kernel Mailing List <linux- > >> kernel@vger.kernel.org>; Len Brown <lenb@kernel.org>; Rafael J . > Wysocki > >> <rjw@rjwysocki.net>; Andy Lutomirski <luto@amacapital.net> > >> Subject: Re: [PATCH] ACPI: don't show an error when we're not in charge > of > >> PCIe hotplug. > >> > >> (Sorry for the slow response - it's deadline time over here.) > >> > >> On Thu, Jun 16, 2016 at 04:56:57PM +0200, Rafael J. Wysocki wrote: > >> > On Thu, Jun 16, 2016 at 2:12 AM, Rafael J. Wysocki <rafael@kernel.org> > >> wrote: > >> > > On Thu, Jun 16, 2016 at 12:15 AM, Peter Jones <pjones@redhat.com> > >> wrote: > >> > >> Right now when booting, on many laptops the firmware manages > the > >> PCIe > >> > >> bus. As a result, when we call the _OSC ACPI method, it returns an > >> > >> error code. Unfortunately the errors are not very articulate. > >> > > > >> > > What exactly do you mean here? > >> > > > >> > >> As a result, we show: > >> > >> > >> > >> ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe]) > >> > >> acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM > ClockPM > >> Segments MSI] > >> > >> \_SB_.PCI0 (33DB4D5B-1FF7-401C-9657-7441C03DD766): _OSC invalid > >> UUID > >> > >> _OSC request data: 1 1f 0 > >> > > > >> > > So _OSC told us that the UUID was invalid, didn't it? > >> > > >> > BTW, the above messages are KERN_DEBUG, so at least in theory they > >> > shouldn't be visible in production runs. > >> > > >> > Maybe the bug to fix is that they show up when they aren't supposed > to? > >> > >> No - the workflow that I am really trying to remedy is this: > >> > >> 1) S3 resume sometimes isn't working on some laptop you've got. > >> 2) start looking at debug messages > >> 3) this shows an error, so it looks like it's probably the problem > >> 4) go fishing for red herring > >> 5) if you happen to know who maintains the DSDT for the platform in > >> question, eventually work out that this is working as intended and > >> the bug is someplace else. > >> 5b) if you don't know that person, eventually work out that it /might/ > >> be someplace else... > >> > >> So the idea was to make it look more like an indication of status, and > >> less like an error that's causing unrelated problems. > >> > >> When I talked to Mario at Dell (Cc'd), it wasn't clear to us that > >> there's a way to distinguish the between the UUID being > >> invalid/malformed, being merely unsupported, or being supported in > some > >> configurations but not the current one. In this particular DSDT, the > >> machine doesn't support the OS controlling any of this if USB-C / > >> thunderbolt are enabled. The DSDT is clearly written with the belief > >> that you have to completely disable the handling for that UUID in this > >> case, and googling for this looks like it's not the only one written > >> with that belief. > >> > >> Reading the spec (v6.1, sections 6.2.11.3 and 6.2.11.4), it seems > >> plausible that you can express this instead by handling the UUID but > >> choosing each individual query/status bit in the way that accomplishes > >> the OS doing nothing with the response. So it may well be that that's > >> just more code that vendors have thought wasn't necessary (or wasn't > >> correct for some reason.) > >> > >> Mario, want to jump in on your thinking here? > >> > >> -- > >> Peter > > > > After talking to the team, I was told this particular implementation to not > let > > OS take control when acting on that specific UUID based upon a variable > > (NEXP in this case) came from Intel RC code. > > > > That's probably why this is all across a lot of platforms, including non-Dell. > > > > At least in the context of the laptop Peter noticed this on (Dell XPS 13 9350) > > NEXP is set in GNVS based upon Thunderbolt capability. > > > > As for why they return unrecognized UUID instead of just masking all the > > capabilities bits? It's the same net functional result. If the vendor provided > > RC code doesn't caused WCHK problems or functional problems it's hard to > > make a case for why it needs to be changed by the OEM. > > > > I think that Peter's patch is appropriate to message this is specifically > > what's going on. > > No, it may hide real (ie. non-intentional) bugs in _OSC, so it is not > appropriate. > > Debug-level messages really should not hurt anyone (and should never > show up in production anyway). > > We can slightly tone down the "_OSC failed (%s); disabling ASPM\n" > message in negotiate_os_control() in drivers/acpi/pci_root.c if you > think it's too strong and that's it. > > Thanks, > Rafael I think changing that would help communicate what's going on here and at least let the user know the result will be that the firmware is still controlling ASPM due to the _OSC failure. Something else that I think Andy recommended a while back was at that time try to evaluate NEXP and display its value and an associated message in debug logs when _OSC fails. Would you be amenable to a change like that?
On Wed, Jun 22, 2016 at 12:43 PM, <Mario_Limonciello@dell.com> wrote: >> -----Original Message----- >> From: rjwysocki@gmail.com [mailto:rjwysocki@gmail.com] On Behalf Of >> Rafael J. Wysocki >> Sent: Tuesday, June 21, 2016 5:51 PM >> To: Limonciello, Mario <Mario_Limonciello@Dell.com> >> Cc: Peter Jones <pjones@redhat.com>; Rafael J. Wysocki >> <rafael@kernel.org>; ACPI Devel Maling List <linux-acpi@vger.kernel.org>; >> Linux Kernel Mailing List <linux-kernel@vger.kernel.org>; Len Brown >> <lenb@kernel.org>; Rafael J. Wysocki <rjw@rjwysocki.net>; Andy Lutomirski >> <luto@amacapital.net> >> Subject: Re: [PATCH] ACPI: don't show an error when we're not in charge of >> PCIe hotplug. >> >> On Tue, Jun 21, 2016 at 8:01 PM, <Mario_Limonciello@dell.com> wrote: >> >> -----Original Message----- >> >> From: Peter Jones [mailto:pjones@redhat.com] >> >> Sent: Tuesday, June 21, 2016 10:19 AM >> >> To: Rafael J. Wysocki <rafael@kernel.org> >> >> Cc: ACPI Devel Maling List <linux-acpi@vger.kernel.org>; Limonciello, >> Mario >> >> <Mario_Limonciello@Dell.com>; Linux Kernel Mailing List <linux- >> >> kernel@vger.kernel.org>; Len Brown <lenb@kernel.org>; Rafael J . >> Wysocki >> >> <rjw@rjwysocki.net>; Andy Lutomirski <luto@amacapital.net> >> >> Subject: Re: [PATCH] ACPI: don't show an error when we're not in charge >> of >> >> PCIe hotplug. >> >> >> >> (Sorry for the slow response - it's deadline time over here.) >> >> >> >> On Thu, Jun 16, 2016 at 04:56:57PM +0200, Rafael J. Wysocki wrote: >> >> > On Thu, Jun 16, 2016 at 2:12 AM, Rafael J. Wysocki <rafael@kernel.org> >> >> wrote: >> >> > > On Thu, Jun 16, 2016 at 12:15 AM, Peter Jones <pjones@redhat.com> >> >> wrote: >> >> > >> Right now when booting, on many laptops the firmware manages >> the >> >> PCIe >> >> > >> bus. As a result, when we call the _OSC ACPI method, it returns an >> >> > >> error code. Unfortunately the errors are not very articulate. >> >> > > >> >> > > What exactly do you mean here? >> >> > > >> >> > >> As a result, we show: >> >> > >> >> >> > >> ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe]) >> >> > >> acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM >> ClockPM >> >> Segments MSI] >> >> > >> \_SB_.PCI0 (33DB4D5B-1FF7-401C-9657-7441C03DD766): _OSC invalid >> >> UUID >> >> > >> _OSC request data: 1 1f 0 >> >> > > >> >> > > So _OSC told us that the UUID was invalid, didn't it? >> >> > >> >> > BTW, the above messages are KERN_DEBUG, so at least in theory they >> >> > shouldn't be visible in production runs. >> >> > >> >> > Maybe the bug to fix is that they show up when they aren't supposed >> to? >> >> >> >> No - the workflow that I am really trying to remedy is this: >> >> >> >> 1) S3 resume sometimes isn't working on some laptop you've got. >> >> 2) start looking at debug messages >> >> 3) this shows an error, so it looks like it's probably the problem >> >> 4) go fishing for red herring >> >> 5) if you happen to know who maintains the DSDT for the platform in >> >> question, eventually work out that this is working as intended and >> >> the bug is someplace else. >> >> 5b) if you don't know that person, eventually work out that it /might/ >> >> be someplace else... >> >> >> >> So the idea was to make it look more like an indication of status, and >> >> less like an error that's causing unrelated problems. >> >> >> >> When I talked to Mario at Dell (Cc'd), it wasn't clear to us that >> >> there's a way to distinguish the between the UUID being >> >> invalid/malformed, being merely unsupported, or being supported in >> some >> >> configurations but not the current one. In this particular DSDT, the >> >> machine doesn't support the OS controlling any of this if USB-C / >> >> thunderbolt are enabled. The DSDT is clearly written with the belief >> >> that you have to completely disable the handling for that UUID in this >> >> case, and googling for this looks like it's not the only one written >> >> with that belief. >> >> >> >> Reading the spec (v6.1, sections 6.2.11.3 and 6.2.11.4), it seems >> >> plausible that you can express this instead by handling the UUID but >> >> choosing each individual query/status bit in the way that accomplishes >> >> the OS doing nothing with the response. So it may well be that that's >> >> just more code that vendors have thought wasn't necessary (or wasn't >> >> correct for some reason.) >> >> >> >> Mario, want to jump in on your thinking here? >> >> >> >> -- >> >> Peter >> > >> > After talking to the team, I was told this particular implementation to not >> let >> > OS take control when acting on that specific UUID based upon a variable >> > (NEXP in this case) came from Intel RC code. >> > >> > That's probably why this is all across a lot of platforms, including non-Dell. >> > >> > At least in the context of the laptop Peter noticed this on (Dell XPS 13 9350) >> > NEXP is set in GNVS based upon Thunderbolt capability. >> > >> > As for why they return unrecognized UUID instead of just masking all the >> > capabilities bits? It's the same net functional result. If the vendor provided >> > RC code doesn't caused WCHK problems or functional problems it's hard to >> > make a case for why it needs to be changed by the OEM. >> > >> > I think that Peter's patch is appropriate to message this is specifically >> > what's going on. >> >> No, it may hide real (ie. non-intentional) bugs in _OSC, so it is not >> appropriate. >> >> Debug-level messages really should not hurt anyone (and should never >> show up in production anyway). >> >> We can slightly tone down the "_OSC failed (%s); disabling ASPM\n" >> message in negotiate_os_control() in drivers/acpi/pci_root.c if you >> think it's too strong and that's it. >> >> Thanks, >> Rafael > > I think changing that would help communicate what's going on here and at > least let the user know the result will be that the firmware is still controlling > ASPM due to the _OSC failure. > > Something else that I think Andy recommended a while back was at that > time try to evaluate NEXP and display its value and an associated message > in debug logs when _OSC fails. Would you be amenable to a change like that? That seems dangerous if NEXP is anything other than a SystemMemory variable. I don't know if there's a clean way to check that before evaluating it. (i.e. we don't want to hit some other thing called NEXP that has side effects.)
On Wed, Jun 22, 2016 at 10:53 PM, Andy Lutomirski <luto@amacapital.net> wrote: > On Wed, Jun 22, 2016 at 12:43 PM, <Mario_Limonciello@dell.com> wrote: >>> -----Original Message----- >>> From: rjwysocki@gmail.com [mailto:rjwysocki@gmail.com] On Behalf Of [cut] >> I think changing that would help communicate what's going on here and at >> least let the user know the result will be that the firmware is still controlling >> ASPM due to the _OSC failure. You seem to be assuming that all systems returning "unsupported UUID" from the PCI host bridge _OSC will always fall into the same category, but what if they don't? What if at least some of them are really broken? >> Something else that I think Andy recommended a while back was at that >> time try to evaluate NEXP and display its value and an associated message >> in debug logs when _OSC fails. Would you be amenable to a change like that? > > That seems dangerous if NEXP is anything other than a SystemMemory > variable. I don't know if there's a clean way to check that before > evaluating it. (i.e. we don't want to hit some other thing called > NEXP that has side effects.) Well, that's generic code and NEXP is not generic really, so agreed. Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: rjwysocki@gmail.com [mailto:rjwysocki@gmail.com] On Behalf Of > Rafael J. Wysocki > Sent: Wednesday, June 22, 2016 5:48 PM > To: Andy Lutomirski <luto@amacapital.net>; Limonciello, Mario > <Mario_Limonciello@Dell.com> > Cc: Rafael Wysocki <rafael@kernel.org>; Peter Jones <pjones@redhat.com>; > Linux ACPI <linux-acpi@vger.kernel.org>; linux-kernel@vger.kernel.org; Len > Brown <lenb@kernel.org>; Rafael J. Wysocki <rjw@rjwysocki.net> > Subject: Re: [PATCH] ACPI: don't show an error when we're not in charge of > PCIe hotplug. > > On Wed, Jun 22, 2016 at 10:53 PM, Andy Lutomirski <luto@amacapital.net> > wrote: > > On Wed, Jun 22, 2016 at 12:43 PM, <Mario_Limonciello@dell.com> wrote: > >>> -----Original Message----- > >>> From: rjwysocki@gmail.com [mailto:rjwysocki@gmail.com] On Behalf Of > > [cut] > > >> I think changing that would help communicate what's going on here and at > >> least let the user know the result will be that the firmware is still > controlling > >> ASPM due to the _OSC failure. > > You seem to be assuming that all systems returning "unsupported UUID" > from the PCI host bridge _OSC will always fall into the same category, > but what if they don't? What if at least some of them are really > broken? > Even if they are broken, the net result will be the firmware is in control of ASPM, won't it? At least outside of the group on this mailing list that might not be very apparent from the current set of errors output into logs. > >> Something else that I think Andy recommended a while back was at that > >> time try to evaluate NEXP and display its value and an associated message > >> in debug logs when _OSC fails. Would you be amenable to a change like > that? > > > > That seems dangerous if NEXP is anything other than a SystemMemory > > variable. I don't know if there's a clean way to check that before > > evaluating it. (i.e. we don't want to hit some other thing called > > NEXP that has side effects.) > > Well, that's generic code and NEXP is not generic really, so agreed. OK.
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c index 262ca31..be8a91b 100644 --- a/drivers/acpi/bus.c +++ b/drivers/acpi/bus.c @@ -215,6 +215,8 @@ acpi_status acpi_str_to_uuid(char *str, u8 *uuid) } EXPORT_SYMBOL_GPL(acpi_str_to_uuid); +static u8 pci_osc_uuid_str[] = "33DB4D5B-1FF7-401C-9657-7441C03DD766"; + acpi_status acpi_run_osc(acpi_handle handle, struct acpi_osc_context *context) { acpi_status status; @@ -267,9 +269,13 @@ acpi_status acpi_run_osc(acpi_handle handle, struct acpi_osc_context *context) if (errors & OSC_REQUEST_ERROR) acpi_print_osc_error(handle, context, "_OSC request failed"); - if (errors & OSC_INVALID_UUID_ERROR) - acpi_print_osc_error(handle, context, - "_OSC invalid UUID"); + if (errors & OSC_INVALID_UUID_ERROR) { + if (!strcasecmp(context->uuid_str, pci_osc_uuid_str)) + pr_info("PCI PME managed by ACPI.\n"); + else + acpi_print_osc_error(handle, context, + "_OSC invalid UUID"); + } if (errors & OSC_INVALID_REVISION_ERROR) acpi_print_osc_error(handle, context, "_OSC invalid revision");