mbox series

[0/2] PCI/AER: Consistently use _OSC to determine who owns AER

Message ID 20181115231605.24352-1-mr.nuke.me@gmail.com (mailing list archive)
Headers show
Series PCI/AER: Consistently use _OSC to determine who owns AER | expand

Message

Alex G. Nov. 15, 2018, 11:16 p.m. UTC
Thanks to Keith for pointing out that it doesn't make sense to disable
AER services when only one device has a FIRMWARE_FIRST HEST.

AER ownership is an interesting issue brought in by FFS (firmware-first)
model. In a nutshell if FFS handles AER, then OS should not touch any
of the AER bits. FW might set things up so that it receives AER
notifications via SMI. It's theoretically possible to receive SCIs,
but the exact mechanism is platform-dependent. OS touching AER bits
when firmware owns them may interfere with these notifications.

The ACPI mechanism for negotiating control of AER is _OSC, and is
described in detail in ACPI 6.2 Ch. 6.2.11.3. _OSC is negotiated at
the root bus level. Any root port, switch, or endpoint under the bus
would have its AER ownership negotiated in one _OSC call.

Then there is HEST, which is part of ACPI Platform Error Interfaces
(APEI). HEST tables describe the errors that FW may report to the OS.
A types 6,7 and 7 HEST tables describe AER errors from PCIe devices.
As part of this description, we're told if the error source is FFS.

Information in HEST seems to be redundant, as each error reported by
FW will have a CPER table that describes it in detail.

Because HEST describes an error source as firmware-first or not, we've
taken this to mean ownership of AER. Because AER ownership and error
reporting are coupled, _OSC and HEST usually agree on the matter of
ownership. However, that doesn't seem to be required by ACPI.

I've asked around a few people at Dell and they unanimously agree that
_OSC is the correct way to determine ownership of AER. In linux, we
use the result of _OSC to enable AER services, but we use HEST to
determine AER ownership. That's inconsistent. This series drops the
use of HEST in favor of _OSC.

[1] https://lkml.org/lkml/2018/11/15/62

Alexandru Gagniuc (2):
  PCI/AER: Do not use APEI/HEST to disable AER services globally
  PCI/AER: Determine AER ownership based on _OSC instead of HEST

 drivers/acpi/pci_root.c  |  9 +----
 drivers/pci/pcie/aer.c   | 82 ++--------------------------------------
 include/linux/pci-acpi.h |  6 ---
 3 files changed, 5 insertions(+), 92 deletions(-)

Comments

Sinan Kaya Nov. 16, 2018, 1:49 a.m. UTC | #1
On 11/15/2018 3:16 PM, Alexandru Gagniuc wrote:
> I've asked around a few people at Dell and they unanimously agree that
> _OSC is the correct way to determine ownership of AER. In linux, we
> use the result of _OSC to enable AER services, but we use HEST to
> determine AER ownership. That's inconsistent. This series drops the
> use of HEST in favor of _OSC.
> 
> [1]https://lkml.org/lkml/2018/11/15/62

This change breaks the existing systems that rely on the HEST table
telling the operating system about firmware first presence.

Besides, HEST table has much more granularity about which PCI component
needs firmware such as global/device/switch.

You should probably circulate these ideas for wider consumption in UEFI
forum as UEFI owns the HEST table definition.
David Laight Nov. 16, 2018, 12:37 p.m. UTC | #2
From: Alexandru Gagniuc
> Sent: 15 November 2018 23:16
...
> I've asked around a few people at Dell and they unanimously agree that
> _OSC is the correct way to determine ownership of AER.

This is all very well, but we have systems (they might be Dell ones)
where failure of a PCIe link (even when all the drivers are removed)
causes an NMI - and crashes the kernel.

There are other systems which have AER registers available for
some of the PCIe devices, but the BIOS ignores them and _OSC
doesn't let the kernel take control.
In this case the AER registers can contain useful information
after a read returns ~0u.
It would be useful to be able to get this information without
having to grovel through all the PCIe config space.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Tyler Baicar Nov. 19, 2018, 4:53 p.m. UTC | #3
On Thu, Nov 15, 2018 at 8:49 PM Sinan Kaya <okaya@kernel.org> wrote:
>
> On 11/15/2018 3:16 PM, Alexandru Gagniuc wrote:
> > I've asked around a few people at Dell and they unanimously agree that
> > _OSC is the correct way to determine ownership of AER. In linux, we
> > use the result of _OSC to enable AER services, but we use HEST to
> > determine AER ownership. That's inconsistent. This series drops the
> > use of HEST in favor of _OSC.
> >
> > [1]https://lkml.org/lkml/2018/11/15/62
>
> This change breaks the existing systems that rely on the HEST table
> telling the operating system about firmware first presence.
>
> Besides, HEST table has much more granularity about which PCI component
> needs firmware such as global/device/switch.
>
> You should probably circulate these ideas for wider consumption in UEFI
> forum as UEFI owns the HEST table definition.

I agree with Sinan, this will break existing systems, and the granularity of the
HEST definition is more useful than the single bit in _OSC.
Keith Busch Nov. 19, 2018, 4:53 p.m. UTC | #4
On Mon, Nov 19, 2018 at 11:53:05AM -0500, Tyler Baicar wrote:
> On Thu, Nov 15, 2018 at 8:49 PM Sinan Kaya <okaya@kernel.org> wrote:
> >
> > On 11/15/2018 3:16 PM, Alexandru Gagniuc wrote:
> > > I've asked around a few people at Dell and they unanimously agree that
> > > _OSC is the correct way to determine ownership of AER. In linux, we
> > > use the result of _OSC to enable AER services, but we use HEST to
> > > determine AER ownership. That's inconsistent. This series drops the
> > > use of HEST in favor of _OSC.
> > >
> > > [1]https://lkml.org/lkml/2018/11/15/62
> >
> > This change breaks the existing systems that rely on the HEST table
> > telling the operating system about firmware first presence.
> >
> > Besides, HEST table has much more granularity about which PCI component
> > needs firmware such as global/device/switch.
> >
> > You should probably circulate these ideas for wider consumption in UEFI
> > forum as UEFI owns the HEST table definition.
> 
> I agree with Sinan, this will break existing systems, and the granularity of the
> HEST definition is more useful than the single bit in _OSC.

But we're not using HEST as a fine grain control. We disable native AER
handling if *any* device has FF set in HEST, and that just forces people
to use pcie_ports=native to get around that.
Sinan Kaya Nov. 19, 2018, 5:32 p.m. UTC | #5
On 11/19/2018 11:53 AM, Keith Busch wrote:
> On Mon, Nov 19, 2018 at 11:53:05AM -0500, Tyler Baicar wrote:
>> On Thu, Nov 15, 2018 at 8:49 PM Sinan Kaya <okaya@kernel.org> wrote:
>>>
>>> On 11/15/2018 3:16 PM, Alexandru Gagniuc wrote:
>>>> I've asked around a few people at Dell and they unanimously agree that
>>>> _OSC is the correct way to determine ownership of AER. In linux, we
>>>> use the result of _OSC to enable AER services, but we use HEST to
>>>> determine AER ownership. That's inconsistent. This series drops the
>>>> use of HEST in favor of _OSC.
>>>>
>>>> [1]https://lkml.org/lkml/2018/11/15/62
>>>
>>> This change breaks the existing systems that rely on the HEST table
>>> telling the operating system about firmware first presence.
>>>
>>> Besides, HEST table has much more granularity about which PCI component
>>> needs firmware such as global/device/switch.
>>>
>>> You should probably circulate these ideas for wider consumption in UEFI
>>> forum as UEFI owns the HEST table definition.
>>
>> I agree with Sinan, this will break existing systems, and the granularity of the
>> HEST definition is more useful than the single bit in _OSC.
> 
> But we're not using HEST as a fine grain control. We disable native AER
> handling if *any* device has FF set in HEST, and that just forces people
> to use pcie_ports=native to get around that.
> 

I don't see *any* in the code.  aer_hest_parse() does the HEST table parsing.
It switches to firmware first mode if global flag in HEST is set. Otherwise
for each BDF in device, hest_match_pci() is used to do a cross-matching against
HEST table contents.

Am I missing something?
Keith Busch Nov. 19, 2018, 5:36 p.m. UTC | #6
On Mon, Nov 19, 2018 at 12:32:42PM -0500, Sinan Kaya wrote:
> On 11/19/2018 11:53 AM, Keith Busch wrote:
> > On Mon, Nov 19, 2018 at 11:53:05AM -0500, Tyler Baicar wrote:
> > > On Thu, Nov 15, 2018 at 8:49 PM Sinan Kaya <okaya@kernel.org> wrote:
> > > > 
> > > > On 11/15/2018 3:16 PM, Alexandru Gagniuc wrote:
> > > > > I've asked around a few people at Dell and they unanimously agree that
> > > > > _OSC is the correct way to determine ownership of AER. In linux, we
> > > > > use the result of _OSC to enable AER services, but we use HEST to
> > > > > determine AER ownership. That's inconsistent. This series drops the
> > > > > use of HEST in favor of _OSC.
> > > > > 
> > > > > [1]https://lkml.org/lkml/2018/11/15/62
> > > > 
> > > > This change breaks the existing systems that rely on the HEST table
> > > > telling the operating system about firmware first presence.
> > > > 
> > > > Besides, HEST table has much more granularity about which PCI component
> > > > needs firmware such as global/device/switch.
> > > > 
> > > > You should probably circulate these ideas for wider consumption in UEFI
> > > > forum as UEFI owns the HEST table definition.
> > > 
> > > I agree with Sinan, this will break existing systems, and the granularity of the
> > > HEST definition is more useful than the single bit in _OSC.
> > 
> > But we're not using HEST as a fine grain control. We disable native AER
> > handling if *any* device has FF set in HEST, and that just forces people
> > to use pcie_ports=native to get around that.
> > 
> 
> I don't see *any* in the code.  aer_hest_parse() does the HEST table parsing.
> It switches to firmware first mode if global flag in HEST is set. Otherwise
> for each BDF in device, hest_match_pci() is used to do a cross-matching against
> HEST table contents.
> 
> Am I missing something?

You might be. :)

static int aer_hest_parse(struct acpi_hest_header *hest_hdr, void *data)
{
<snip>
        /*
         * If no specific device is supplied, determine whether
         * FIRMWARE_FIRST is set for *any* PCIe device.
         */
        if (!info->pci_dev) {
                info->firmware_first |= ff;
                return 0;
        }
Keith Busch Nov. 19, 2018, 5:41 p.m. UTC | #7
On Mon, Nov 19, 2018 at 12:42:25PM -0500, Sinan Kaya wrote:
> On 11/19/2018 12:32 PM, Sinan Kaya wrote:
> > > 
> > > But we're not using HEST as a fine grain control. We disable native AER
> > > handling if *any* device has FF set in HEST, and that just forces people
> > > to use pcie_ports=native to get around that.
> > > 
> > 
> > I don't see *any* in the code.  aer_hest_parse() does the HEST table parsing.
> > It switches to firmware first mode if global flag in HEST is set. Otherwise
> > for each BDF in device, hest_match_pci() is used to do a cross-matching against
> > HEST table contents.
> > 
> > Am I missing something?
> 
> I see. I think you are talking about aer_firmware_first, right?
> 
> aer_set_firmware_first() and  pcie_aer_get_firmware_first() seem to do the right
> thing.

Right, but what difference does it make if device specific AER checks do
the right thing if pcie_aer_init() doesn't even register it's port driver?
 
> aer_firmware_first is probably getting set because events are all routed to a
> single root port and aer_acpi_firmware_first() is used to decide whether AER
> should be initialized or not.
> 
> I think I understand what is going on now.
> 
> Still, breaking existing systems that rely on HEST table is not cool.
> I'd rather have users specify "pcie_ports=native" to skip FF rather than
> having broken systems by default to be honest.

The pcie_ports=native work-around ignores FF to potentially unknown
results, though.
Sinan Kaya Nov. 19, 2018, 5:42 p.m. UTC | #8
On 11/19/2018 12:32 PM, Sinan Kaya wrote:
>>
>> But we're not using HEST as a fine grain control. We disable native AER
>> handling if *any* device has FF set in HEST, and that just forces people
>> to use pcie_ports=native to get around that.
>>
> 
> I don't see *any* in the code.  aer_hest_parse() does the HEST table parsing.
> It switches to firmware first mode if global flag in HEST is set. Otherwise
> for each BDF in device, hest_match_pci() is used to do a cross-matching against
> HEST table contents.
> 
> Am I missing something?

I see. I think you are talking about aer_firmware_first, right?

aer_set_firmware_first() and  pcie_aer_get_firmware_first() seem to do the right
thing.

aer_firmware_first is probably getting set because events are all routed to a
single root port and aer_acpi_firmware_first() is used to decide whether AER
should be initialized or not.

I think I understand what is going on now.

Still, breaking existing systems that rely on HEST table is not cool.
I'd rather have users specify "pcie_ports=native" to skip FF rather than
having broken systems by default to be honest.
Sinan Kaya Nov. 19, 2018, 5:56 p.m. UTC | #9
On 11/19/2018 12:41 PM, Keith Busch wrote:
>> Still, breaking existing systems that rely on HEST table is not cool.
>> I'd rather have users specify "pcie_ports=native" to skip FF rather than
>> having broken systems by default to be honest.
> The pcie_ports=native work-around ignores FF to potentially unknown
> results, though.
> 

How about being able to enable/disable FF in BIOS?

We can't really turn off firmware first in the kernel without asking help
from the firmware. Like you said, it causes unpredictable results.

There will be two competing software trying to touch the same registers.
Keith Busch Nov. 19, 2018, 6:10 p.m. UTC | #10
On Mon, Nov 19, 2018 at 12:56:56PM -0500, Sinan Kaya wrote:
> On 11/19/2018 12:41 PM, Keith Busch wrote:
> > > Still, breaking existing systems that rely on HEST table is not cool.
> > > I'd rather have users specify "pcie_ports=native" to skip FF rather than
> > > having broken systems by default to be honest.
> > The pcie_ports=native work-around ignores FF to potentially unknown
> > results, though.
> > 
> 
> How about being able to enable/disable FF in BIOS?
>
> We can't really turn off firmware first in the kernel without asking help
> from the firmware.

The _OSC method this patch utilizes is the ACPI spec defined way for
the kernel to wrest control from firmware. BIOS specific menu settings
shouldn't be our only recourse when we have a spec authority defining
generic OS interfaces to accomplish the same thing.

Unless there is a disagreement on the _OSC interpreation, we'd have to
accept that platforms breaking from this patch are non-compliant.

> Like you said, it causes unpredictable results.
>
> There will be two competing software trying to touch the same registers.
Sinan Kaya Nov. 19, 2018, 6:24 p.m. UTC | #11
On 11/19/2018 1:10 PM, Keith Busch wrote:
>> We can't really turn off firmware first in the kernel without asking help
>> from the firmware.
> The _OSC method this patch utilizes is the ACPI spec defined way for
> the kernel to wrest control from firmware. BIOS specific menu settings
> shouldn't be our only recourse when we have a spec authority defining
> generic OS interfaces to accomplish the same thing.
> 
> Unless there is a disagreement on the _OSC interpreation, we'd have to
> accept that platforms breaking from this patch are non-compliant.
> 

It depends on which spec you look :)

UEFI HEST table specification also claims that it should be the ultimate
table for when PCI firmware-first should be disabled/enabled.

I think somebody needs to fix these. I saw an email from Harb Abdulhamid
sent to aswg this morning.

That's why I suggested circulating this idea in UEFI forums first.
Let's see what everybody thinks. We can go from there.
Alex G. Nov. 19, 2018, 7:11 p.m. UTC | #12
On 11/19/2018 12:24 PM, Sinan Kaya wrote:
> On 11/19/2018 1:10 PM, Keith Busch wrote:
>>> We can't really turn off firmware first in the kernel without asking 
>>> help
>>> from the firmware.
>> The _OSC method this patch utilizes is the ACPI spec defined way for
>> the kernel to wrest control from firmware. BIOS specific menu settings
>> shouldn't be our only recourse when we have a spec authority defining
>> generic OS interfaces to accomplish the same thing.
>>
>> Unless there is a disagreement on the _OSC interpreation, we'd have to
>> accept that platforms breaking from this patch are non-compliant.
>>
> 
> It depends on which spec you look :)
> 
> UEFI HEST table specification also claims that it should be the ultimate
> table for when PCI firmware-first should be disabled/enabled.

IIRC, EFI absorbed ACPI before FFS was a thing. Could you point me to 
the UEFI chapter that says HEST is authoritative?
(not being a smartie, just that my free software PDF readers can't 
search within a file that large)


> I think somebody needs to fix these. I saw an email from Harb Abdulhamid
> sent to aswg this morning.
> 
> That's why I suggested circulating this idea in UEFI forums first.
> Let's see what everybody thinks. We can go from there.

However you look at it, we have a glaring inconsistency in how we handle 
AER control in linux. I'm surprised we didn't see huge issues because of 
mixing HEST/_OSC.

What systems rely on the HEST definition as opposed to _OSC? It doesn't 
make sense to me that you could have a system with mixed FFS and native 
AER on the same root port. The granularity of HEST shouldn't matter here 
because of how AER works.

I'd like see how exactly we break one of those elusive systems with 
_OSC. I suspect _OSC and HEST end up having the same information, and 
that's why we didn't see any real-life issue with mixing the approaches.

Alex


P.S.
(SARCASM ALERT) Isn't UEFI is a pile of stuff that didn't stick to the wall?

P.P.S
I remember someone asking why we don't disable FFS in the BIOS. I think 
it would be next to impossible to get certain platform vendors to 
relinquish FFS control, unless the specs required things to be that way 
-- and had a "standard" way to do so.

Then getting the specs to change in that direction is also a battle.
Sinan Kaya Nov. 19, 2018, 7:32 p.m. UTC | #13
>> UEFI HEST table specification also claims that it should be the ultimate
>> table for when PCI firmware-first should be disabled/enabled.
> 
> IIRC, EFI absorbed ACPI before FFS was a thing. Could you point me to the UEFI 
> chapter that says HEST is authoritative?
> (not being a smartie, just that my free software PDF readers can't search within 
> a file that large)
> 

ACPI 6.2:

18.3.2.4 PCI Express Root Port AER Structure

Flags:

Bit [0] - FIRMWARE_FIRST: If set, this bit indicates to the OSPM that system 
firmware will handle errors from this source first.
Bit [1] - GLOBAL: If set, indicates that the settings contained in this 
structure apply globally to all PCI Express Devices.
All other bits must be set to zero.

It doesn't say shall, may or might. It says will.

> 
>> I think somebody needs to fix these. I saw an email from Harb Abdulhamid
>> sent to aswg this morning.
>>
>> That's why I suggested circulating this idea in UEFI forums first.
>> Let's see what everybody thinks. We can go from there.
> 
> However you look at it, we have a glaring inconsistency in how we handle AER 
> control in linux. I'm surprised we didn't see huge issues because of mixing 
> HEST/_OSC.
> 
> What systems rely on the HEST definition as opposed to _OSC? It doesn't make 
> sense to me that you could have a system with mixed FFS and native AER on the 
> same root port. The granularity of HEST shouldn't matter here because of how AER 
> works.

I think It depends on your PCI topology.

For other topologies with multiple PCI root complexes, I can see this being
used per root complex flag to indicate which root complex needs firmware first
and which one doesn't.

> 
> I'd like see how exactly we break one of those elusive systems with _OSC. I 
> suspect _OSC and HEST end up having the same information, and that's why we 
> didn't see any real-life issue with mixing the approaches.

I'm already aware of two systems that rely on HEST table to pass information to
the OS that firmware first is enabled. Both of the systems do not change their
_OSC bits during this assuming HEST table has priority over _OSC for firmware
first.

If we add this patch, OS will try to claim the AER address space while firmware
wants exclusive access.

As I said in my previous email, the right place to talk about this is UEFI
forum.

> 
> Alex
> 
> 
> P.S.
> (SARCASM ALERT) Isn't UEFI is a pile of stuff that didn't stick to the wall?
> 
> P.P.S
> I remember someone asking why we don't disable FFS in the BIOS. I think it would 
> be next to impossible to get certain platform vendors to relinquish FFS control, 
> unless the specs required things to be that way -- and had a "standard" way to 
> do so.
> 
> Then getting the specs to change in that direction is also a battle.
>
Alex_Gagniuc@Dellteam.com Nov. 19, 2018, 8:16 p.m. UTC | #14
On 11/19/2018 01:32 PM, Sinan Kaya wrote:
> ACPI 6.2:
> 
> 18.3.2.4 PCI Express Root Port AER Structure
> 
> Flags:
> 
> Bit [0] - FIRMWARE_FIRST: If set, this bit indicates to the OSPM that system
> firmware will handle errors from this source first.
> Bit [1] - GLOBAL: If set, indicates that the settings contained in this
> structure apply globally to all PCI Express Devices.
> All other bits must be set to zero.
> 
> It doesn't say shall, may or might. It says will.

It says "system firmware will handle errors". It does not say "system 
firmware owns AER registers". In absence on any descriptor text on the 
meaning of these tables, this really looks to me like it should be 
interpreted as a descriptor of APEI error sources, not a mutex on who 
writes to certain bits-- AER in this case.

I don't think that is contradictory or inconsistent.
I also wasn't able to find any reference to HEST in UEFI 2.7, only in 
ACPI spec.

> I think It depends on your PCI topology.
> 
> For other topologies with multiple PCI root complexes, I can see this being
> used per root complex flag to indicate which root complex needs firmware first
> and which one doesn't.

_OSC is per root bus, so it's already granular enough, right? Why would 
it depend on PCI topology?


>> I'd like see how exactly we break one of those elusive systems with _OSC. I
>> suspect _OSC and HEST end up having the same information, and that's why we
>> didn't see any real-life issue with mixing the approaches.
> 
> I'm already aware of two systems that rely on HEST table to pass information to
> the OS that firmware first is enabled. Both of the systems do not change their
> _OSC bits during this assuming HEST table has priority over _OSC for firmware
> first.

Are those hax86 systems?
It seems like the systems have broken firmware. I see several ways to 
handle broken systems like those:
  - Parse both HEST and _OSC, and decide AER ownership with root bridge 
granularity. i.e. host_bridge->native_aer is authoritative, but is 
derived from both HEST and _OSC
  - Add quirks for the broken systems
  - Keep doing what we're doing until current code breaks a new system

> If we add this patch, OS will try to claim the AER address space while firmware
> wants exclusive access.

Yay! FFS wants exclusive access, but does not claim it. Oh, FFS!


> As I said in my previous email, the right place to talk about this is UEFI
> forum.

The way I would present the problem to he spec writers is that, although 
the spec appears to be consistent, we've seen firmware vendors that made 
the wrong assumptions about HEST/_OSC. Instead of describing AER 
ownership with _OSC, they attempted to do it with HEST. So we should add 
an implementation note, or clarification about this.

Alex
Sinan Kaya Nov. 19, 2018, 8:33 p.m. UTC | #15
On 11/19/2018 3:16 PM, Alex_Gagniuc@Dellteam.com wrote:
> On 11/19/2018 01:32 PM, Sinan Kaya wrote:
>> ACPI 6.2:
>>
>> 18.3.2.4 PCI Express Root Port AER Structure
>>
>> Flags:
>>
>> Bit [0] - FIRMWARE_FIRST: If set, this bit indicates to the OSPM that system
>> firmware will handle errors from this source first.
>> Bit [1] - GLOBAL: If set, indicates that the settings contained in this
>> structure apply globally to all PCI Express Devices.
>> All other bits must be set to zero.
>>
>> It doesn't say shall, may or might. It says will.
> 
> It says "system firmware will handle errors". It does not say "system
> firmware owns AER registers". In absence on any descriptor text on the
> meaning of these tables, this really looks to me like it should be
> interpreted as a descriptor of APEI error sources, not a mutex on who
> writes to certain bits-- AER in this case.

True. I was trying to get it out in a rush. I omitted words.

However; table assumes governance about for which entities firmware first
should be enabled. There is no cross reference to _OSC or permission
negotiation like _OST.

> 
> I don't think that is contradictory or inconsistent.
> I also wasn't able to find any reference to HEST in UEFI 2.7, only in
> ACPI spec.

You are right. It was a confusion on my side. The right place to look is
ACPI specification. I was involved in this a couple of years ago. Some pieces
were in UEFI spec. Other pieces were in ACPI. I guess they got unified
now.

> 
>> I think It depends on your PCI topology.
>>
>> For other topologies with multiple PCI root complexes, I can see this being
>> used per root complex flag to indicate which root complex needs firmware first
>> and which one doesn't.
> 
> _OSC is per root bus, so it's already granular enough, right? Why would
> it depend on PCI topology?
> 

I was speculating. I don't have the full background on this. Need to consult
the spec developers.

>> As I said in my previous email, the right place to talk about this is UEFI
>> forum.
> 
> The way I would present the problem to he spec writers is that, although
> the spec appears to be consistent, we've seen firmware vendors that made
> the wrong assumptions about HEST/_OSC. Instead of describing AER
> ownership with _OSC, they attempted to do it with HEST. So we should add
> an implementation note, or clarification about this.

I agree.

> 
> Alex
>
Alex_Gagniuc@Dellteam.com Nov. 19, 2018, 11:49 p.m. UTC | #16
On 11/19/2018 02:33 PM, Sinan Kaya wrote:
> True. I was trying to get it out in a rush. I omitted words.

Sounds like you'd make an top notch spec writer! :p

> However; table assumes governance about for which entities firmware first
> should be enabled. There is no cross reference to _OSC or permission
> negotiation like _OST.

Well, from an OSPM perspective, is FFS something that can be enabled or 
disabled? FFS seems to be static to OSPM, which would change the sort of 
assumptions we can reasonably make here.


>>> As I said in my previous email, the right place to talk about this is UEFI
>>> forum.
>>
>> The way I would present the problem to he spec writers is that, although
>> the spec appears to be consistent, we've seen firmware vendors that made
>> the wrong assumptions about HEST/_OSC. Instead of describing AER
>> ownership with _OSC, they attempted to do it with HEST. So we should add
>> an implementation note, or clarification about this.
> 
> I agree.

Cool. While the UEFI Secret Society debates, can we figure out if/how 
[patch 1/2] breaks those systems, or is it only [patch 2/2] of this 
series that we suspect?

Alex
Sinan Kaya Nov. 20, 2018, 1:54 a.m. UTC | #17
On 11/19/2018 6:49 PM, Alex_Gagniuc@Dellteam.com wrote:
> On 11/19/2018 02:33 PM, Sinan Kaya wrote:
>> However; table assumes governance about for which entities firmware first
>> should be enabled. There is no cross reference to _OSC or permission
>> negotiation like _OST.
> 
> Well, from an OSPM perspective, is FFS something that can be enabled or
> disabled? FFS seems to be static to OSPM, which would change the sort of
> assumptions we can reasonably make here.

IMO, it can be enabled/disabled in BIOS. I have seen this implementation before.
If the trigger is the presence of a statically compiled ACPI HEST table (as the
current code does); presence of FFS would be static from OSPM perspective.
BIOS could patch this table or hide it during boot.

If FFS were to be negotiated via _OSC as indirectly implied in this series, then
same BIOS could patch the ACPI table to return different values for the _OSC
return.

> 
> 
>>>> As I said in my previous email, the right place to talk about this is UEFI
>>>> forum.
>>>
>>> The way I would present the problem to he spec writers is that, although
>>> the spec appears to be consistent, we've seen firmware vendors that made
>>> the wrong assumptions about HEST/_OSC. Instead of describing AER
>>> ownership with _OSC, they attempted to do it with HEST. So we should add
>>> an implementation note, or clarification about this.
>>
>> I agree.
> 
> Cool. While the UEFI Secret Society debates, can we figure out if/how
> [patch 1/2] breaks those systems, or is it only [patch 2/2] of this
> series that we suspect?

I went back and looked at both patches. Both of them are removing references to
HEST table. I think both patches are impacted by this discussion.

> 
> Alex
>
Alex_Gagniuc@Dellteam.com Nov. 20, 2018, 8:44 p.m. UTC | #18
On 11/19/2018 07:54 PM, Sinan Kaya wrote:
> On 11/19/2018 6:49 PM, Alex_Gagniuc@Dellteam.com wrote:
>> On 11/19/2018 02:33 PM, Sinan Kaya wrote:
>>> However; table assumes governance about for which entities firmware first
>>> should be enabled. There is no cross reference to _OSC or permission
>>> negotiation like _OST.
>>
>> Well, from an OSPM perspective, is FFS something that can be enabled or
>> disabled? FFS seems to be static to OSPM, which would change the sort of
>> assumptions we can reasonably make here.
> 
> IMO, it can be enabled/disabled in BIOS. I have seen this implementation before.
> If the trigger is the presence of a statically compiled ACPI HEST table (as the
> current code does); presence of FFS would be static from OSPM perspective.
> BIOS could patch this table or hide it during boot.
> 
> If FFS were to be negotiated via _OSC as indirectly implied in this series, then
> same BIOS could patch the ACPI table to return different values for the _OSC
> return.

It is theoretically possible to have proprietary BIOS settings to 
disable FFS. The platform vendors that I've spoken to do not offer this 
option. Though even if, hypothetically, BIOS clears the FFS bit in HEST, 
it won't stop it from commandeering the CPU and doing whatever it wants.

Although, I'm not quite sure why we'd want to negotiate FFS itself. FFS 
is too big of a can of worms (goes far beyond AER error reporting), when 
what we really care about is if OS can use a specific feature or not.
>> Cool. While the UEFI Secret Society debates, can we figure out if/how
>> [patch 1/2] breaks those systems, or is it only [patch 2/2] of this
>> series that we suspect?
> 
> I went back and looked at both patches. Both of them are removing references to
> HEST table. I think both patches are impacted by this discussion.

I'd prefer "sure" instead of "think". "I think it breaks some system I'm 
not telling you about" doesn't help much in figuring out how not to 
break said system(s). :)

Alex
Sinan Kaya Nov. 20, 2018, 9:02 p.m. UTC | #19
On 11/20/2018 3:44 PM, Alex_Gagniuc@Dellteam.com wrote:
> I'd prefer "sure" instead of "think". "I think it breaks some system I'm
> not telling you about" doesn't help much in figuring out how not to
> break said system(s).:)

Sorry, I thought I mentioned why it would break but let me repeat.

The systems I have seen rely on the HEST table presence as an indicator
to the OS that firmware first is enabled. If you go look at the _OSC bits
on such systems, it still says OS owns the AER service.

The assumption here is that HEST table has precedence over the _OSC bits.
That's what needs to be clarified in the UEFI forum.

If this code is to go in and ignore the HEST table presence, then firmware
will think that it owns AER service and OS will think that it owns AER
service too.
Keith Busch Nov. 20, 2018, 9:42 p.m. UTC | #20
On Tue, Nov 20, 2018 at 04:02:21PM -0500, Sinan Kaya wrote:
> On 11/20/2018 3:44 PM, Alex_Gagniuc@Dellteam.com wrote:
> > I'd prefer "sure" instead of "think". "I think it breaks some system I'm
> > not telling you about" doesn't help much in figuring out how not to
> > break said system(s).:)
> 
> Sorry, I thought I mentioned why it would break but let me repeat.
> 
> The systems I have seen rely on the HEST table presence as an indicator
> to the OS that firmware first is enabled. If you go look at the _OSC bits
> on such systems, it still says OS owns the AER service.
> 
> The assumption here is that HEST table has precedence over the _OSC bits.
> That's what needs to be clarified in the UEFI forum.
> 
> If this code is to go in and ignore the HEST table presence, then firmware
> will think that it owns AER service and OS will think that it owns AER
> service too.

How does that work? If the OS takes control, it sets up MSIs that FW don't
react to, and disables system errors through PCIe Root Control. Aren't
those sys errs the mechanism FW knows it has something to do, which
means the OS can effectively fence it off?
Alex_Gagniuc@Dellteam.com Nov. 20, 2018, 9:46 p.m. UTC | #21
On 11/20/2018 03:02 PM, Sinan Kaya wrote:
> On 11/20/2018 3:44 PM, Alex_Gagniuc@Dellteam.com wrote:
>> I'd prefer "sure" instead of "think". "I think it breaks some system I'm
>> not telling you about" doesn't help much in figuring out how not to
>> break said system(s).:)
> 
> Sorry, I thought I mentioned why it would break but let me repeat.

Why, yes, but bets are still being placed on the systems allegedly 
suffering from this.


> The systems I have seen rely on the HEST table presence as an indicator
> to the OS that firmware first is enabled. If you go look at the _OSC bits
> on such systems, it still says OS owns the AER service.
> 
> The assumption here is that HEST table has precedence over the _OSC bits.
> That's what needs to be clarified in the UEFI forum.
> 
> If this code is to go in and ignore the HEST table presence, then firmware
> will think that it owns AER service and OS will think that it owns AER
> service too.

So this seems like exactly the scenario we were hypothesizing.

  * System boots up with FFS enabled. Everything is fine so far.
  * OSPM requests control of AER (set bit 3 in _OSC)
  * FW grants OS control of AER (set bit 3 in _OSC reply)

That's how things are designed to work.


Now, let's assume, for the sake of argument, that the firmware on those 
system's is broken, and it didn't intend to give the OS control of AER. 
OSPM checking HEST instead of _OSC is still wrong, according to the 
spec. Two wrongs don't make a right, they just don't crash.

I think the correct way is to identify those broken systems, and add 
quirks for them. Continuing to have inconsistent and over-complicated 
logic that is not spec compliant is not any better.

Alex
Sinan Kaya Nov. 20, 2018, 10:08 p.m. UTC | #22
On 11/20/2018 4:46 PM, Alex_Gagniuc@Dellteam.com wrote:
> Now, let's assume, for the sake of argument, that the firmware on those
> system's is broken, and it didn't intend to give the OS control of AER.
> OSPM checking HEST instead of _OSC is still wrong, according to the
> spec. Two wrongs don't make a right, they just don't crash.
> 
> I think the correct way is to identify those broken systems, and add
> quirks for them. Continuing to have inconsistent and over-complicated
> logic that is not spec compliant is not any better.

Remember that both _OSC and HEST are in the ACPI specification. I don't
think there is a consensus on what is "wrong".

There is certainly a need for spec clarification.

One version is:

"if HEST table is present, ignore _OSC"

or

Another version is:

"if HEST table is present, make sure that FW sets _OSC bit for AER to
false. Otherwise, warn like crazy that this BIOS is broken and needs
an update and can cause all sorts of trouble"

I can see both points of view. The second one can also be worked around
by an SMBIOS quirk too as you suggested. Counting the number of quirks
and random bug reports will be an interesting exercise / regression.

I followed the ASWG thread yesterday. There will be a meeting next week to
discuss this.

My preference is not to introduce new behavior/regression to the kernel.
Sinan Kaya Nov. 20, 2018, 10:28 p.m. UTC | #23
On 11/20/2018 4:42 PM, Keith Busch wrote:
> How does that work? If the OS takes control, it sets up MSIs that FW don't
> react to, and disables system errors through PCIe Root Control. Aren't
> those sys errs the mechanism FW knows it has something to do, which
> means the OS can effectively fence it off?

I think this is all implementation detail and doesn't necessarily apply
to all firmware-first implementation flavors.

Assumptions are:
1. both FW and OS are listening to MSI interrupts
2. FW monitors the system errors

Some FF implementation could route the AER interrupt to a higher privilege
level. Some other implementation could use INTx or a side-band channel interrupt
for firmware-interrupt too.

I have seen all 3 except MSI :) and also firmware never monitored the system
error bits. I was curious if anybody ever used those legacy bits. Now, I know
someone is using it.
Alex G. Nov. 20, 2018, 10:35 p.m. UTC | #24
On 11/20/2018 04:28 PM, Sinan Kaya wrote:
> On 11/20/2018 4:42 PM, Keith Busch wrote:
>> How does that work? If the OS takes control, it sets up MSIs that FW 
>> don't
>> react to, and disables system errors through PCIe Root Control. Aren't
>> those sys errs the mechanism FW knows it has something to do, which
>> means the OS can effectively fence it off?
> 
> I think this is all implementation detail and doesn't necessarily apply
> to all firmware-first implementation flavors.
> 
> Assumptions are:
> 1. both FW and OS are listening to MSI interrupts

On hax86, I'm not sure FW can listen to MSI interrups. FW only exists in 
SMM, not ring 1-4.

> 2. FW monitors the system errors
> 
> Some FF implementation could route the AER interrupt to a higher privilege
> level. Some other implementation could use INTx or a side-band channel 
> interrupt
> for firmware-interrupt too.
> 
> I have seen all 3 except MSI :) and also firmware never monitored the 
> system
> error bits. I was curious if anybody ever used those legacy bits. Now, I 
> know
> someone is using it.
Alex_Gagniuc@Dellteam.com Nov. 20, 2018, 10:36 p.m. UTC | #25
On 11/20/2018 04:08 PM, Sinan Kaya wrote:
> One version is:
> "if HEST table is present, ignore _OSC"
> or
> Another version is:
> "if HEST table is present, make sure that FW sets _OSC bit for AER to
> false. Otherwise, warn like crazy that this BIOS is broken and needs
> an update and can cause all sorts of trouble"

ACPI 6.2 Ch 6.2.11.3, Table 6-197
	PCI Express Advanced Error Reporting control

	The firmware sets this bit to 1 to grant control over PCI
	Express Advanced Error Reporting.
	If firmware allows the OS control of this feature, then in the
	context of the _OSC method it must ensure that error messages
	are routed to device interrupts as described in the PCI
	Express Base Specification. Additionally, after control is
	transferred to the OS, firmware must not modify the Advanced
	Error Reporting Capability. If control of this feature was
	requested and denied or was not requested, firmware returns
	this bit set to 0.
Alex_Gagniuc@Dellteam.com Nov. 27, 2018, 6:22 p.m. UTC | #26
On 11/20/2018 04:08 PM, Sinan Kaya wrote:
> I followed the ASWG thread yesterday. There will be a meeting next week to
> discuss this.

Any updates on the meeting?
Sinan Kaya Nov. 27, 2018, 6:32 p.m. UTC | #27
On 11/27/2018 1:22 PM, Alex_Gagniuc@Dellteam.com wrote:
> On 11/20/2018 04:08 PM, Sinan Kaya wrote:
>> I followed the ASWG thread yesterday. There will be a meeting next week to
>> discuss this.
> 
> Any updates on the meeting?
> 
> 

Tyler should be able to get an update.
Tyler Baicar Nov. 27, 2018, 6:46 p.m. UTC | #28
On Tue, Nov 27, 2018 at 1:32 PM Sinan Kaya <okaya@kernel.org> wrote:
>
> On 11/27/2018 1:22 PM, Alex_Gagniuc@Dellteam.com wrote:
> > On 11/20/2018 04:08 PM, Sinan Kaya wrote:
> >> I followed the ASWG thread yesterday. There will be a meeting next week to
> >> discuss this.
> >
> > Any updates on the meeting?
> >
> >
>
> Tyler should be able to get an update.

As per Harb, the meetings are on Thursdays so it will be discussed this
Thursday since last week was a holiday.

Thanks,
Tyler
Bjorn Helgaas March 5, 2019, 11:16 p.m. UTC | #29
On Thu, Nov 15, 2018 at 05:16:01PM -0600, Alexandru Gagniuc wrote:
> Thanks to Keith for pointing out that it doesn't make sense to disable
> AER services when only one device has a FIRMWARE_FIRST HEST.
> 
> AER ownership is an interesting issue brought in by FFS (firmware-first)
> model. In a nutshell if FFS handles AER, then OS should not touch any
> of the AER bits. FW might set things up so that it receives AER
> notifications via SMI. It's theoretically possible to receive SCIs,
> but the exact mechanism is platform-dependent. OS touching AER bits
> when firmware owns them may interfere with these notifications.
> 
> The ACPI mechanism for negotiating control of AER is _OSC, and is
> described in detail in ACPI 6.2 Ch. 6.2.11.3. _OSC is negotiated at
> the root bus level. Any root port, switch, or endpoint under the bus
> would have its AER ownership negotiated in one _OSC call.
> 
> Then there is HEST, which is part of ACPI Platform Error Interfaces
> (APEI). HEST tables describe the errors that FW may report to the OS.
> A types 6,7 and 7 HEST tables describe AER errors from PCIe devices.
> As part of this description, we're told if the error source is FFS.
> 
> Information in HEST seems to be redundant, as each error reported by
> FW will have a CPER table that describes it in detail.
> 
> Because HEST describes an error source as firmware-first or not, we've
> taken this to mean ownership of AER. Because AER ownership and error
> reporting are coupled, _OSC and HEST usually agree on the matter of
> ownership. However, that doesn't seem to be required by ACPI.
> 
> I've asked around a few people at Dell and they unanimously agree that
> _OSC is the correct way to determine ownership of AER. In linux, we
> use the result of _OSC to enable AER services, but we use HEST to
> determine AER ownership. That's inconsistent. This series drops the
> use of HEST in favor of _OSC.
> 
> [1] https://lkml.org/lkml/2018/11/15/62
> 
> Alexandru Gagniuc (2):
>   PCI/AER: Do not use APEI/HEST to disable AER services globally
>   PCI/AER: Determine AER ownership based on _OSC instead of HEST
> 
>  drivers/acpi/pci_root.c  |  9 +----
>  drivers/pci/pcie/aer.c   | 82 ++--------------------------------------
>  include/linux/pci-acpi.h |  6 ---
>  3 files changed, 5 insertions(+), 92 deletions(-)

I'm pretty sure we do need to do something here, but there was quite a
lot of discussion that didn't seem to really get resolved, so I'm
dropping these for now.

Please repost them with any relevant updates and we'll see if we can
get a consensus that we're going the right direction.

Bjorn