Message ID | 20231127211729.42668-1-nirmal.patel@linux.intel.com (mailing list archive)
---|---
State | Changes Requested
Delegated to | Bjorn Helgaas
Series | [v2] PCI: vmd: Enable Hotplug based on BIOS setting on VMD rootports
On Mon, Nov 27, 2023 at 04:17:29PM -0500, Nirmal Patel wrote: > Currently VMD copies root bridge setting to enable Hotplug on its > rootports. This mechanism works fine for Host OS and no issue has > been observed. However in case of VM, all the HyperVisors don't > pass the Hotplug setting to the guest BIOS which results in > assigning default values and disabling Hotplug capability in the > guest which have been observed by many OEMs. Can we make this a little more specific? I'm lacking a lot of virtualization background here, so I'm going to ask a bunch of stupid questions here: "VMD copies root bridge setting" refers to _OSC and the copying done in vmd_copy_host_bridge_flags()? Obviously this must be in the Host OS. "This mechanism works fine for Host OS and no issue has been observed." I guess this means that if the platform grants hotplug control to the Host OS via _OSC, pciehp in the Host OS works fine to manage hotplug on Root Ports below the VMD? If the platform does *not* grant hotplug control to the Host OS, I guess it means the Host OS pciehp can *not* manage hotplug below VMD? Is that also what you expect? "However in case of VM, all the HyperVisors don't pass the Hotplug setting to the guest BIOS" What is the mechanism by which they would pass the hotplug setting? I guess the guest probably doesn't see the VMD endpoint itself, right? The guest sees either virtualized or passed-through VMD Root Ports? I assume the guest OS sees a constructed ACPI system (different from the ACPI environment the host sees), so it sees a PNP0A03 host bridge with _OSC? And presumably VMD Root Ports below that host bridge? So are you saying the _OSC seen by the guest doesn't grant hotplug control to the guest OS? And it doesn't do that because the guest BIOS hasn't implemented _OSC? Or maybe the guest BIOS relies on the Slot Capabilities registers of VMD Root Ports to decide whether _OSC should grant hotplug control? And those Slot Capabilities don't advertise hotplug support? "... which results in assigning default values and disabling Hotplug capability in the guest." What default values are these? Is this talking about the default host_bridge->native_pcie_hotplug value set in acpi_pci_root_create(), where the OS assumes hotplug is owned by the platform unless _OSC grants control to the OS? > VMD Hotplug can be enabled or disabled based on the VMD rootports' > Hotplug configuration in BIOS. is_hotplug_bridge is set on each > VMD rootport based on Hotplug capable bit in SltCap in probe.c. > Check is_hotplug_bridge and enable or disable native_pcie_hotplug > based on that value. > > This patch will make sure that Hotplug is enabled properly in Host > as well as in VM while honoring _OSC settings as well as VMD hotplug > setting. "Hotplug is enabled properly in Host as well as in VM" sounds like we're talking about both the host OS and the guest OS. In the host OS, this overrides whatever was negotiated via _OSC, I guess on the principle that _OSC doesn't apply because the host BIOS doesn't know about the Root Ports below the VMD. (I'm not sure it's 100% resolved that _OSC doesn't apply to them, so we should mention that assumption here.) In the guest OS, this still overrides whatever was negotiated via _OSC, but presumably the guest BIOS *does* know about the Root Ports, so I don't think the same principle about overriding _OSC applies there. I think it's still a problem that we handle host_bridge->native_pcie_hotplug differently than all the other features negotiated via _OSC. 
We should at least explain this somehow. I suspect part of the reason for doing it differently is to avoid the interrupt storm that we avoid via 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features"). If so, I think 04b12ef163d1 should be mentioned here because it's an important piece of the picture. > Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com> > --- > v1->v2: Updating commit message. > --- > drivers/pci/controller/vmd.c | 11 ++++++++++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c > index 769eedeb8802..e39eaef5549a 100644 > --- a/drivers/pci/controller/vmd.c > +++ b/drivers/pci/controller/vmd.c > @@ -720,6 +720,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features) > resource_size_t membar2_offset = 0x2000; > struct pci_bus *child; > struct pci_dev *dev; > + struct pci_host_bridge *vmd_bridge; > int ret; > > /* > @@ -886,8 +887,16 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features) > * and will fail pcie_bus_configure_settings() early. It can instead be > * run on each of the real root ports. > */ > - list_for_each_entry(child, &vmd->bus->children, node) > + vmd_bridge = to_pci_host_bridge(vmd->bus->bridge); > + list_for_each_entry(child, &vmd->bus->children, node) { > pcie_bus_configure_settings(child); > + /* > + * When Hotplug is enabled on vmd root-port, enable it on vmd > + * bridge. > + */ > + if (child->self->is_hotplug_bridge) > + vmd_bridge->native_pcie_hotplug = 1; "is_hotplug_bridge" basically means PCI_EXP_SLTCAP_HPC is set, which means "the hardware of this slot is hot-plug *capable*." Whether hotplug is *enabled* is a separate policy decision about whether the OS is allowed to operate the hotplug functionality, so I think saying "When Hotplug is enabled" in the comment is a little bit misleading. Bjorn > + } > > pci_bus_add_devices(vmd->bus); > > -- > 2.31.1 >
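For reference, the hunk under review boils down to the following logic in vmd_enable_domain(). This is a sketch based on the diff quoted above, not the exact patch text, with the comment reworded along the lines of the review feedback ("capable" rather than "enabled"):

	vmd_bridge = to_pci_host_bridge(vmd->bus->bridge);
	list_for_each_entry(child, &vmd->bus->children, node) {
		pcie_bus_configure_settings(child);
		/*
		 * If any VMD Root Port reports hot-plug *capable* slot
		 * hardware (SltCap HPC, mirrored into is_hotplug_bridge
		 * during probe), let the OS manage PCIe hotplug below the
		 * VMD-created host bridge.
		 */
		if (child->self->is_hotplug_bridge)
			vmd_bridge->native_pcie_hotplug = 1;
	}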
On Fri, 2023-12-01 at 18:07 -0600, Bjorn Helgaas wrote: > On Mon, Nov 27, 2023 at 04:17:29PM -0500, Nirmal Patel wrote: > > Currently VMD copies root bridge setting to enable Hotplug on its > > rootports. This mechanism works fine for Host OS and no issue has > > been observed. However in case of VM, all the HyperVisors don't > > pass the Hotplug setting to the guest BIOS which results in > > assigning default values and disabling Hotplug capability in the > > guest which have been observed by many OEMs. > > Can we make this a little more specific? I'm lacking a lot of > virtualization background here, so I'm going to ask a bunch of stupid > questions here: > > "VMD copies root bridge setting" refers to _OSC and the copying done > in vmd_copy_host_bridge_flags()? Obviously this must be in the Host > OS. Yes. we are talking about vmd_copy_host_bridge_flags. It gets called in both Host and Guest. But it assigns different values. In Host, it assigns correct values, while in guest it assigns power-on default value 0 which in simple term disable everything including hotplug. > > "This mechanism works fine for Host OS and no issue has been > observed." I guess this means that if the platform grants hotplug > control to the Host OS via _OSC, pciehp in the Host OS works fine to > manage hotplug on Root Ports below the VMD? Yes. When platform hotplug setting is enabled, then _OSC assigns correct value to vmd root ports via vmd_copy_host_bridge_flags. > > If the platform does *not* grant hotplug control to the Host OS, I > guess it means the Host OS pciehp can *not* manage hotplug below VMD? > Is that also what you expect? > > "However in case of VM, all the HyperVisors don't pass the Hotplug > setting to the guest BIOS" What is the mechanism by which they would > pass the hotplug setting? I guess the guest probably doesn't see the > VMD endpoint itself, right? The guest sees either virtualized or > passed-through VMD Root Ports? In guest, vmd is passthrough including pci topology behind vmd. The way we verify Hotplug capability is to read Slot Capabilities registers' hotplug enabled bit set on vmd root ports in Guest. The values copied in vmd_copy_host_bridge_flags are different in Host than in Guest. I do not know what component is responsible for this in every HyperVisor. But I tested this in ESXi, HyperV, KVM and they all had the same behavior where the _OSC flags are set to power-on default values of 0 by vmd_copy_host_bridge_flags in Guest OS which is disabling hotplug. > > I assume the guest OS sees a constructed ACPI system (different from > the ACPI environment the host sees), so it sees a PNP0A03 host bridge > with _OSC? And presumably VMD Root Ports below that host bridge? > > So are you saying the _OSC seen by the guest doesn't grant hotplug > control to the guest OS? And it doesn't do that because the guest > BIOS hasn't implemented _OSC? Or maybe the guest BIOS relies on the > Slot Capabilities registers of VMD Root Ports to decide whether _OSC > should grant hotplug control? And those Slot Capabilities don't > advertise hotplug support? Currently Hotplug is controlled by _OSC in both host and Guest. In case of guest, it seems guest BIOS hasn't implemented _OSC since all the flags are set to 0 which is not the case in host. VMD is passthrough in Guest so the slot capabilities registers are still same as what Host driver would see. That is how we can still control hotplug on vmd on both Guest and Host. > > "... 
which results in assigning default values and disabling Hotplug > capability in the guest." What default values are these? 0. Everything is disabled in guest. So basically _OSC is writing wrong values in guest. > Is this > talking about the default host_bridge->native_pcie_hotplug value set > in acpi_pci_root_create(), where the OS assumes hotplug is owned by > the platform unless _OSC grants control to the OS? vmd driver calls pci_create_root_bus which eventually ends up calling pci_init_host_bridge where native_pcie_hotplug is set to 1. Now vmd_copy_host_bridge_flags overwrites the native_pcie_hotplug to _OSC setting of 0 in guest. > > > VMD Hotplug can be enabled or disabled based on the VMD rootports' > > Hotplug configuration in BIOS. is_hotplug_bridge is set on each > > VMD rootport based on Hotplug capable bit in SltCap in probe.c. > > Check is_hotplug_bridge and enable or disable native_pcie_hotplug > > based on that value. > > > > This patch will make sure that Hotplug is enabled properly in Host > > as well as in VM while honoring _OSC settings as well as VMD > > hotplug > > setting. > > "Hotplug is enabled properly in Host as well as in VM" sounds like > we're talking about both the host OS and the guest OS. Yes. VM means guest. > > In the host OS, this overrides whatever was negotiated via _OSC, I > guess on the principle that _OSC doesn't apply because the host BIOS > doesn't know about the Root Ports below the VMD. (I'm not sure it's > 100% resolved that _OSC doesn't apply to them, so we should mention > that assumption here.) _OSC still controls every flag including Hotplug. I have observed that slot capabilities register has hotplug enabled only when platform has enabled the hotplug. So technically not overriding it in the host. It overrides in guest because _OSC is passing wrong/different information than the _OSC information in Host. Also like I mentioned, slot capabilities registers are not changed in Guest because vmd is passthrough. > > In the guest OS, this still overrides whatever was negotiated via > _OSC, but presumably the guest BIOS *does* know about the Root Ports, > so I don't think the same principle about overriding _OSC applies > there. > > I think it's still a problem that we handle > host_bridge->native_pcie_hotplug differently than all the other > features negotiated via _OSC. We should at least explain this > somehow. Right now this is the only way to know in Guest OS if platform has enabled Hotplug or not. We have many customers complaining about Hotplug being broken in Guest. So just trying to honor _OSC but also fixing its limitation in Guest. > > I suspect part of the reason for doing it differently is to avoid the > interrupt storm that we avoid via 04b12ef163d1 ("PCI: vmd: Honor ACPI > _OSC on PCIe features"). If so, I think 04b12ef163d1 should be > mentioned here because it's an important piece of the picture. I can add a note about this patch as well. Let me know if you want me to add something specific. Thank you for taking the time. This is a very critical issue for us and we have many customers complaining about it, I would appreciate to get it resolved as soon as possible. Thanks nirmal > > > Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com> > > --- > > v1->v2: Updating commit message. 
> > --- > > drivers/pci/controller/vmd.c | 11 ++++++++++- > > 1 file changed, 10 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/pci/controller/vmd.c > > b/drivers/pci/controller/vmd.c > > index 769eedeb8802..e39eaef5549a 100644 > > --- a/drivers/pci/controller/vmd.c > > +++ b/drivers/pci/controller/vmd.c > > @@ -720,6 +720,7 @@ static int vmd_enable_domain(struct vmd_dev > > *vmd, unsigned long features) > > resource_size_t membar2_offset = 0x2000; > > struct pci_bus *child; > > struct pci_dev *dev; > > + struct pci_host_bridge *vmd_bridge; > > int ret; > > > > /* > > @@ -886,8 +887,16 @@ static int vmd_enable_domain(struct vmd_dev > > *vmd, unsigned long features) > > * and will fail pcie_bus_configure_settings() early. It can > > instead be > > * run on each of the real root ports. > > */ > > - list_for_each_entry(child, &vmd->bus->children, node) > > + vmd_bridge = to_pci_host_bridge(vmd->bus->bridge); > > + list_for_each_entry(child, &vmd->bus->children, node) { > > pcie_bus_configure_settings(child); > > + /* > > + * When Hotplug is enabled on vmd root-port, enable it > > on vmd > > + * bridge. > > + */ > > + if (child->self->is_hotplug_bridge) > > + vmd_bridge->native_pcie_hotplug = 1; > > "is_hotplug_bridge" basically means PCI_EXP_SLTCAP_HPC is set, which > means "the hardware of this slot is hot-plug *capable*." > > Whether hotplug is *enabled* is a separate policy decision about > whether the OS is allowed to operate the hotplug functionality, so I > think saying "When Hotplug is enabled" in the comment is a little bit > misleading. I will rewrite the comment. > > Bjorn > > > + } > > > > pci_bus_add_devices(vmd->bus); > > > > -- > > 2.31.1 > >
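For readers without the kernel source at hand: pci_init_host_bridge() in drivers/pci/probe.c sets all of the native_* ownership bits to 1 for a driver-created bridge such as the one vmd registers, and the helper added by 04b12ef163d1 then overwrites them with whatever _OSC negotiated for the ACPI host bridge above the VMD endpoint. Paraphrased from drivers/pci/controller/vmd.c (the exact field list may differ between kernel versions):

static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
				       struct pci_host_bridge *vmd_bridge)
{
	/* In a guest whose BIOS grants no _OSC control, these are all 0 */
	vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
	vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
	vmd_bridge->native_aer = root_bridge->native_aer;
	vmd_bridge->native_pme = root_bridge->native_pme;
	vmd_bridge->native_ltr = root_bridge->native_ltr;
	vmd_bridge->native_dpc = root_bridge->native_dpc;
}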
On Tue, Dec 05, 2023 at 03:20:27PM -0700, Nirmal Patel wrote: > On Fri, 2023-12-01 at 18:07 -0600, Bjorn Helgaas wrote: > > On Mon, Nov 27, 2023 at 04:17:29PM -0500, Nirmal Patel wrote: > > > Currently VMD copies root bridge setting to enable Hotplug on its > > > rootports. This mechanism works fine for Host OS and no issue has > > > been observed. However in case of VM, all the HyperVisors don't > > > pass the Hotplug setting to the guest BIOS which results in > > > assigning default values and disabling Hotplug capability in the > > > guest which have been observed by many OEMs. > > > > Can we make this a little more specific? I'm lacking a lot of > > virtualization background here, so I'm going to ask a bunch of stupid > > questions here: > > > > "VMD copies root bridge setting" refers to _OSC and the copying done > > in vmd_copy_host_bridge_flags()? Obviously this must be in the Host > > OS. > > Yes. we are talking about vmd_copy_host_bridge_flags. It gets called in > both Host and Guest. But it assigns different values. In Host, it > assigns correct values, while in guest it assigns power-on default > value 0 which in simple term disable everything including hotplug. If vmd_copy_host_bridge_flags() is called both in the Host and the Guest, I guess that means both the VMD endpoint and the VMD Root Ports below it are passed through to the Guest? I expected only the VMD Root Ports would be passed through, but maybe that's my misunderstanding. The VMD endpoint is under host bridge A, and the VMD creates a new PCI domain under a new host bridge B. vmd_copy_host_bridge_flags() copies all the feature ownership flags from A to B. On ACPI systems (like both Host and Guest) all these flags are initially cleared for host bridge A in acpi_pci_root_create() because these features are owned by the platform until _OSC grants ownership to the OS. They are initially *set* for host bridge B in pci_init_host_bridge() because it's created by the vmd driver instead of being enumerated via ACPI. If the Host _OSC grants ownership to the OS, A's native_pcie_hotplug will be set, and vmd_copy_host_bridge_flags() leaves it set for B as well. If the Host _OSC does *not* grant hotplug ownership to the OS, native_pcie_hotplug will be cleared for both A and B. Apparently the Guest doesn't supply _OSC (seems like a spec violation; could tell from the dmesg), so A's native_pcie_hotplug remains cleared, which means vmd_copy_host_bridge_flags() will also clear it for B. [The lack of a Guest BIOS _OSC would also mean that if you passed through a normal non-VMD PCIe Switch Downstream Port to the Guest, the Guest OS would never be able to manage hotplug below that Port. Maybe nobody passes through Switch Ports.] > > "This mechanism works fine for Host OS and no issue has been > > observed." I guess this means that if the platform grants hotplug > > control to the Host OS via _OSC, pciehp in the Host OS works fine to > > manage hotplug on Root Ports below the VMD? > > Yes. When platform hotplug setting is enabled, then _OSC assigns > correct value to vmd root ports via vmd_copy_host_bridge_flags. > > > If the platform does *not* grant hotplug control to the Host OS, I > > guess it means the Host OS pciehp can *not* manage hotplug below VMD? > > Is that also what you expect? > > > > "However in case of VM, all the HyperVisors don't pass the Hotplug > > setting to the guest BIOS" What is the mechanism by which they would > > pass the hotplug setting? I guess the guest probably doesn't see the > > VMD endpoint itself, right? 
The guest sees either virtualized or > > passed-through VMD Root Ports? > > In guest, vmd is passthrough including pci topology behind vmd. The way > we verify Hotplug capability is to read Slot Capabilities registers' > hotplug enabled bit set on vmd root ports in Guest. I guess maybe this refers to set_pcie_hotplug_bridge(), which checks PCI_EXP_SLTCAP_HPC? If it's not set, pciehp won't bind to the device. This behavior is the same in Host and Guest. > The values copied in vmd_copy_host_bridge_flags are different in Host > than in Guest. I do not know what component is responsible for this in > every HyperVisor. But I tested this in ESXi, HyperV, KVM and they all > had the same behavior where the _OSC flags are set to power-on default > values of 0 by vmd_copy_host_bridge_flags in Guest OS which is > disabling hotplug. > > > I assume the guest OS sees a constructed ACPI system (different from > > the ACPI environment the host sees), so it sees a PNP0A03 host bridge > > with _OSC? And presumably VMD Root Ports below that host bridge? > > > > So are you saying the _OSC seen by the guest doesn't grant hotplug > > control to the guest OS? And it doesn't do that because the guest > > BIOS hasn't implemented _OSC? Or maybe the guest BIOS relies on the > > Slot Capabilities registers of VMD Root Ports to decide whether _OSC > > should grant hotplug control? And those Slot Capabilities don't > > advertise hotplug support? > > Currently Hotplug is controlled by _OSC in both host and Guest. In case > of guest, it seems guest BIOS hasn't implemented _OSC since all the > flags are set to 0 which is not the case in host. I think you want pciehp to work on the VMD Root Ports in the Guest, no matter what, on the assumption that any _OSC that applies to host bridge A does not apply to the host bridge created by the vmd driver. If so, we should just revert 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features"). Since host bridge B was not enumerated via ACPI, the OS owns all those features under B by default, so reverting 04b12ef163d1 would leave that state. Obviously we'd have to first figure out another solution for the AER message flood that 04b12ef163d1 resolved. > VMD is passthrough in Guest so the slot capabilities registers are > still same as what Host driver would see. That is how we can still > control hotplug on vmd on both Guest and Host. > > > > "... which results in assigning default values and disabling Hotplug > > capability in the guest." What default values are these? > > 0. Everything is disabled in guest. So basically _OSC is writing wrong > values in guest. > > > Is this talking about the default host_bridge->native_pcie_hotplug > > value set in acpi_pci_root_create(), where the OS assumes hotplug > > is owned by the platform unless _OSC grants control to the OS? > > vmd driver calls pci_create_root_bus which eventually ends up > calling pci_init_host_bridge where native_pcie_hotplug is set to 1. > Now vmd_copy_host_bridge_flags overwrites the native_pcie_hotplug to > _OSC setting of 0 in guest. > > > > VMD Hotplug can be enabled or disabled based on the VMD > > > rootports' Hotplug configuration in BIOS. is_hotplug_bridge is > > > set on each VMD rootport based on Hotplug capable bit in SltCap > > > in probe.c. Check is_hotplug_bridge and enable or disable > > > native_pcie_hotplug based on that value. > > > > > > This patch will make sure that Hotplug is enabled properly in > > > Host as well as in VM while honoring _OSC settings as well as > > > VMD hotplug setting. 
> > > > "Hotplug is enabled properly in Host as well as in VM" sounds like > > we're talking about both the host OS and the guest OS. > > Yes. VM means guest. > > > > In the host OS, this overrides whatever was negotiated via _OSC, I > > guess on the principle that _OSC doesn't apply because the host BIOS > > doesn't know about the Root Ports below the VMD. (I'm not sure it's > > 100% resolved that _OSC doesn't apply to them, so we should mention > > that assumption here.) > > _OSC still controls every flag including Hotplug. I have observed > that slot capabilities register has hotplug enabled only when > platform has enabled the hotplug. So technically not overriding it > in the host. > > It overrides in guest because _OSC is passing wrong/different > information than the _OSC information in Host. Also like I > mentioned, slot capabilities registers are not changed in Guest > because vmd is passthrough. > > > > In the guest OS, this still overrides whatever was negotiated via > > _OSC, but presumably the guest BIOS *does* know about the Root > > Ports, so I don't think the same principle about overriding _OSC > > applies there. > > > > I think it's still a problem that we handle > > host_bridge->native_pcie_hotplug differently than all the other > > features negotiated via _OSC. We should at least explain this > > somehow. > > Right now this is the only way to know in Guest OS if platform has > enabled Hotplug or not. We have many customers complaining about > Hotplug being broken in Guest. So just trying to honor _OSC but also > fixing its limitation in Guest. > > > > I suspect part of the reason for doing it differently is to avoid > > the interrupt storm that we avoid via 04b12ef163d1 ("PCI: vmd: > > Honor ACPI _OSC on PCIe features"). If so, I think 04b12ef163d1 > > should be mentioned here because it's an important piece of the > > picture. > > I can add a note about this patch as well. Let me know if you want > me to add something specific. > > Thank you for taking the time. This is a very critical issue for us > and we have many customers complaining about it, I would appreciate > to get it resolved as soon as possible. > > > Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com> > > > --- > > > v1->v2: Updating commit message. > > > --- > > > drivers/pci/controller/vmd.c | 11 ++++++++++- > > > 1 file changed, 10 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/pci/controller/vmd.c > > > b/drivers/pci/controller/vmd.c > > > index 769eedeb8802..e39eaef5549a 100644 > > > --- a/drivers/pci/controller/vmd.c > > > +++ b/drivers/pci/controller/vmd.c > > > @@ -720,6 +720,7 @@ static int vmd_enable_domain(struct vmd_dev > > > *vmd, unsigned long features) > > > resource_size_t membar2_offset = 0x2000; > > > struct pci_bus *child; > > > struct pci_dev *dev; > > > + struct pci_host_bridge *vmd_bridge; > > > int ret; > > > > > > /* > > > @@ -886,8 +887,16 @@ static int vmd_enable_domain(struct vmd_dev > > > *vmd, unsigned long features) > > > * and will fail pcie_bus_configure_settings() early. It can > > > instead be > > > * run on each of the real root ports. > > > */ > > > - list_for_each_entry(child, &vmd->bus->children, node) > > > + vmd_bridge = to_pci_host_bridge(vmd->bus->bridge); > > > + list_for_each_entry(child, &vmd->bus->children, node) { > > > pcie_bus_configure_settings(child); > > > + /* > > > + * When Hotplug is enabled on vmd root-port, enable it > > > on vmd > > > + * bridge. 
> > > + */ > > > + if (child->self->is_hotplug_bridge) > > > + vmd_bridge->native_pcie_hotplug = 1; > > > > "is_hotplug_bridge" basically means PCI_EXP_SLTCAP_HPC is set, which > > means "the hardware of this slot is hot-plug *capable*." > > > > Whether hotplug is *enabled* is a separate policy decision about > > whether the OS is allowed to operate the hotplug functionality, so I > > think saying "When Hotplug is enabled" in the comment is a little bit > > misleading. > > I will rewrite the comment. > > > > Bjorn > > > > > + } > > > > > > pci_bus_add_devices(vmd->bus); > > > > > > -- > > > 2.31.1 > > > >
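The SltCap check discussed above behaves the same in host and guest because the register is read from the passed-through VMD Root Port itself. A simplified sketch of how a port gets marked hotplug-capable during enumeration (modeled on set_pcie_hotplug_bridge() in drivers/pci/probe.c):

static void set_pcie_hotplug_bridge(struct pci_dev *pdev)
{
	u32 reg32;

	pcie_capability_read_dword(pdev, PCI_EXP_SLTCAP, &reg32);
	if (reg32 & PCI_EXP_SLTCAP_HPC)
		pdev->is_hotplug_bridge = 1;	/* slot hardware is hot-plug *capable* */
}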
On Tue, 2023-12-05 at 18:27 -0600, Bjorn Helgaas wrote: > On Tue, Dec 05, 2023 at 03:20:27PM -0700, Nirmal Patel wrote: > > On Fri, 2023-12-01 at 18:07 -0600, Bjorn Helgaas wrote: > > > On Mon, Nov 27, 2023 at 04:17:29PM -0500, Nirmal Patel wrote: > > > > Currently VMD copies root bridge setting to enable Hotplug on > > > > its > > > > rootports. This mechanism works fine for Host OS and no issue > > > > has > > > > been observed. However in case of VM, all the HyperVisors don't > > > > pass the Hotplug setting to the guest BIOS which results in > > > > assigning default values and disabling Hotplug capability in > > > > the > > > > guest which have been observed by many OEMs. > > > > > > Can we make this a little more specific? I'm lacking a lot of > > > virtualization background here, so I'm going to ask a bunch of > > > stupid > > > questions here: > > > > > > "VMD copies root bridge setting" refers to _OSC and the copying > > > done > > > in vmd_copy_host_bridge_flags()? Obviously this must be in the > > > Host > > > OS. > > > > Yes. we are talking about vmd_copy_host_bridge_flags. It gets > > called in > > both Host and Guest. But it assigns different values. In Host, it > > assigns correct values, while in guest it assigns power-on default > > value 0 which in simple term disable everything including hotplug. > > If vmd_copy_host_bridge_flags() is called both in the Host and the > Guest, I guess that means both the VMD endpoint and the VMD Root > Ports > below it are passed through to the Guest? I expected only the VMD > Root Ports would be passed through, but maybe that's my > misunderstanding. > > The VMD endpoint is under host bridge A, and the VMD creates a new > PCI > domain under a new host bridge B. vmd_copy_host_bridge_flags() > copies > all the feature ownership flags from A to B. > > On ACPI systems (like both Host and Guest) all these flags are > initially cleared for host bridge A in acpi_pci_root_create() because > these features are owned by the platform until _OSC grants ownership > to the OS. They are initially *set* for host bridge B in > pci_init_host_bridge() because it's created by the vmd driver instead > of being enumerated via ACPI. > > If the Host _OSC grants ownership to the OS, A's native_pcie_hotplug > will be set, and vmd_copy_host_bridge_flags() leaves it set for B as > well. If the Host _OSC does *not* grant hotplug ownership to the OS, > native_pcie_hotplug will be cleared for both A and B. > > Apparently the Guest doesn't supply _OSC (seems like a spec > violation; > could tell from the dmesg), so A's native_pcie_hotplug remains > cleared, which means vmd_copy_host_bridge_flags() will also clear it > for B. > > [The lack of a Guest BIOS _OSC would also mean that if you passed > through a normal non-VMD PCIe Switch Downstream Port to the Guest, > the > Guest OS would never be able to manage hotplug below that > Port. Maybe > nobody passes through Switch Ports.] > > > > "This mechanism works fine for Host OS and no issue has been > > > observed." I guess this means that if the platform grants > > > hotplug > > > control to the Host OS via _OSC, pciehp in the Host OS works fine > > > to > > > manage hotplug on Root Ports below the VMD? > > > > Yes. When platform hotplug setting is enabled, then _OSC assigns > > correct value to vmd root ports via vmd_copy_host_bridge_flags. > > > > > If the platform does *not* grant hotplug control to the Host OS, > > > I > > > guess it means the Host OS pciehp can *not* manage hotplug below > > > VMD? 
> > > Is that also what you expect? > > > > > > "However in case of VM, all the HyperVisors don't pass the > > > Hotplug > > > setting to the guest BIOS" What is the mechanism by which they > > > would > > > pass the hotplug setting? I guess the guest probably doesn't see > > > the > > > VMD endpoint itself, right? The guest sees either virtualized or > > > passed-through VMD Root Ports? > > > > In guest, vmd is passthrough including pci topology behind vmd. The > > way > > we verify Hotplug capability is to read Slot Capabilities > > registers' > > hotplug enabled bit set on vmd root ports in Guest. > > I guess maybe this refers to set_pcie_hotplug_bridge(), which checks > PCI_EXP_SLTCAP_HPC? If it's not set, pciehp won't bind to the > device. > This behavior is the same in Host and Guest. > > > The values copied in vmd_copy_host_bridge_flags are different in > > Host > > than in Guest. I do not know what component is responsible for this > > in > > every HyperVisor. But I tested this in ESXi, HyperV, KVM and they > > all > > had the same behavior where the _OSC flags are set to power-on > > default > > values of 0 by vmd_copy_host_bridge_flags in Guest OS which is > > disabling hotplug. > > > > > I assume the guest OS sees a constructed ACPI system (different > > > from > > > the ACPI environment the host sees), so it sees a PNP0A03 host > > > bridge > > > with _OSC? And presumably VMD Root Ports below that host bridge? > > > > > > So are you saying the _OSC seen by the guest doesn't grant > > > hotplug > > > control to the guest OS? And it doesn't do that because the > > > guest > > > BIOS hasn't implemented _OSC? Or maybe the guest BIOS relies on > > > the > > > Slot Capabilities registers of VMD Root Ports to decide whether > > > _OSC > > > should grant hotplug control? And those Slot Capabilities don't > > > advertise hotplug support? > > > > Currently Hotplug is controlled by _OSC in both host and Guest. In > > case > > of guest, it seems guest BIOS hasn't implemented _OSC since all the > > flags are set to 0 which is not the case in host. > > I think you want pciehp to work on the VMD Root Ports in the Guest, > no matter what, on the assumption that any _OSC that applies to host > bridge A does not apply to the host bridge created by the vmd driver. > > If so, we should just revert 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC > on PCIe features"). Since host bridge B was not enumerated via ACPI, > the OS owns all those features under B by default, so reverting > 04b12ef163d1 would leave that state. > > Obviously we'd have to first figure out another solution for the AER > message flood that 04b12ef163d1 resolved. Hi Bjorn, If we revert 04b12ef163d1, then VMD driver will still enable AER blindly which is a bug. So we need to find a way to make VMD driver aware of AER platform setting and use that information to enable or disable AER for its child devices. There is a setting in BIOS that allows us to enable OS native AER support on the platform. This setting is located in EDK Menu -> Platform configuration -> system event log -> IIO error enabling -> OS native AER support. This setting is assigned to VMD bridge by vmd_copy_host_bridge_flags in patch 04b12ef163d1. Without the patch 04b12ef163d1, VMD driver will enable AER even if platform has disabled OS native AER support as mentioned earlier. This will result in an AER flood mentioned in 04b12ef163d1 since there are no AER handlers. I believe the rate limit alone will not fix the issue; rather, it will act as a workaround.
If VMD is aware of OS native AER support setting, then we will not see AER flooding issue. Do you have any suggestion on how to make VMD driver aware of AER setting and keep it in sync with platform setting. Thanks nirmal > > > VMD is passthrough in Guest so the slot capabilities registers are > > still same as what Host driver would see. That is how we can still > > control hotplug on vmd on both Guest and Host. > > > "... which results in assigning default values and disabling > > > Hotplug > > > capability in the guest." What default values are these? > > > > 0. Everything is disabled in guest. So basically _OSC is writing > > wrong > > values in guest. > > > > > Is this talking about the default host_bridge- > > > >native_pcie_hotplug > > > value set in acpi_pci_root_create(), where the OS assumes hotplug > > > is owned by the platform unless _OSC grants control to the OS? > > > > vmd driver calls pci_create_root_bus which eventually ends up > > calling pci_init_host_bridge where native_pcie_hotplug is set to 1. > > Now vmd_copy_host_bridge_flags overwrites the native_pcie_hotplug > > to > > _OSC setting of 0 in guest. > > > > > > VMD Hotplug can be enabled or disabled based on the VMD > > > > rootports' Hotplug configuration in BIOS. is_hotplug_bridge is > > > > set on each VMD rootport based on Hotplug capable bit in SltCap > > > > in probe.c. Check is_hotplug_bridge and enable or disable > > > > native_pcie_hotplug based on that value. > > > > > > > > This patch will make sure that Hotplug is enabled properly in > > > > Host as well as in VM while honoring _OSC settings as well as > > > > VMD hotplug setting. > > > > > > "Hotplug is enabled properly in Host as well as in VM" sounds > > > like > > > we're talking about both the host OS and the guest OS. > > > > Yes. VM means guest. > > > In the host OS, this overrides whatever was negotiated via _OSC, > > > I > > > guess on the principle that _OSC doesn't apply because the host > > > BIOS > > > doesn't know about the Root Ports below the VMD. (I'm not sure > > > it's > > > 100% resolved that _OSC doesn't apply to them, so we should > > > mention > > > that assumption here.) > > > > _OSC still controls every flag including Hotplug. I have observed > > that slot capabilities register has hotplug enabled only when > > platform has enabled the hotplug. So technically not overriding it > > in the host. > > > > It overrides in guest because _OSC is passing wrong/different > > information than the _OSC information in Host. Also like I > > mentioned, slot capabilities registers are not changed in Guest > > because vmd is passthrough. > > > In the guest OS, this still overrides whatever was negotiated via > > > _OSC, but presumably the guest BIOS *does* know about the Root > > > Ports, so I don't think the same principle about overriding _OSC > > > applies there. > > > > > > I think it's still a problem that we handle > > > host_bridge->native_pcie_hotplug differently than all the other > > > features negotiated via _OSC. We should at least explain this > > > somehow. > > > > Right now this is the only way to know in Guest OS if platform has > > enabled Hotplug or not. We have many customers complaining about > > Hotplug being broken in Guest. So just trying to honor _OSC but > > also > > fixing its limitation in Guest. > > > I suspect part of the reason for doing it differently is to avoid > > > the interrupt storm that we avoid via 04b12ef163d1 ("PCI: vmd: > > > Honor ACPI _OSC on PCIe features"). 
If so, I think 04b12ef163d1 > > > should be mentioned here because it's an important piece of the > > > picture. > > > > I can add a note about this patch as well. Let me know if you want > > me to add something specific. > > > > Thank you for taking the time. This is a very critical issue for us > > and we have many customers complaining about it, I would appreciate > > to get it resolved as soon as possible. > > > > Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com> > > > > --- > > > > v1->v2: Updating commit message. > > > > --- > > > > drivers/pci/controller/vmd.c | 11 ++++++++++- > > > > 1 file changed, 10 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/pci/controller/vmd.c > > > > b/drivers/pci/controller/vmd.c > > > > index 769eedeb8802..e39eaef5549a 100644 > > > > --- a/drivers/pci/controller/vmd.c > > > > +++ b/drivers/pci/controller/vmd.c > > > > @@ -720,6 +720,7 @@ static int vmd_enable_domain(struct vmd_dev > > > > *vmd, unsigned long features) > > > > resource_size_t membar2_offset = 0x2000; > > > > struct pci_bus *child; > > > > struct pci_dev *dev; > > > > + struct pci_host_bridge *vmd_bridge; > > > > int ret; > > > > > > > > /* > > > > @@ -886,8 +887,16 @@ static int vmd_enable_domain(struct > > > > vmd_dev > > > > *vmd, unsigned long features) > > > > * and will fail pcie_bus_configure_settings() early. > > > > It can > > > > instead be > > > > * run on each of the real root ports. > > > > */ > > > > - list_for_each_entry(child, &vmd->bus->children, node) > > > > + vmd_bridge = to_pci_host_bridge(vmd->bus->bridge); > > > > + list_for_each_entry(child, &vmd->bus->children, node) { > > > > pcie_bus_configure_settings(child); > > > > + /* > > > > + * When Hotplug is enabled on vmd root-port, > > > > enable it > > > > on vmd > > > > + * bridge. > > > > + */ > > > > + if (child->self->is_hotplug_bridge) > > > > + vmd_bridge->native_pcie_hotplug = 1; > > > > > > "is_hotplug_bridge" basically means PCI_EXP_SLTCAP_HPC is set, > > > which > > > means "the hardware of this slot is hot-plug *capable*." > > > > > > Whether hotplug is *enabled* is a separate policy decision about > > > whether the OS is allowed to operate the hotplug functionality, > > > so I > > > think saying "When Hotplug is enabled" in the comment is a little > > > bit > > > misleading. > > > > I will rewrite the comment. > > > Bjorn > > > > > > > + } > > > > > > > > pci_bus_add_devices(vmd->bus); > > > > > > > > -- > > > > 2.31.1 > > > >
On Mon, Dec 11, 2023 at 04:05:25PM -0700, Nirmal Patel wrote: > On Tue, 2023-12-05 at 18:27 -0600, Bjorn Helgaas wrote: > > On Tue, Dec 05, 2023 at 03:20:27PM -0700, Nirmal Patel wrote: > > ... > > > Currently Hotplug is controlled by _OSC in both host and Guest. > > > In case of guest, it seems guest BIOS hasn't implemented _OSC > > > since all the flags are set to 0 which is not the case in host. > > > > I think you want pciehp to work on the VMD Root Ports in the > > Guest, no matter what, on the assumption that any _OSC that > > applies to host bridge A does not apply to the host bridge created > > by the vmd driver. > > > > If so, we should just revert 04b12ef163d1 ("PCI: vmd: Honor ACPI > > _OSC on PCIe features"). Since host bridge B was not enumerated > > via ACPI, the OS owns all those features under B by default, so > > reverting 04b12ef163d1 would leave that state. > > > > Obviously we'd have to first figure out another solution for the > > AER message flood that 04b12ef163d1 resolved. > > If we revert 04b12ef163d1, then VMD driver will still enable AER > blindly which is a bug. So we need to find a way to make VMD driver > aware of AER platform setting and use that information to enable or > disable AER for its child devices. > > There is a setting in BIOS that allows us to enable OS native AER > support on the platform. This setting is located in EDK Menu -> > Platform configuration -> system event log -> IIO error enabling -> > OS native AER support. > > This setting is assigned to VMD bridge by vmd_copy_host_bridge_flags > in patch 04b12ef163d1. Without the patch 04b12ef163d1, VMD driver > will enable AER even if platform has disabled OS native AER support > as mentioned earlier. This will result in an AER flood mentioned in > 04b12e f163d1 since there is no AER handlers. > > I believe the rate limit will not alone fix the issue rather will > act as a work around. I agree with this part. A rate limit would not solve the problem of an interrupt storm consuming the CPU. That could be addressed by switching to polling when the rate is high or possibly other ways. > If VMD is aware of OS native AER support setting, then we will not > see AER flooding issue. > > Do you have any suggestion on how to make VMD driver aware of AER > setting and keep it in sync with platform setting. Well, this is the whole problem. IIUC, you're saying we should use _OSC to negotiate for AER control below the VMD bridge, but ignore _OSC for hotplug control. I don't see how to read the _OSC spec and decide which things apply below a VMD bridge and which things don't. But I think a proposal that "_OSC doesn't apply to any devices below a VMD bridge," that *is* a pretty generic principle. If we think that's the right way to use _OSC, shouldn't the vmd driver be able to configure AER for devices below the VMD bridge in any way it wants? I'm not sure what "_OSC doesn't apply below a VMD" would mean for things like firmware-first error logging below the VMD. Maybe firmware-first doesn't make sense below VMD anyway because firmware and the OS have different ideas about the Segment for those devices? Bjorn
On Tue, 2023-12-12 at 15:13 -0600, Bjorn Helgaas wrote: > On Mon, Dec 11, 2023 at 04:05:25PM -0700, Nirmal Patel wrote: > > On Tue, 2023-12-05 at 18:27 -0600, Bjorn Helgaas wrote: > > > On Tue, Dec 05, 2023 at 03:20:27PM -0700, Nirmal Patel wrote: > > > ... > > > > Currently Hotplug is controlled by _OSC in both host and Guest. > > > > In case of guest, it seems guest BIOS hasn't implemented _OSC > > > > since all the flags are set to 0 which is not the case in host. > > > > > > I think you want pciehp to work on the VMD Root Ports in the > > > Guest, no matter what, on the assumption that any _OSC that > > > applies to host bridge A does not apply to the host bridge > > > created > > > by the vmd driver. > > > > > > If so, we should just revert 04b12ef163d1 ("PCI: vmd: Honor ACPI > > > _OSC on PCIe features"). Since host bridge B was not enumerated > > > via ACPI, the OS owns all those features under B by default, so > > > reverting 04b12ef163d1 would leave that state. > > > > > > Obviously we'd have to first figure out another solution for the > > > AER message flood that 04b12ef163d1 resolved. > > > > If we revert 04b12ef163d1, then VMD driver will still enable AER > > blindly which is a bug. So we need to find a way to make VMD driver > > aware of AER platform setting and use that information to enable or > > disable AER for its child devices. > > > > There is a setting in BIOS that allows us to enable OS native AER > > support on the platform. This setting is located in EDK Menu -> > > Platform configuration -> system event log -> IIO error enabling -> > > OS native AER support. > > > > This setting is assigned to VMD bridge by > > vmd_copy_host_bridge_flags > > in patch 04b12ef163d1. Without the patch 04b12ef163d1, VMD driver > > will enable AER even if platform has disabled OS native AER support > > as mentioned earlier. This will result in an AER flood mentioned in > > 04b12ef163d1 since there is no AER handlers. > > > > I believe the rate limit will not alone fix the issue rather will > > act as a work around. > > I agree with this part. A rate limit would not solve the problem of > an interrupt storm consuming the CPU. That could be addressed by > switching to polling when the rate is high or possibly other ways. > > > If VMD is aware of OS native AER support setting, then we will not > > see AER flooding issue. > > > > Do you have any suggestion on how to make VMD driver aware of AER > > setting and keep it in sync with platform setting. > > Well, this is the whole problem. IIUC, you're saying we should use > _OSC to negotiate for AER control below the VMD bridge, but ignore > _OSC for hotplug control. Because VMD has its own hotplug BIOS setting which allows vmd to enable or disable hotplug on its domain. However we don't have VMD specific AER, DPC, LTR settings. Is it okay if we include an additional check for hotplug? i.e. Hotplug capable bit in SltCap register which reflects VMD BIOS hotplug setting. Another way is to overwrite _OSC setting for hotplug only in Guest OS. If we make VMD driver aware of Host or Guest environment, only in case of Guest we overwrite _OSC hotplug using SltCap register and we don't revert the 04b12ef163d1. > > I don't see how to read the _OSC spec and decide which things apply > below a VMD bridge and which things don't. > > But I think a proposal that "_OSC doesn't apply to any devices below > a > VMD bridge," that *is* a pretty generic principle.
If we think > that's > the right way to use _OSC, shouldn't the vmd driver be able to > configure AER for devices below the VMD bridge in any way it wants? > > I'm not sure what "_OSC doesn't apply below a VMD" would mean for > things like firmware-first error logging below the VMD. Maybe > firmware-first doesn't make sense below VMD anyway because firmware > and the OS have different ideas about the Segment for those devices? > > Bjorn
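For completeness, the "make the VMD driver aware of Host or Guest environment" alternative floated above would presumably look something like the sketch below. This is hypothetical: vmd_any_root_port_is_hotplug_capable() does not exist and would have to walk the VMD Root Ports checking SltCap, the detection is x86-specific, and, as noted later in the thread, this was not considered an attractive direction:

	/*
	 * Hypothetical sketch only: keep the plain _OSC copy on bare metal,
	 * but derive hotplug ownership from SltCap when running as a guest.
	 */
	if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
		vmd_bridge->native_pcie_hotplug =
			vmd_any_root_port_is_hotplug_capable(vmd); /* hypothetical helper */
	else
		vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;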
On Wed, Dec 13, 2023 at 06:07:02PM -0700, Nirmal Patel wrote: > On Tue, 2023-12-12 at 15:13 -0600, Bjorn Helgaas wrote: > > On Mon, Dec 11, 2023 at 04:05:25PM -0700, Nirmal Patel wrote: > > > On Tue, 2023-12-05 at 18:27 -0600, Bjorn Helgaas wrote: > > > > On Tue, Dec 05, 2023 at 03:20:27PM -0700, Nirmal Patel wrote: > > > > ... > > > > > Currently Hotplug is controlled by _OSC in both host and Guest. > > > > > In case of guest, it seems guest BIOS hasn't implemented _OSC > > > > > since all the flags are set to 0 which is not the case in host. > > > > > > > > I think you want pciehp to work on the VMD Root Ports in the > > > > Guest, no matter what, on the assumption that any _OSC that > > > > applies to host bridge A does not apply to the host bridge > > > > created > > > > by the vmd driver. > > > > > > > > If so, we should just revert 04b12ef163d1 ("PCI: vmd: Honor ACPI > > > > _OSC on PCIe features"). Since host bridge B was not enumerated > > > > via ACPI, the OS owns all those features under B by default, so > > > > reverting 04b12ef163d1 would leave that state. > > > > > > > > Obviously we'd have to first figure out another solution for the > > > > AER message flood that 04b12ef163d1 resolved. > > > > > > If we revert 04b12ef163d1, then VMD driver will still enable AER > > > blindly which is a bug. So we need to find a way to make VMD driver > > > aware of AER platform setting and use that information to enable or > > > disable AER for its child devices. > > > > > > There is a setting in BIOS that allows us to enable OS native AER > > > support on the platform. This setting is located in EDK Menu -> > > > Platform configuration -> system event log -> IIO error enabling -> > > > OS native AER support. > > > > > > This setting is assigned to VMD bridge by > > > vmd_copy_host_bridge_flags in patch 04b12ef163d1. Without the > > > patch 04b12ef163d1, VMD driver will enable AER even if platform > > > has disabled OS native AER support as mentioned earlier. This > > > will result in an AER flood mentioned in 04b12ef163d1 since > > > there is no AER handlers. I missed this before. What does "there are no AER handlers" mean? Do you mean there are no *firmware* AER handlers? The dmesg log at https://bugzilla.kernel.org/attachment.cgi?id=299571 (from https://bugzilla.kernel.org/show_bug.cgi?id=215027, the bug fixed by 04b12ef163d1), shows this: acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3] acpi PNP0A08:00: _OSC: platform does not support [AER] acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug PME PCIeCapability LTR] so the firmware did not grant AER control to the OS (I think "platform does not support" is a confusing description). Prior to 04b12ef163d1, Linux applied _OSC above the VMD bridge but not below it, so Linux had native control below VMD and it did handle AER interrupts from the 10000:e0:06.0 Root Port below VMD: vmd 0000:00:0e.0: PCI host bridge to bus 10000:e0 pcieport 10000:e0:06.0: AER: Corrected error received: 10000:e1:00.0 After 04b12ef163d1, Linux applied _OSC below the VMD bridge as well, so it did not enable or handle any AER interrupts. I suspect the platform didn't handle AER interrupts below VMD either, so those errors were probably just ignored. > > > If VMD is aware of OS native AER support setting, then we will not > > > see AER flooding issue. > > > > > > Do you have any suggestion on how to make VMD driver aware of AER > > > setting and keep it in sync with platform setting. 
> > > > Well, this is the whole problem. IIUC, you're saying we should use > > _OSC to negotiate for AER control below the VMD bridge, but ignore > > _OSC for hotplug control. > > Because VMD has its own hotplug BIOS setting which allows vmd to > enable or disable hotplug on its domain. However we don't have VMD > specific AER, DPC, LTR settings. I don't quite follow. The OS knows nothing about whether BIOS settings exist, so they can't be used to determine where _OSC applies. > Is it okay if we include an additional check for hotplug? i.e. > Hotplug capable bit in SltCap register which reflects VMD BIOS > hotplug setting. I don't know what check you have in mind, but the OS can definitely use SltCap to decide what features to enable. You suggest above that you want vmd to be aware of AER ownership per _OSC, but it sounds more like you just want AER disabled below VMD regardless. Or do you have platforms that can actually handle AER interrupts from Root Ports below VMD? > Another way is to overwrite _OSC setting for hotplug only in Guest > OS. If we make VMD driver aware of Host or Guest environment, only > in case of Guest we overwrite _OSC hotplug using SltCap register and > we don't revert the 04b12ef163d1. Making vmd aware of being in host or guest sounds kind of ugly to me. Bjorn
On Thu, 2023-12-14 at 13:23 -0600, Bjorn Helgaas wrote: > On Wed, Dec 13, 2023 at 06:07:02PM -0700, Nirmal Patel wrote: > > On Tue, 2023-12-12 at 15:13 -0600, Bjorn Helgaas wrote: > > > On Mon, Dec 11, 2023 at 04:05:25PM -0700, Nirmal Patel wrote: > > > > On Tue, 2023-12-05 at 18:27 -0600, Bjorn Helgaas wrote: > > > > > On Tue, Dec 05, 2023 at 03:20:27PM -0700, Nirmal Patel wrote: > > > > > ... > > > > > > Currently Hotplug is controlled by _OSC in both host and > > > > > > Guest. > > > > > > In case of guest, it seems guest BIOS hasn't implemented > > > > > > _OSC > > > > > > since all the flags are set to 0 which is not the case in > > > > > > host. > > > > > > > > > > I think you want pciehp to work on the VMD Root Ports in the > > > > > Guest, no matter what, on the assumption that any _OSC that > > > > > applies to host bridge A does not apply to the host bridge > > > > > created > > > > > by the vmd driver. > > > > > > > > > > If so, we should just revert 04b12ef163d1 ("PCI: vmd: Honor > > > > > ACPI > > > > > _OSC on PCIe features"). Since host bridge B was not > > > > > enumerated > > > > > via ACPI, the OS owns all those features under B by default, > > > > > so > > > > > reverting 04b12ef163d1 would leave that state. > > > > > > > > > > Obviously we'd have to first figure out another solution for > > > > > the > > > > > AER message flood that 04b12ef163d1 resolved. > > > > > > > > If we revert 04b12ef163d1, then VMD driver will still enable > > > > AER > > > > blindly which is a bug. So we need to find a way to make VMD > > > > driver > > > > aware of AER platform setting and use that information to > > > > enable or > > > > disable AER for its child devices. > > > > > > > > There is a setting in BIOS that allows us to enable OS native > > > > AER > > > > support on the platform. This setting is located in EDK Menu -> > > > > Platform configuration -> system event log -> IIO error > > > > enabling -> > > > > OS native AER support. > > > > > > > > This setting is assigned to VMD bridge by > > > > vmd_copy_host_bridge_flags in patch 04b12ef163d1. Without the > > > > patch 04b12ef163d1, VMD driver will enable AER even if platform > > > > has disabled OS native AER support as mentioned earlier. This > > > > will result in an AER flood mentioned in 04b12ef163d1 since > > > > there is no AER handlers. > > I missed this before. What does "there are no AER handlers" > mean? Do > you mean there are no *firmware* AER handlers? Sorry for confusing wordings. Your understanding is correct. The patch 04b12ef163d1 is used to disable AER on vmd and avoid the AER flooding by applying _OSC settings since vmd driver doesn't give a choice to toggle AER, DPC, LTR, etc. > > The dmesg log at https://bugzilla.kernel.org/attachment.cgi?id=299571 > (from https://bugzilla.kernel.org/show_bug.cgi?id=215027, the bug > fixed by 04b12ef163d1), shows this: > > acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM > Segments MSI HPX-Type3] > acpi PNP0A08:00: _OSC: platform does not support [AER] > acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug PME > PCIeCapability LTR] > > so the firmware did not grant AER control to the OS (I think > "platform > does not support" is a confusing description). 
> > Prior to 04b12ef163d1, Linux applied _OSC above the VMD bridge but > not > below it, so Linux had native control below VMD and it did handle > AER > interrupts from the 10000:e0:06.0 Root Port below VMD: > > vmd 0000:00:0e.0: PCI host bridge to bus 10000:e0 > pcieport 10000:e0:06.0: AER: Corrected error received: > 10000:e1:00.0 > > After 04b12ef163d1, Linux applied _OSC below the VMD bridge as > well, > so it did not enable or handle any AER interrupts. I suspect the > platform didn't handle AER interrupts below VMD either, so those > errors were probably just ignored. > > > > > If VMD is aware of OS native AER support setting, then we > > > > will > > > > not > > > > see AER flooding issue. > > > > > > > > Do you have any suggestion on how to make VMD driver aware of > > > > AER > > > > setting and keep it in sync with platform setting. > > > > > > Well, this is the whole problem. IIUC, you're saying we should > > > use > > > _OSC to negotiate for AER control below the VMD bridge, but > > > ignore > > > _OSC for hotplug control. > > > > Because VMD has its own hotplug BIOS setting which allows vmd to > > enable or disable hotplug on its domain. However we don't have > > VMD > > specific AER, DPC, LTR settings. > > > > I don't quite follow. The OS knows nothing about whether BIOS > > settings exist, so they can't be used to determine where _OSC > > applies. > > > > > Is it okay if we include an additional check for hotplug? i.e. > > > Hotplug capable bit in SltCap register which reflects VMD BIOS > > > hotplug setting. > > > > I don't know what check you have in mind, but the OS can definitely > > use SltCap to decide what features to enable. > Yes I agree. That is what I am also suggesting in this patch. > > You suggest above that you want vmd to be aware of AER ownership > > per > > _OSC, but it sounds more like you just want AER disabled below VMD > > regardless. Or do you have platforms that can actually handle AER > > interrupts from Root Ports below VMD? > No I don't want AER disabled below VMD all the time. I am suggesting > vmd > should be in sync with all the _OSC flags since vmd doesn't give a > choice to toggle. However, the issue arises in guest where _OSC > setting > for hotplug is always 0 even though hotplug is 1 in host and hotplug > enable bit in SltCap register is 1 in both host and guest. So we can > use that to enable hotplug in guest. Like you suggested in your above > comment. > > > Another way is to overwrite _OSC setting for hotplug only in > > > Guest > > > OS. If we make VMD driver aware of Host or Guest environment, > > > only > > > in case of Guest we overwrite _OSC hotplug using SltCap register > > > and > > > we don't revert the 04b12ef163d1. > > > > Making vmd aware of being in host or guest sounds kind of ugly to > > me. > I agree. > > Bjorn
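Taken together, the direction that emerges from this exchange is roughly: keep honoring _OSC for AER, PME, LTR, DPC and SHPC (so 04b12ef163d1 is not reverted), but take PCIe hotplug ownership from the Root Ports' Slot Capabilities, which reflect the VMD BIOS hotplug setting in both host and guest. A hedged sketch of that idea, assuming a hypothetical helper called after the VMD domain has been scanned (not the exact code that was eventually merged):

static void vmd_set_feature_ownership(struct pci_host_bridge *root_bridge,
				      struct pci_host_bridge *vmd_bridge,
				      struct pci_bus *vmd_bus)
{
	struct pci_dev *rp;
	bool slot_hpc = false;

	/* Does any VMD Root Port advertise PCI_EXP_SLTCAP_HPC? */
	list_for_each_entry(rp, &vmd_bus->devices, bus_list)
		slot_hpc |= rp->is_hotplug_bridge;

	/* Everything except PCIe hotplug still follows _OSC */
	vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
	vmd_bridge->native_aer = root_bridge->native_aer;
	vmd_bridge->native_pme = root_bridge->native_pme;
	vmd_bridge->native_ltr = root_bridge->native_ltr;
	vmd_bridge->native_dpc = root_bridge->native_dpc;
	vmd_bridge->native_pcie_hotplug = slot_hpc;
}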
On Thu, 2023-12-14 at 15:22 -0700, Nirmal Patel wrote: > On Thu, 2023-12-14 at 13:23 -0600, Bjorn Helgaas wrote: > > On Wed, Dec 13, 2023 at 06:07:02PM -0700, Nirmal Patel wrote: > > > On Tue, 2023-12-12 at 15:13 -0600, Bjorn Helgaas wrote: > > > > On Mon, Dec 11, 2023 at 04:05:25PM -0700, Nirmal Patel wrote: > > > > > On Tue, 2023-12-05 at 18:27 -0600, Bjorn Helgaas wrote: > > > > > > On Tue, Dec 05, 2023 at 03:20:27PM -0700, Nirmal Patel > > > > > > wrote: > > > > > > ... > > > > > > > Currently Hotplug is controlled by _OSC in both host and > > > > > > > Guest. > > > > > > > In case of guest, it seems guest BIOS hasn't implemented > > > > > > > _OSC > > > > > > > since all the flags are set to 0 which is not the case in > > > > > > > host. > > > > > > > > > > > > I think you want pciehp to work on the VMD Root Ports in > > > > > > the > > > > > > Guest, no matter what, on the assumption that any _OSC that > > > > > > applies to host bridge A does not apply to the host bridge > > > > > > created > > > > > > by the vmd driver. > > > > > > > > > > > > If so, we should just revert 04b12ef163d1 ("PCI: vmd: Honor > > > > > > ACPI > > > > > > _OSC on PCIe features"). Since host bridge B was not > > > > > > enumerated > > > > > > via ACPI, the OS owns all those features under B by > > > > > > default, > > > > > > so > > > > > > reverting 04b12ef163d1 would leave that state. > > > > > > > > > > > > Obviously we'd have to first figure out another solution > > > > > > for > > > > > > the > > > > > > AER message flood that 04b12ef163d1 resolved. > > > > > > > > > > If we revert 04b12ef163d1, then VMD driver will still enable > > > > > AER > > > > > blindly which is a bug. So we need to find a way to make VMD > > > > > driver > > > > > aware of AER platform setting and use that information to > > > > > enable or > > > > > disable AER for its child devices. > > > > > > > > > > There is a setting in BIOS that allows us to enable OS native > > > > > AER > > > > > support on the platform. This setting is located in EDK Menu > > > > > -> > > > > > Platform configuration -> system event log -> IIO error > > > > > enabling -> > > > > > OS native AER support. > > > > > > > > > > This setting is assigned to VMD bridge by > > > > > vmd_copy_host_bridge_flags in patch 04b12ef163d1. Without the > > > > > patch 04b12ef163d1, VMD driver will enable AER even if > > > > > platform > > > > > has disabled OS native AER support as mentioned earlier. This > > > > > will result in an AER flood mentioned in 04b12ef163d1 since > > > > > there is no AER handlers. > > > > I missed this before. What does "there are no AER handlers" > > mean? Do > > you mean there are no *firmware* AER handlers? > Sorry for confusing wordings. Your understanding is correct. The > patch > 04b12ef163d1 is used to disable AER on vmd and avoid the AER flooding > by applying _OSC settings since vmd driver doesn't give a choice to > toggle AER, DPC, LTR, etc. > > The dmesg log at > > https://bugzilla.kernel.org/attachment.cgi?id=299571 > > (from https://bugzilla.kernel.org/show_bug.cgi?id=215027, the bug > > fixed by 04b12ef163d1), shows this: > > > > acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM > > Segments MSI HPX-Type3] > > acpi PNP0A08:00: _OSC: platform does not support [AER] > > acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug > > PME > > PCIeCapability LTR] > > > > so the firmware did not grant AER control to the OS (I think > > "platform > > does not support" is a confusing description). 
> > > > Prior to 04b12ef163d1, Linux applied _OSC above the VMD bridge but > > not > > below it, so Linux had native control below VMD and it did handle > > AER > > interrupts from the 10000:e0:06.0 Root Port below VMD: > > > > vmd 0000:00:0e.0: PCI host bridge to bus 10000:e0 > > pcieport 10000:e0:06.0: AER: Corrected error received: > > 10000:e1:00.0 > > > > After 04b12ef163d1, Linux applied _OSC below the VMD bridge as > > well, > > so it did not enable or handle any AER interrupts. I suspect the > > platform didn't handle AER interrupts below VMD either, so those > > errors were probably just ignored. > > > > > > > If VMD is aware of OS native AER support setting, then we > > > > > will > > > > > not > > > > > see AER flooding issue. > > > > > > > > > > Do you have any suggestion on how to make VMD driver aware of > > > > > AER > > > > > setting and keep it in sync with platform setting. > > > > > > > > Well, this is the whole problem. IIUC, you're saying we should > > > > use > > > > _OSC to negotiate for AER control below the VMD bridge, but > > > > ignore > > > > _OSC for hotplug control. > > > > > > Because VMD has its own hotplug BIOS setting which allows vmd to > > > enable or disable hotplug on its domain. However we don't have > > > VMD > > > specific AER, DPC, LTR settings. > > > > I don't quite follow. The OS knows nothing about whether BIOS > > settings exist, so they can't be used to determine where _OSC > > applies. > > > > > Is it okay if we include an additional check for hotplug? i.e. > > > Hotplug capable bit in SltCap register which reflects VMD BIOS > > > hotplug setting. > > > > I don't know what check you have in mind, but the OS can definitely > > use SltCap to decide what features to enable. > Yes I agree. That is what I am also suggesting in this patch. > > You suggest above that you want vmd to be aware of AER ownership > > per > > _OSC, but it sounds more like you just want AER disabled below VMD > > regardless. Or do you have platforms that can actually handle AER > > interrupts from Root Ports below VMD? > No I dont want AER disabled below VMD all the time. I am suggesting > vmd > should be in sync with all the _OSC flags since vmd doesn't give a > choice to toggle. However, the issue arises in guest where _OSC > setting > for hotplug is always 0 eventhough hotplug is 1 in host and hotplug > enable bit in SltCap register is 1 in both host and guest. So we can > use that to enable hotplug in guest. Like you suggested in your above > comment. > > > Another way is to overwrite _OSC setting for hotplug only in > > > Guest > > > OS. If we make VMD driver aware of Host or Guest environment, > > > only > > > in case of Guest we overwrite _OSC hotplug using SltCap register > > > and > > > we don't revert the 04b12ef163d1. > > > > Making vmd aware of being in host or guest sounds kind of ugly to > > me. > I agree. > > Bjorn Hi Bjorn, Happy new year. sending a gentle reminder to help conclude the discussion. Thanks nirmal
[+cc Rafael, Kai-Heng] On Thu, Dec 14, 2023 at 03:22:21PM -0700, Nirmal Patel wrote: > On Thu, 2023-12-14 at 13:23 -0600, Bjorn Helgaas wrote: > > On Wed, Dec 13, 2023 at 06:07:02PM -0700, Nirmal Patel wrote: > > > On Tue, 2023-12-12 at 15:13 -0600, Bjorn Helgaas wrote: > > > > On Mon, Dec 11, 2023 at 04:05:25PM -0700, Nirmal Patel wrote: > > > > > On Tue, 2023-12-05 at 18:27 -0600, Bjorn Helgaas wrote: > > > > > > On Tue, Dec 05, 2023 at 03:20:27PM -0700, Nirmal Patel wrote: > > > > > > ... > > > > > > > Currently Hotplug is controlled by _OSC in both host and > > > > > > > Guest. In case of guest, it seems guest BIOS hasn't > > > > > > > implemented _OSC since all the flags are set to 0 which > > > > > > > is not the case in host. > > > > > > > > > > > > I think you want pciehp to work on the VMD Root Ports in > > > > > > the Guest, no matter what, on the assumption that any _OSC > > > > > > that applies to host bridge A does not apply to the host > > > > > > bridge created by the vmd driver. > > > > > > > > > > > > If so, we should just revert 04b12ef163d1 ("PCI: vmd: > > > > > > Honor ACPI _OSC on PCIe features"). Since host bridge B > > > > > > was not enumerated via ACPI, the OS owns all those > > > > > > features under B by default, so reverting 04b12ef163d1 > > > > > > would leave that state. > > > > > > > > > > > > Obviously we'd have to first figure out another solution > > > > > > for the AER message flood that 04b12ef163d1 resolved. > > > > > > > > > > If we revert 04b12ef163d1, then VMD driver will still enable > > > > > AER blindly which is a bug. So we need to find a way to make > > > > > VMD driver aware of AER platform setting and use that > > > > > information to enable or disable AER for its child devices. > > > > > > > > > > There is a setting in BIOS that allows us to enable OS > > > > > native AER support on the platform. This setting is located > > > > > in EDK Menu -> Platform configuration -> system event log -> > > > > > IIO error enabling -> OS native AER support. > > > > > > > > > > This setting is assigned to VMD bridge by > > > > > vmd_copy_host_bridge_flags in patch 04b12ef163d1. Without > > > > > the patch 04b12ef163d1, VMD driver will enable AER even if > > > > > platform has disabled OS native AER support as mentioned > > > > > earlier. This will result in an AER flood mentioned in > > > > > 04b12ef163d1 since there is no AER handlers. > > > > I missed this before. What does "there are no AER handlers" mean? > > Do you mean there are no *firmware* AER handlers? > > Sorry for confusing wordings. Your understanding is correct. The patch > 04b12ef163d1 is used to disable AER on vmd and avoid the AER flooding > by applying _OSC settings since vmd driver doesn't give a choice to > toggle AER, DPC, LTR, etc. > > > > The dmesg log at https://bugzilla.kernel.org/attachment.cgi?id=299571 > > (from https://bugzilla.kernel.org/show_bug.cgi?id=215027, the bug > > fixed by 04b12ef163d1), shows this: > > > > acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM > > Segments MSI HPX-Type3] > > acpi PNP0A08:00: _OSC: platform does not support [AER] > > acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug PME > > PCIeCapability LTR] > > > > so the firmware did not grant AER control to the OS (I think > > "platform does not support" is a confusing description). 
> > > > Prior to 04b12ef163d1, Linux applied _OSC above the VMD bridge but > > not below it, so Linux had native control below VMD and it did > > handle AER interrupts from the 10000:e0:06.0 Root Port below VMD: > > > > vmd 0000:00:0e.0: PCI host bridge to bus 10000:e0 > > pcieport 10000:e0:06.0: AER: Corrected error received: > > 10000:e1:00.0 > > > > After 04b12ef163d1, Linux applied _OSC below the VMD bridge as well, > > so it did not enable or handle any AER interrupts. I suspect the > > platform didn't handle AER interrupts below VMD either, so those > > errors were probably just ignored. > > > > > > > If VMD is aware of OS native AER support setting, then we > > > > > will not see AER flooding issue. > > > > > > > > > > Do you have any suggestion on how to make VMD driver aware > > > > > of AER setting and keep it in sync with platform setting. > > > > > > > > Well, this is the whole problem. IIUC, you're saying we > > > > should use _OSC to negotiate for AER control below the VMD > > > > bridge, but ignore _OSC for hotplug control. > > > > > > Because VMD has its own hotplug BIOS setting which allows vmd to > > > enable or disable hotplug on its domain. However we don't have > > > VMD specific AER, DPC, LTR settings. > > > > I don't quite follow. The OS knows nothing about whether BIOS > > settings exist, so they can't be used to determine where _OSC > > applies. > > > > > Is it okay if we include an additional check for hotplug? i.e. > > > Hotplug capable bit in SltCap register which reflects VMD BIOS > > > hotplug setting. > > > > I don't know what check you have in mind, but the OS can > > definitely use SltCap to decide what features to enable. > > Yes I agree. That is what I am also suggesting in this patch. I should have said "the OS can use SltCap to *help* decide what features to enable." For ACPI host bridges, _OSC determines whether hotplug should be operated by the platform or the OS. I think we're converging on the idea that since VMD is effectively *not* an ACPI host bridge and doesn't have its own _OSC, the _OSC that applies to the VMD endpoint does *not* apply to the hierarchy below the VMD. In that case, the default is that the OS owns all the features (hotplug, AER, etc) below the VMD. > > You suggest above that you want vmd to be aware of AER ownership per > > _OSC, but it sounds more like you just want AER disabled below VMD > > regardless. Or do you have platforms that can actually handle AER > > interrupts from Root Ports below VMD? > > No I dont want AER disabled below VMD all the time. I am suggesting > vmd should be in sync with all the _OSC flags since vmd doesn't give > a choice to toggle. This is what I don't understand. If we decide that _OSC doesn't apply below VMD because the VMD host bridge isn't described in ACPI, the idea of being in sync with _OSC doesn't make sense. > However, the issue arises in guest where _OSC setting for hotplug is > always 0 even though hotplug is 1 in host and hotplug enable bit in > SltCap register is 1 in both host and guest. So we can use that to > enable hotplug in guest. Like you suggested in your above comment. All this got paged out over the holidays, so I need to refresh my understanding here. Maybe it will help if we can make the situation more concrete. I'm basing this on the logs at https://bugzilla.kernel.org/show_bug.cgi?id=215027. I assume the 10000:e0:06.0 Root Port and the 10000:e1:00.0 NVMe device are both passed through to the guest. 
I'm sure I got lots wrong here, so please correct me :) Host OS sees: PNP0A08 host bridge to 0000 [bus 00-ff] _OSC applies to domain 0000 OS owns [PCIeHotplug SHPCHotplug PME PCIeCapability LTR] in domain 0000 vmd 0000:00:0e.0: PCI host bridge to domain 10000 [bus e0-ff] no _OSC applies in domain 10000; host OS owns all PCIe features in domain 10000 pci 10000:e0:06.0: [8086:464d] # VMD root port pci 10000:e0:06.0: PCI bridge to [bus e1] pci 10000:e0:06.0: SltCap: HotPlug+ # Hotplug Capable pci 10000:e1:00.0: [144d:a80a] # nvme Guest OS sees: PNP0A03 host bridge to 0000 [bus 00-ff] _OSC applies to domain 0000 platform owns [PCIeHotplug ...] # _OSC doesn't grant to OS pci 0000:e0:06.0: [8086:464d] # VMD root port pci 0000:e0:06.0: PCI bridge to [bus e1] pci 0000:e0:06.0: SltCap: HotPlug+ # Hotplug Capable pci 0000:e1:00.0: [144d:a80a] # nvme So the guest OS sees that the VMD Root Port supports hotplug, but it can't use it because it was not granted ownership of the feature? Bjorn
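To make the "sees HotPlug+ but can't use it" point concrete: pciehp only attaches to a port when both the Slot Capabilities bit and OS ownership of hotplug line up. A simplified sketch of the decision made by the PCIe port services driver, ignoring the "pcie_ports=native" override (not the exact upstream code):

    static bool port_gets_hotplug_service(struct pci_dev *port)
    {
            struct pci_host_bridge *host = pci_find_host_bridge(port->bus);

            /*
             * is_hotplug_bridge is set at enumeration time from the
             * Hot-Plug Capable bit in SltCap; native_pcie_hotplug is the
             * ownership flag that _OSC (or the no-_OSC default) controls.
             */
            return port->is_hotplug_bridge && host->native_pcie_hotplug;
    }

In the guest picture above, SltCap yields is_hotplug_bridge = 1, but the _OSC result leaves native_pcie_hotplug = 0, so the hotplug service is never enabled.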
On Fri, 2024-01-12 at 16:55 -0600, Bjorn Helgaas wrote: > [+cc Rafael, Kai-Heng] > > On Thu, Dec 14, 2023 at 03:22:21PM -0700, Nirmal Patel wrote: > > On Thu, 2023-12-14 at 13:23 -0600, Bjorn Helgaas wrote: > > > On Wed, Dec 13, 2023 at 06:07:02PM -0700, Nirmal Patel wrote: > > > > On Tue, 2023-12-12 at 15:13 -0600, Bjorn Helgaas wrote: > > > > > On Mon, Dec 11, 2023 at 04:05:25PM -0700, Nirmal Patel wrote: > > > > > > On Tue, 2023-12-05 at 18:27 -0600, Bjorn Helgaas wrote: > > > > > > > On Tue, Dec 05, 2023 at 03:20:27PM -0700, Nirmal Patel > > > > > > > wrote: > > > > > > > ... > > > > > > > > Currently Hotplug is controlled by _OSC in both host > > > > > > > > and > > > > > > > > Guest. In case of guest, it seems guest BIOS hasn't > > > > > > > > implemented _OSC since all the flags are set to 0 which > > > > > > > > is not the case in host. > > > > > > > > > > > > > > I think you want pciehp to work on the VMD Root Ports in > > > > > > > the Guest, no matter what, on the assumption that any > > > > > > > _OSC > > > > > > > that applies to host bridge A does not apply to the host > > > > > > > bridge created by the vmd driver. > > > > > > > > > > > > > > If so, we should just revert 04b12ef163d1 ("PCI: vmd: > > > > > > > Honor ACPI _OSC on PCIe features"). Since host bridge B > > > > > > > was not enumerated via ACPI, the OS owns all those > > > > > > > features under B by default, so reverting 04b12ef163d1 > > > > > > > would leave that state. > > > > > > > > > > > > > > Obviously we'd have to first figure out another solution > > > > > > > for the AER message flood that 04b12ef163d1 resolved. > > > > > > > > > > > > If we revert 04b12ef163d1, then VMD driver will still > > > > > > enable > > > > > > AER blindly which is a bug. So we need to find a way to > > > > > > make > > > > > > VMD driver aware of AER platform setting and use that > > > > > > information to enable or disable AER for its child devices. > > > > > > > > > > > > There is a setting in BIOS that allows us to enable OS > > > > > > native AER support on the platform. This setting is located > > > > > > in EDK Menu -> Platform configuration -> system event log > > > > > > -> > > > > > > IIO error enabling -> OS native AER support. > > > > > > > > > > > > This setting is assigned to VMD bridge by > > > > > > vmd_copy_host_bridge_flags in patch 04b12ef163d1. Without > > > > > > the patch 04b12ef163d1, VMD driver will enable AER even if > > > > > > platform has disabled OS native AER support as mentioned > > > > > > earlier. This will result in an AER flood mentioned in > > > > > > 04b12ef163d1 since there is no AER handlers. > > > > > > I missed this before. What does "there are no AER handlers" > > > mean? > > > Do you mean there are no *firmware* AER handlers? > > > > Sorry for confusing wordings. Your understanding is correct. The > > patch > > 04b12ef163d1 is used to disable AER on vmd and avoid the AER > > flooding > > by applying _OSC settings since vmd driver doesn't give a choice to > > toggle AER, DPC, LTR, etc. 
> > > The dmesg log at > > > https://bugzilla.kernel.org/attachment.cgi?id=299571 > > > (from https://bugzilla.kernel.org/show_bug.cgi?id=215027, the bug > > > fixed by 04b12ef163d1), shows this: > > > > > > acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM > > > Segments MSI HPX-Type3] > > > acpi PNP0A08:00: _OSC: platform does not support [AER] > > > acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug > > > PME > > > PCIeCapability LTR] > > > > > > so the firmware did not grant AER control to the OS (I think > > > "platform does not support" is a confusing description). > > > > > > Prior to 04b12ef163d1, Linux applied _OSC above the VMD bridge > > > but > > > not below it, so Linux had native control below VMD and it did > > > handle AER interrupts from the 10000:e0:06.0 Root Port below VMD: > > > > > > vmd 0000:00:0e.0: PCI host bridge to bus 10000:e0 > > > pcieport 10000:e0:06.0: AER: Corrected error received: > > > 10000:e1:00.0 > > > > > > After 04b12ef163d1, Linux applied _OSC below the VMD bridge as > > > well, > > > so it did not enable or handle any AER interrupts. I suspect the > > > platform didn't handle AER interrupts below VMD either, so those > > > errors were probably just ignored. > > > > > > > > > If VMD is aware of OS native AER support setting, then we > > > > > > will not see AER flooding issue. > > > > > > > > > > > > Do you have any suggestion on how to make VMD driver aware > > > > > > of AER setting and keep it in sync with platform setting. > > > > > > > > > > Well, this is the whole problem. IIUC, you're saying we > > > > > should use _OSC to negotiate for AER control below the VMD > > > > > bridge, but ignore _OSC for hotplug control. > > > > > > > > Because VMD has its own hotplug BIOS setting which allows vmd > > > > to > > > > enable or disable hotplug on its domain. However we don't have > > > > VMD specific AER, DPC, LTR settings. > > > > > > I don't quite follow. The OS knows nothing about whether BIOS > > > settings exist, so they can't be used to determine where _OSC > > > applies. > > > > > > > Is it okay if we include an additional check for hotplug? i.e. > > > > Hotplug capable bit in SltCap register which reflects VMD BIOS > > > > hotplug setting. > > > > > > I don't know what check you have in mind, but the OS can > > > definitely use SltCap to decide what features to enable. > > > > Yes I agree. That is what I am also suggesting in this patch. > > I should have said "the OS can use SltCap to *help* decide what > features to enable." For ACPI host bridges, _OSC determines whether > hotplug should be operated by the platform or the OS. > > I think we're converging on the idea that since VMD is effectively > *not* an ACPI host bridge and doesn't have its own _OSC, the _OSC > that > applies to the VMD endpoint does *not* apply to the hierarchy below > the VMD. In that case, the default is that the OS owns all the > features (hotplug, AER, etc) below the VMD. Well there will be few problems if VMD owns/assigns all the flags by itself. We discussed all of the potential problems but due to the holidays I think I should summarize them again. #1 : Currently there is no way to pass the information about AER, DPC, etc to VMD driver from BIOS or from boot parameter. For example, If VMD blindly enables AER and platform has AER disabled, then we will see AERs from devices under VMD but user have no way to toggle it since we rejected idea of adding boot parameter for AER, DPC under VMD. 
I believe you also didn't like the idea of sysfs knob suggested by Kai- Heng. #2 : Even if we use VMD hardware register to store AER, DPC and make UEFI VMD driver to write information about Hotplug, DPC, AER, we still dont have a way to change the setting if user wants to alter them. Also the issue will still persist in client platforms since we don't own their UEFI VMD driver. It will be a huge effort. #3 : At this moment, user can enable/disable only Hotplug in VMD BIOS settings (meaning no AER, DPC, LTR, etc)and VMD driver can read it from SltCap register. This means BIOS needs to add other settings and VMD needs to figure out to read them which at this moment VMD can't do it. #4 : consistency with Host OS and Guest OS. I believe the current propesed patch is the best option which requires minimal changes without breaking other platform features and unblock our customers. This issues has been a blocker for us. For your concerns regarding how VMD can own all the _OSC features, i am open to other ideas and will discuss with the architect if they have any suggestions. > > > > You suggest above that you want vmd to be aware of AER ownership > > > per > > > _OSC, but it sounds more like you just want AER disabled below > > > VMD > > > regardless. Or do you have platforms that can actually handle > > > AER > > > interrupts from Root Ports below VMD? > > > > No I dont want AER disabled below VMD all the time. I am suggesting > > vmd should be in sync with all the _OSC flags since vmd doesn't > > give > > a choice to toggle. > > This is what I don't understand. If we decide that _OSC doesn't > apply > below VMD because the VMD host bridge isn't described in ACPI, the > idea of being in sync with _OSC doesn't make sense. > > > However, the issue arises in guest where _OSC setting for hotplug > > is > > always 0 even though hotplug is 1 in host and hotplug enable bit in > > SltCap register is 1 in both host and guest. So we can use that to > > enable hotplug in guest. Like you suggested in your above comment. > > All this got paged out over the holidays, so I need to refresh my > understanding here. Maybe it will help if we can make the situation > more concrete. I'm basing this on the logs at > https://bugzilla.kernel.org/show_bug.cgi?id=215027. I assume the > 10000:e0:06.0 Root Port and the 10000:e1:00.0 NVMe device are both > passed through to the guest. I'm sure I got lots wrong here, so > please correct me :) > > Host OS sees: > > PNP0A08 host bridge to 0000 [bus 00-ff] > _OSC applies to domain 0000 > OS owns [PCIeHotplug SHPCHotplug PME PCIeCapability LTR] in > domain 0000 > vmd 0000:00:0e.0: PCI host bridge to domain 10000 [bus e0-ff] > no _OSC applies in domain 10000; > host OS owns all PCIe features in domain 10000 > pci 10000:e0:06.0: [8086:464d] # VMD root port > pci 10000:e0:06.0: PCI bridge to [bus e1] > pci 10000:e0:06.0: SltCap: HotPlug+ # Hotplug Capable > pci 10000:e1:00.0: [144d:a80a] # nvme > > Guest OS sees: > > PNP0A03 host bridge to 0000 [bus 00-ff] > _OSC applies to domain 0000 > platform owns [PCIeHotplug ...] # _OSC doesn't grant > to OS > pci 0000:e0:06.0: [8086:464d] # VMD root port > pci 0000:e0:06.0: PCI bridge to [bus e1] > pci 0000:e0:06.0: SltCap: HotPlug+ # Hotplug Capable > pci 0000:e1:00.0: [144d:a80a] # nvme > > So the guest OS sees that the VMD Root Port supports hotplug, but it > can't use it because it was not granted ownership of the feature? You are correct about _OSC not granting access in Guest. This is what I see on my setup. 
Host OS: ACPI: PCI Root Bridge [PC11] (domain 0000 [bus e2-fa]) acpi PNP0A08:0b: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3] acpi PNP0A08:0b: _OSC: platform does not support [SHPCHotplug AER DPC] acpi PNP0A08:0b: _OSC: OS now controls [PCIeHotplug PME PCIeCapability LTR] PCI host bridge to bus 0000:e2 vmd 0000:e2:00.5: PCI host bridge to bus 10007:00 vmd 0000:e2:00.5: Bound to PCI domain 10007 Guest OS: ACPI: PCI Root Bridge [PC0G] (domain 0000 [bus 03]) acpi PNP0A08:01: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3] acpi PNP0A08:01: _OSC: platform does not support [PCIeHotplug SHPCHotplug PME AER LTR DPC] acpi PNP0A08:01: _OSC: OS now controls [PCIeCapability] vmd 0000:03:00.0: Bound to PCI domain 10000 SltCap: PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise- > > Bjorn
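The SltCap decode in these logs is the Slot Capabilities register of the port's PCIe capability; the Hot-Plug Capable bit is what the PCI core caches in dev->is_hotplug_bridge at enumeration time. A minimal sketch of that check (illustrative helper name, standard kernel accessors):

    static void note_hotplug_capable(struct pci_dev *port)
    {
            u32 sltcap;

            /*
             * PCI_EXP_SLTCAP is the Slot Capabilities register in the PCIe
             * capability; PCI_EXP_SLTCAP_HPC is the Hot-Plug Capable bit
             * that lspci renders as "HotPlug+" / "HotPlug-" in the decode
             * shown above.
             */
            pcie_capability_read_dword(port, PCI_EXP_SLTCAP, &sltcap);
            port->is_hotplug_bridge = !!(sltcap & PCI_EXP_SLTCAP_HPC);
    }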
On Tue, Dec 05, 2023 at 03:20:27PM -0700, Nirmal Patel wrote: > ... > In guest, vmd is passthrough including pci topology behind vmd. ... IIUC this says the VMD bridge itself is passed through to the guest, as well as the VMD Root Ports and devices below them. Why is the VMD bridge itself passed through to the guest? I assume the host OS vmd driver enables the VMD bridge functionality, i.e., in vmd_enable_domain()? I gleaned this from the dmesg logs at https://bugzilla.kernel.org/show_bug.cgi?id=215027. I assume these are from the host: pci 0000:00:0e.0: [8086:467f] type 00 # VMD Endpoint vmd 0000:00:0e.0: PCI host bridge to bus 10000:e0 pci_bus 10000:e0: root bus resource [bus e0-ff] pci 10000:e0:06.0: [8086:464d] type 01 # VMD Root Port pci 10000:e0:06.0: PCI bridge to [bus e1] pci 10000:e1:00.0: [144d:a80a] type 00 # NVMe nvme 10000:e1:00.0: PCI INT A: not connected What does it look like in the guest? Does the guest see all three of these devices (VMD Endpoint, VMD Root Port, NVMe)? If the guest sees the VMD Endpoint, what does it do with it? Does the host vmd driver coordinate with the guest vmd driver? Bjorn
On Tue, Jan 16, 2024 at 01:37:32PM -0700, Nirmal Patel wrote: > On Fri, 2024-01-12 at 16:55 -0600, Bjorn Helgaas wrote: > ... > > Maybe it will help if we can make the > > situation more concrete. I'm basing this on the logs at > > https://bugzilla.kernel.org/show_bug.cgi?id=215027. I assume the > > 10000:e0:06.0 Root Port and the 10000:e1:00.0 NVMe device are both > > passed through to the guest. I'm sure I got lots wrong here, so > > please correct me :) > > > > Host OS sees: > > > > PNP0A08 host bridge to 0000 [bus 00-ff] > > _OSC applies to domain 0000 > > OS owns [PCIeHotplug SHPCHotplug PME PCIeCapability LTR] in > > domain 0000 > > vmd 0000:00:0e.0: PCI host bridge to domain 10000 [bus e0-ff] > > no _OSC applies in domain 10000; > > host OS owns all PCIe features in domain 10000 > > pci 10000:e0:06.0: [8086:464d] # VMD root port > > pci 10000:e0:06.0: PCI bridge to [bus e1] > > pci 10000:e0:06.0: SltCap: HotPlug+ # Hotplug Capable > > pci 10000:e1:00.0: [144d:a80a] # nvme > > > > Guest OS sees: > > > > PNP0A03 host bridge to 0000 [bus 00-ff] > > _OSC applies to domain 0000 > > platform owns [PCIeHotplug ...] # _OSC doesn't grant > > to OS > > pci 0000:e0:06.0: [8086:464d] # VMD root port > > pci 0000:e0:06.0: PCI bridge to [bus e1] > > pci 0000:e0:06.0: SltCap: HotPlug+ # Hotplug Capable > > pci 0000:e1:00.0: [144d:a80a] # nvme > > > > So the guest OS sees that the VMD Root Port supports hotplug, but > > it can't use it because it was not granted ownership of the > > feature? > > You are correct about _OSC not granting access in Guest. I was assuming the VMD Endpoint itself was not visible in the guest and the VMD Root Ports appeared in domain 0000 described by the guest PNP0A03. The PNP0A03 _OSC would then apply to the VMD Root Ports. But my assumption appears to be wrong: > This is what I see on my setup. > > Host OS: > > ACPI: PCI Root Bridge [PC11] (domain 0000 [bus e2-fa]) > acpi PNP0A08:0b: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3] > acpi PNP0A08:0b: _OSC: platform does not support [SHPCHotplug AER DPC] > acpi PNP0A08:0b: _OSC: OS now controls [PCIeHotplug PME PCIeCapability LTR] > PCI host bridge to bus 0000:e2 > > vmd 0000:e2:00.5: PCI host bridge to bus 10007:00 > vmd 0000:e2:00.5: Bound to PCI domain 10007 > > Guest OS: > > ACPI: PCI Root Bridge [PC0G] (domain 0000 [bus 03]) > acpi PNP0A08:01: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3] > acpi PNP0A08:01: _OSC: platform does not support [PCIeHotplug SHPCHotplug PME AER LTR DPC] > acpi PNP0A08:01: _OSC: OS now controls [PCIeCapability] > > vmd 0000:03:00.0: Bound to PCI domain 10000 > > SltCap: PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise- Your example above suggests that the guest has a PNP0A08 device for domain 0000, with an _OSC, the guest sees the VMD Endpoint at 0000:03:00.0, and the VMD Root Ports and devices below them are in domain 10000. Right? If we decide the _OSC that covers the VMD Endpoint does *not* apply to the domain below the VMD bridge, the guest has no _OSC for domain 10000, so the guest OS should default to owning all the PCIe features in that domain. Bjorn
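The "default to owning all the PCIe features" behavior comes from how a PCI host bridge is initialized before any firmware negotiation; _OSC can only take ownership away afterwards. A sketch approximating the generic host-bridge setup in the PCI core (not a quote of it):

    static void pci_init_host_bridge_defaults(struct pci_host_bridge *bridge)
    {
            /*
             * Assume OS ownership of everything until an _OSC result for
             * this host bridge says otherwise.  A host bridge with no _OSC
             * at all (the argument made for domain 10000 here) keeps these
             * defaults.
             */
            bridge->native_pcie_hotplug = 1;
            bridge->native_shpc_hotplug = 1;
            bridge->native_aer = 1;
            bridge->native_pme = 1;
            bridge->native_ltr = 1;
            bridge->native_dpc = 1;
    }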
On Tue, 2024-01-16 at 13:37 -0700, Nirmal Patel wrote: > On Fri, 2024-01-12 at 16:55 -0600, Bjorn Helgaas wrote: > > [+cc Rafael, Kai-Heng] > > > > On Thu, Dec 14, 2023 at 03:22:21PM -0700, Nirmal Patel wrote: > > > On Thu, 2023-12-14 at 13:23 -0600, Bjorn Helgaas wrote: > > > > On Wed, Dec 13, 2023 at 06:07:02PM -0700, Nirmal Patel wrote: > > > > > On Tue, 2023-12-12 at 15:13 -0600, Bjorn Helgaas wrote: > > > > > > On Mon, Dec 11, 2023 at 04:05:25PM -0700, Nirmal Patel > > > > > > wrote: > > > > > > > On Tue, 2023-12-05 at 18:27 -0600, Bjorn Helgaas wrote: > > > > > > > > On Tue, Dec 05, 2023 at 03:20:27PM -0700, Nirmal Patel > > > > > > > > wrote: > > > > > > > > ... > > > > > > > > > Currently Hotplug is controlled by _OSC in both host > > > > > > > > > and > > > > > > > > > Guest. In case of guest, it seems guest BIOS hasn't > > > > > > > > > implemented _OSC since all the flags are set to 0 > > > > > > > > > which > > > > > > > > > is not the case in host. > > > > > > > > > > > > > > > > I think you want pciehp to work on the VMD Root Ports > > > > > > > > in > > > > > > > > the Guest, no matter what, on the assumption that any > > > > > > > > _OSC > > > > > > > > that applies to host bridge A does not apply to the > > > > > > > > host > > > > > > > > bridge created by the vmd driver. > > > > > > > > > > > > > > > > If so, we should just revert 04b12ef163d1 ("PCI: vmd: > > > > > > > > Honor ACPI _OSC on PCIe features"). Since host bridge > > > > > > > > B > > > > > > > > was not enumerated via ACPI, the OS owns all those > > > > > > > > features under B by default, so reverting 04b12ef163d1 > > > > > > > > would leave that state. > > > > > > > > > > > > > > > > Obviously we'd have to first figure out another > > > > > > > > solution > > > > > > > > for the AER message flood that 04b12ef163d1 resolved. > > > > > > > > > > > > > > If we revert 04b12ef163d1, then VMD driver will still > > > > > > > enable > > > > > > > AER blindly which is a bug. So we need to find a way to > > > > > > > make > > > > > > > VMD driver aware of AER platform setting and use that > > > > > > > information to enable or disable AER for its child > > > > > > > devices. > > > > > > > > > > > > > > There is a setting in BIOS that allows us to enable OS > > > > > > > native AER support on the platform. This setting is > > > > > > > located > > > > > > > in EDK Menu -> Platform configuration -> system event log > > > > > > > -> > > > > > > > IIO error enabling -> OS native AER support. > > > > > > > > > > > > > > This setting is assigned to VMD bridge by > > > > > > > vmd_copy_host_bridge_flags in patch 04b12ef163d1. Without > > > > > > > the patch 04b12ef163d1, VMD driver will enable AER even > > > > > > > if > > > > > > > platform has disabled OS native AER support as mentioned > > > > > > > earlier. This will result in an AER flood mentioned in > > > > > > > 04b12ef163d1 since there is no AER handlers. > > > > > > > > I missed this before. What does "there are no AER handlers" > > > > mean? > > > > Do you mean there are no *firmware* AER handlers? > > > > > > Sorry for confusing wordings. Your understanding is correct. The > > > patch > > > 04b12ef163d1 is used to disable AER on vmd and avoid the AER > > > flooding > > > by applying _OSC settings since vmd driver doesn't give a choice > > > to > > > toggle AER, DPC, LTR, etc. 
> > > > The dmesg log at > > > > https://bugzilla.kernel.org/attachment.cgi?id=299571 > > > > (from https://bugzilla.kernel.org/show_bug.cgi?id=215027, the > > > > bug > > > > fixed by 04b12ef163d1), shows this: > > > > > > > > acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM > > > > ClockPM > > > > Segments MSI HPX-Type3] > > > > acpi PNP0A08:00: _OSC: platform does not support [AER] > > > > acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug > > > > SHPCHotplug > > > > PME > > > > PCIeCapability LTR] > > > > > > > > so the firmware did not grant AER control to the OS (I think > > > > "platform does not support" is a confusing description). > > > > > > > > Prior to 04b12ef163d1, Linux applied _OSC above the VMD bridge > > > > but > > > > not below it, so Linux had native control below VMD and it did > > > > handle AER interrupts from the 10000:e0:06.0 Root Port below > > > > VMD: > > > > > > > > vmd 0000:00:0e.0: PCI host bridge to bus 10000:e0 > > > > pcieport 10000:e0:06.0: AER: Corrected error received: > > > > 10000:e1:00.0 > > > > > > > > After 04b12ef163d1, Linux applied _OSC below the VMD bridge as > > > > well, > > > > so it did not enable or handle any AER interrupts. I suspect > > > > the > > > > platform didn't handle AER interrupts below VMD either, so > > > > those > > > > errors were probably just ignored. > > > > > > > > > > > If VMD is aware of OS native AER support setting, then we > > > > > > > will not see AER flooding issue. > > > > > > > > > > > > > > Do you have any suggestion on how to make VMD driver > > > > > > > aware > > > > > > > of AER setting and keep it in sync with platform setting. > > > > > > > > > > > > Well, this is the whole problem. IIUC, you're saying we > > > > > > should use _OSC to negotiate for AER control below the VMD > > > > > > bridge, but ignore _OSC for hotplug control. > > > > > > > > > > Because VMD has its own hotplug BIOS setting which allows vmd > > > > > to > > > > > enable or disable hotplug on its domain. However we don't > > > > > have > > > > > VMD specific AER, DPC, LTR settings. > > > > > > > > I don't quite follow. The OS knows nothing about whether BIOS > > > > settings exist, so they can't be used to determine where _OSC > > > > applies. > > > > > > > > > Is it okay if we include an additional check for hotplug? > > > > > i.e. > > > > > Hotplug capable bit in SltCap register which reflects VMD > > > > > BIOS > > > > > hotplug setting. > > > > > > > > I don't know what check you have in mind, but the OS can > > > > definitely use SltCap to decide what features to enable. > > > > > > Yes I agree. That is what I am also suggesting in this patch. > > > > I should have said "the OS can use SltCap to *help* decide what > > features to enable." For ACPI host bridges, _OSC determines > > whether > > hotplug should be operated by the platform or the OS. > > > > I think we're converging on the idea that since VMD is effectively > > *not* an ACPI host bridge and doesn't have its own _OSC, the _OSC > > that > > applies to the VMD endpoint does *not* apply to the hierarchy below > > the VMD. In that case, the default is that the OS owns all the > > features (hotplug, AER, etc) below the VMD. > Well there will be few problems if VMD owns/assigns all the flags by > itself. We discussed all of the potential problems but due to the > holidays I think I should summarize them again. > #1 : Currently there is no way to pass the information about AER, > DPC, > etc to VMD driver from BIOS or from boot parameter. 
For example, If > VMD > blindly enables AER and platform has AER disabled, then we will see > AERs from devices under VMD but user have no way to toggle it since > we > rejected idea of adding boot parameter for AER, DPC under VMD. I > believe you also didn't like the idea of sysfs knob suggested by Kai- > Heng. > > #2 : Even if we use VMD hardware register to store AER, DPC and make > UEFI VMD driver to write information about Hotplug, DPC, AER, we > still > dont have a way to change the setting if user wants to alter them. > Also > the issue will still persist in client platforms since we don't own > their UEFI VMD driver. It will be a huge effort. > > #3 : At this moment, user can enable/disable only Hotplug in VMD BIOS > settings (meaning no AER, DPC, LTR, etc)and VMD driver can read it > from > SltCap register. This means BIOS needs to add other settings and VMD > needs to figure out to read them which at this moment VMD can't do > it. > > #4 : consistency with Host OS and Guest OS. > > I believe the current propesed patch is the best option which > requires > minimal changes without breaking other platform features and unblock > our customers. This issues has been a blocker for us. > > For your concerns regarding how VMD can own all the _OSC features, i > am > open to other ideas and will discuss with the architect if they have > any suggestions. > > > > > > You suggest above that you want vmd to be aware of AER > > > > ownership > > > > per > > > > _OSC, but it sounds more like you just want AER disabled below > > > > VMD > > > > regardless. Or do you have platforms that can actually handle > > > > AER > > > > interrupts from Root Ports below VMD? > > > > > > No I dont want AER disabled below VMD all the time. I am > > > suggesting > > > vmd should be in sync with all the _OSC flags since vmd doesn't > > > give > > > a choice to toggle. > > > > This is what I don't understand. If we decide that _OSC doesn't > > apply > > below VMD because the VMD host bridge isn't described in ACPI, the > > idea of being in sync with _OSC doesn't make sense. > > > > > However, the issue arises in guest where _OSC setting for hotplug > > > is > > > always 0 even though hotplug is 1 in host and hotplug enable bit > > > in > > > SltCap register is 1 in both host and guest. So we can use that > > > to > > > enable hotplug in guest. Like you suggested in your above > > > comment. > > > > All this got paged out over the holidays, so I need to refresh my > > understanding here. Maybe it will help if we can make the > > situation > > more concrete. I'm basing this on the logs at > > https://bugzilla.kernel.org/show_bug.cgi?id=215027. I assume the > > 10000:e0:06.0 Root Port and the 10000:e1:00.0 NVMe device are both > > passed through to the guest. I'm sure I got lots wrong here, so > > please correct me :) > > > > Host OS sees: > > > > PNP0A08 host bridge to 0000 [bus 00-ff] > > _OSC applies to domain 0000 > > OS owns [PCIeHotplug SHPCHotplug PME PCIeCapability LTR] in > > domain 0000 > > vmd 0000:00:0e.0: PCI host bridge to domain 10000 [bus e0-ff] > > no _OSC applies in domain 10000; > > host OS owns all PCIe features in domain 10000 > > pci 10000:e0:06.0: [8086:464d] # VMD root port > > pci 10000:e0:06.0: PCI bridge to [bus e1] > > pci 10000:e0:06.0: SltCap: HotPlug+ # Hotplug Capable > > pci 10000:e1:00.0: [144d:a80a] # nvme > > > > Guest OS sees: > > > > PNP0A03 host bridge to 0000 [bus 00-ff] > > _OSC applies to domain 0000 > > platform owns [PCIeHotplug ...] 
# _OSC doesn't grant > > to OS > > pci 0000:e0:06.0: [8086:464d] # VMD root port > > pci 0000:e0:06.0: PCI bridge to [bus e1] > > pci 0000:e0:06.0: SltCap: HotPlug+ # Hotplug Capable > > pci 0000:e1:00.0: [144d:a80a] # nvme > > > > So the guest OS sees that the VMD Root Port supports hotplug, but > > it > > can't use it because it was not granted ownership of the feature? > You are correct about _OSC not granting access in Guest. This is what > I > see on my setup. > > Host OS: > > ACPI: PCI Root Bridge [PC11] (domain 0000 [bus e2-fa]) > acpi PNP0A08:0b: _OSC: OS supports [ExtendedConfig ASPM ClockPM > Segments MSI EDR HPX-Type3] > acpi PNP0A08:0b: _OSC: platform does not support [SHPCHotplug > AER DPC] > acpi PNP0A08:0b: _OSC: OS now controls [PCIeHotplug PME > PCIeCapability LTR] > PCI host bridge to bus 0000:e2 > > vmd 0000:e2:00.5: PCI host bridge to bus 10007:00 > vmd 0000:e2:00.5: Bound to PCI domain 10007 > > Guest OS: > > ACPI: PCI Root Bridge [PC0G] (domain 0000 [bus 03]) > acpi PNP0A08:01: _OSC: OS supports [ExtendedConfig ASPM ClockPM > Segments MSI EDR HPX-Type3] > acpi PNP0A08:01: _OSC: platform does not support [PCIeHotplug > SHPCHotplug PME AER LTR DPC] > acpi PNP0A08:01: _OSC: OS now controls [PCIeCapability] > > vmd 0000:03:00.0: Bound to PCI domain 10000 > > SltCap: PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise- > > > > > Bjorn Hi Bjorn, Lorenzo, Gentle ping. Thanks.
On Tue, Jan 16, 2024 at 06:49:33PM -0600, Bjorn Helgaas wrote: > On Tue, Jan 16, 2024 at 01:37:32PM -0700, Nirmal Patel wrote: > > On Fri, 2024-01-12 at 16:55 -0600, Bjorn Helgaas wrote: > > ... > > > Maybe it will help if we can make the > > > situation more concrete. I'm basing this on the logs at > > > https://bugzilla.kernel.org/show_bug.cgi?id=215027. I assume the > > > 10000:e0:06.0 Root Port and the 10000:e1:00.0 NVMe device are both > > > passed through to the guest. I'm sure I got lots wrong here, so > > > please correct me :) > > > > > > Host OS sees: > > > > > > PNP0A08 host bridge to 0000 [bus 00-ff] > > > _OSC applies to domain 0000 > > > OS owns [PCIeHotplug SHPCHotplug PME PCIeCapability LTR] in > > > domain 0000 > > > vmd 0000:00:0e.0: PCI host bridge to domain 10000 [bus e0-ff] > > > no _OSC applies in domain 10000; > > > host OS owns all PCIe features in domain 10000 > > > pci 10000:e0:06.0: [8086:464d] # VMD root port > > > pci 10000:e0:06.0: PCI bridge to [bus e1] > > > pci 10000:e0:06.0: SltCap: HotPlug+ # Hotplug Capable > > > pci 10000:e1:00.0: [144d:a80a] # nvme > > > > > > Guest OS sees: > > > > > > PNP0A03 host bridge to 0000 [bus 00-ff] > > > _OSC applies to domain 0000 > > > platform owns [PCIeHotplug ...] # _OSC doesn't grant > > > to OS > > > pci 0000:e0:06.0: [8086:464d] # VMD root port > > > pci 0000:e0:06.0: PCI bridge to [bus e1] > > > pci 0000:e0:06.0: SltCap: HotPlug+ # Hotplug Capable > > > pci 0000:e1:00.0: [144d:a80a] # nvme > > > > > > So the guest OS sees that the VMD Root Port supports hotplug, but > > > it can't use it because it was not granted ownership of the > > > feature? > > > > You are correct about _OSC not granting access in Guest. > > I was assuming the VMD Endpoint itself was not visible in the guest > and the VMD Root Ports appeared in domain 0000 described by the guest > PNP0A03. The PNP0A03 _OSC would then apply to the VMD Root Ports. > But my assumption appears to be wrong: > > > This is what I see on my setup. > > > > Host OS: > > > > ACPI: PCI Root Bridge [PC11] (domain 0000 [bus e2-fa]) > > acpi PNP0A08:0b: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3] > > acpi PNP0A08:0b: _OSC: platform does not support [SHPCHotplug AER DPC] > > acpi PNP0A08:0b: _OSC: OS now controls [PCIeHotplug PME PCIeCapability LTR] > > PCI host bridge to bus 0000:e2 > > > > vmd 0000:e2:00.5: PCI host bridge to bus 10007:00 > > vmd 0000:e2:00.5: Bound to PCI domain 10007 > > > > Guest OS: > > > > ACPI: PCI Root Bridge [PC0G] (domain 0000 [bus 03]) > > acpi PNP0A08:01: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3] > > acpi PNP0A08:01: _OSC: platform does not support [PCIeHotplug SHPCHotplug PME AER LTR DPC] > > acpi PNP0A08:01: _OSC: OS now controls [PCIeCapability] > > > > vmd 0000:03:00.0: Bound to PCI domain 10000 > > > > SltCap: PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise- > > Your example above suggests that the guest has a PNP0A08 device for > domain 0000, with an _OSC, the guest sees the VMD Endpoint at > 0000:03:00.0, and the VMD Root Ports and devices below them are in > domain 10000. Right? Is my assumption here correct? Do the VMD Endpoint, the VMD Root Ports, and the devices below those Root Ports all visible to the guest? If so, it sounds like the VMD Endpoint is visible to both the host and the guest? I assume you only want it claimed by the vmd driver in *one* of them? If it's visible in both, how do you prevent one from claiming it? 
> If we decide the _OSC that covers the VMD Endpoint does *not* apply to > the domain below the VMD bridge, the guest has no _OSC for domain > 10000, so the guest OS should default to owning all the PCIe features > in that domain. I assume the guest sees the VMD Endpoint in domain 0000, right? Does the guest have an _OSC for domain 0000? If so, that would clearly apply to the VMD Endpoint. I think the guest sees the VMD Root Ports in domain 10000, along with the devices below them, right? I'm pretty sure the guest has no _OSC for domain 10000, i.e., for the VMD Root Ports and below, right? Bjorn
On Tue, Jan 16, 2024 at 01:37:32PM -0700, Nirmal Patel wrote: > On Fri, 2024-01-12 at 16:55 -0600, Bjorn Helgaas wrote: > ... > > I think we're converging on the idea that since VMD is effectively > > *not* an ACPI host bridge and doesn't have its own _OSC, the _OSC > > that applies to the VMD endpoint does *not* apply to the hierarchy > > below the VMD. In that case, the default is that the OS owns all > > the features (hotplug, AER, etc) below the VMD. > > Well there will be few problems if VMD owns/assigns all the flags by > itself. We discussed all of the potential problems but due to the > holidays I think I should summarize them again. > > #1 : Currently there is no way to pass the information about AER, > DPC, etc to VMD driver from BIOS or from boot parameter. For > example, if VMD blindly enables AER and platform has AER disabled, > then we will see AERs from devices under VMD but user have no way to > toggle it since we rejected idea of adding boot parameter for AER, > DPC under VMD. I believe you also didn't like the idea of sysfs knob > suggested by Kai-Heng. > > #2 : Even if we use VMD hardware register to store AER, DPC and make > UEFI VMD driver to write information about Hotplug, DPC, AER, we > still dont have a way to change the setting if user wants to alter > them. Also the issue will still persist in client platforms since we > don't own their UEFI VMD driver. It will be a huge effort. > > #3 : At this moment, user can enable/disable only Hotplug in VMD > BIOS settings (meaning no AER, DPC, LTR, etc)and VMD driver can read > it from SltCap register. This means BIOS needs to add other settings > and VMD needs to figure out to read them which at this moment VMD > can't do it. > > #4 : consistency with Host OS and Guest OS. > > I believe the current proposed patch is the best option which > requires minimal changes without breaking other platform features > and unblock our customers. This issues has been a blocker for us. > > For your concerns regarding how VMD can own all the _OSC features, i > am open to other ideas and will discuss with the architect if they > have any suggestions. As I understand it, the basic model of PCIe features is: - If platform doesn't have a way to negotiate ownership of PCIe features, the OS owns them by default, e.g., on non-ACPI systems. - If platform does have a way to negotiate, e.g., ACPI _OSC, the platform owns the features until it grants ownership to the OS. - If the OS has ownership (either by default or granted by platform), it can use the feature if the hardware device advertises support. I think this model applies to all PCIe features, including hotplug, AER, DPC, etc., and the OS uses _OSC and the Capabilities in device config space to determine whether to use the features. _OSC is the obvious way for a platform to use BIOS settings to influence what the OS does. I think the problem with VMD is that you have a guest OS running on a platform (the guest VM) where you want a host BIOS setting to control things in that guest platform, but there's currently no way to influence the _OSC seen by the guest. I think we need to: - Clarify whether _OSC only applies in the domain of the host bridge where it appears, i.e., an _OSC in a host bridge to domain 0000 obviously applies to a VMD Endpoint in domain 0000; does it also apply to devices in the domain 10000 *below* the VMD Endpoint? - Describe what the VMD bridge does with events below it. 
E.g., normally a Root Port that receives an error Message generates an interrupt depending on its Interrupt Disable and Error Reporting Enable bits. What happens when it's a VMD Root Port? Does it forward an error Message up via the VMD Endpoint? Generate an interrupt via the VMD Endpoint? If so, which interrupt? The question of where _OSC applies feels like an architectural thing. The question of how AER, DPC, hotplug, etc. events are forwarded across the VMD Root Port/Endpoint might be, too, or maybe that's all driver-specific, I dunno. Either way, I think it's worth documenting somehow. Bjorn
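The three-bullet model above, condensed into pseudocode so the later arguments are easier to follow (a sketch with illustrative placeholder helpers, not kernel code):

    /*
     * The OS may drive a PCIe feature only if it owns the feature and the
     * device advertises it in config space.
     */
    bool os_may_use(struct pci_host_bridge *host, struct pci_dev *dev, int feature)
    {
            bool owned = !platform_can_negotiate(host) ||  /* no _OSC: OS owns by default */
                         osc_granted_to_os(host, feature); /* platform granted ownership  */

            return owned && dev_advertises(dev, feature);  /* Capability in config space  */
    }

(platform_can_negotiate(), osc_granted_to_os() and dev_advertises() are placeholders for this discussion, not real interfaces.)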
On Thu, Feb 01, 2024 at 11:38:47AM -0700, Nirmal Patel wrote:
> Gentle ping.
Thanks for the ping, I was waiting for you, so we were deadlocked ;)
On Thu, 2024-02-01 at 17:00 -0600, Bjorn Helgaas wrote: > On Thu, Feb 01, 2024 at 11:38:47AM -0700, Nirmal Patel wrote: > > Gentle ping. > > Thanks for the ping, I was waiting for you, so we were deadlocked ;) Hi Bjorn, Sorry I missed that. Did you have a chance to look at my response on January 16th to your questions? I tried to summarize all of the potential problems and issues with different fixes. Please let me know if it is easier if I resend the explanation. Thanks. -nirmal
On Tue, Feb 06, 2024 at 05:27:29PM -0700, Nirmal Patel wrote: > ... > Did you have a chance to look at my response on January 16th to your > questions? I tried to summarize all of the potential problems and > issues with different fixes. Please let me know if it is easier if I > resend the explanation. Thanks. I did see your Jan 16 response, thanks. I had more questions after reading it, but they're more about understanding the topology seen by the host and the guest: Jan 16: https://lore.kernel.org/r/20240117004933.GA108810@bhelgaas Feb 1: https://lore.kernel.org/r/20240201211620.GA650432@bhelgaas As I mentioned in my second Feb 1 response (https://lore.kernel.org/r/20240201222245.GA650725@bhelgaas), the usual plan envisioned by the PCI Firmware spec is that an OS can use a PCIe feature if the platform has granted the OS ownership via _OSC and a device advertises the feature via a Capability in config space. My main concern with the v2 patch (https://lore.kernel.org/r/20231127211729.42668-1-nirmal.patel@linux.intel.com) is that it overrides _OSC for native_pcie_hotplug, but not for any of the other features (AER, PME, LTR, DPC, etc.) I think it's hard to read the specs and conclude that PCIe hotplug is a special case, and I think we're likely to have similar issues with other features in the future. But if you think this is the best solution, I'm OK with merging it. Bjorn
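The behavior under debate, keeping the _OSC copy but re-enabling native hotplug when a VMD Root Port advertises it in SltCap, would look roughly like the sketch below, run after the VMD domain has been scanned and assuming vmd->bus is the VMD domain's root bus as in the vmd driver. This illustrates the described approach and is not the v2 diff itself:

    struct pci_host_bridge *vmd_bridge = to_pci_host_bridge(vmd->bus->bridge);
    struct pci_dev *rp;

    /*
     * The VMD BIOS hotplug setting is reflected in the Root Ports' SltCap
     * Hot-Plug Capable bit (already cached in is_hotplug_bridge), so use
     * it to re-enable native hotplug for the VMD domain even when the
     * copied _OSC result said the platform owns hotplug.
     */
    list_for_each_entry(rp, &vmd->bus->devices, bus_list) {
            if (rp->is_hotplug_bridge) {
                    vmd_bridge->native_pcie_hotplug = 1;
                    break;
            }
    }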
On Wed, 2024-02-07 at 12:55 -0600, Bjorn Helgaas wrote: > On Tue, Feb 06, 2024 at 05:27:29PM -0700, Nirmal Patel wrote: > > ... > > Did you have a chance to look at my response on January 16th to > > your > > questions? I tried to summarize all of the potential problems and > > issues with different fixes. Please let me know if it is easier if > > I > > resend the explanation. Thanks. > > I did see your Jan 16 response, thanks. > > I had more questions after reading it, but they're more about > understanding the topology seen by the host and the guest: > Jan 16: https://lore.kernel.org/r/20240117004933.GA108810@bhelgaas > Feb 1: https://lore.kernel.org/r/20240201211620.GA650432@bhelgaas > > As I mentioned in my second Feb 1 response > (https://lore.kernel.org/r/20240201222245.GA650725@bhelgaas), the > usual plan envisioned by the PCI Firmware spec is that an OS can use > a > PCIe feature if the platform has granted the OS ownership via _OSC > and > a device advertises the feature via a Capability in config space. > > My main concern with the v2 patch > ( > https://lore.kernel.org/r/20231127211729.42668-1-nirmal.patel@linux.intel.com > ) > is that it overrides _OSC for native_pcie_hotplug, but not for any of > the other features (AER, PME, LTR, DPC, etc.) > > I think it's hard to read the specs and conclude that PCIe hotplug is > a special case, and I think we're likely to have similar issues with > other features in the future. > > But if you think this is the best solution, I'm OK with merging it. I sincerely apologize for late responses. I just found out that my evolution mail client is automatically sending linux-pci emails to junk since January 2024. At the moment Hotplug is an exception for us, but I do share your concern about other flags. We have done lot of testing and so far patch v2 is the best solution we have. Thanks nirmal > > Bjorn
On Tue, Dec 05, 2023 at 03:20:27PM -0700, Nirmal Patel wrote: [...] > > In the host OS, this overrides whatever was negotiated via _OSC, I > > guess on the principle that _OSC doesn't apply because the host BIOS > > doesn't know about the Root Ports below the VMD. (I'm not sure it's > > 100% resolved that _OSC doesn't apply to them, so we should mention > > that assumption here.) > _OSC still controls every flag including Hotplug. I have observed that > slot capabilities register has hotplug enabled only when platform has > enabled the hotplug. So technically not overriding it in the host. > It overrides in guest because _OSC is passing wrong/different > information than the _OSC information in Host. > Also like I mentioned, slot capabilities registers are not changed in > Guest because vmd is passthrough. > > > > In the guest OS, this still overrides whatever was negotiated via > > _OSC, but presumably the guest BIOS *does* know about the Root Ports, > > so I don't think the same principle about overriding _OSC applies > > there. > > > > I think it's still a problem that we handle > > host_bridge->native_pcie_hotplug differently than all the other > > features negotiated via _OSC. We should at least explain this > > somehow. > Right now this is the only way to know in Guest OS if platform has > enabled Hotplug or not. We have many customers complaining about > Hotplug being broken in Guest. So just trying to honor _OSC but also > fixing its limitation in Guest. The question is: should _OSC settings apply to the VMD hierarchy ? Yes or no (not maybe) ? If yes, the guest firmware is buggy (and I appreciate it is hard to fix). If no this patch should also change vmd_copy_host_bridge_flags() and remove the native_pcie_hotplug initialization from the root bridge. As a matter of fact this patch overrides the _OSC settings in host and guest and it is not acceptable because it is not based on any documented policy (if there is a policy please describe it). > > I suspect part of the reason for doing it differently is to avoid the > > interrupt storm that we avoid via 04b12ef163d1 ("PCI: vmd: Honor ACPI > > _OSC on PCIe features"). If so, I think 04b12ef163d1 should be > > mentioned here because it's an important piece of the picture. > I can add a note about this patch as well. Let me know if you want me > to add something specific. At the very least you need to explain the problem you are solving in the commit log - sentences like: "This patch will make sure that Hotplug is enabled properly in Host as well as in VM while honoring _OSC settings as well as VMD hotplug setting." aren't useful to someone who will look into this in the future. If _OSC does not grant the host kernel HP handling and *any* (that's another choice that should be explained) slot capability reports that the VMD root port is HP capable we do override _OSC, we are not honouring it, AFAICS. Then we can say that in your user experience the stars always align and what I mention above is not a problem because it can't happen but it is hard to grok by just reading the code and more importantly, it is not based on any standard documentation. > Thank you for taking the time. This is a very critical issue for us and > we have many customers complaining about it, I would appreciate to get > it resolved as soon as possible. Sure but we need to solve it in a way that does not break tomorrow otherwise we will keep applying plasters on top of plasters. 
Ignoring _OSC hotplug settings for VMD bridges in *both* host and guests may be an acceptable - though a tad controversial - policy (if based on software/hardware guidelines). A mixture thereof in my opinion is not acceptable. Thanks, Lorenzo
On Tue, 2024-02-13 at 10:47 -0700, Nirmal Patel wrote: > On Wed, 2024-02-07 at 12:55 -0600, Bjorn Helgaas wrote: > > On Tue, Feb 06, 2024 at 05:27:29PM -0700, Nirmal Patel wrote: > > > ... > > > Did you have a chance to look at my response on January 16th to > > > your > > > questions? I tried to summarize all of the potential problems and > > > issues with different fixes. Please let me know if it is easier > > > if > > > I > > > resend the explanation. Thanks. > > > > I did see your Jan 16 response, thanks. > > > > I had more questions after reading it, but they're more about > > understanding the topology seen by the host and the guest: > > Jan 16: > > https://lore.kernel.org/r/20240117004933.GA108810@bhelgaas > > Feb 1: > > https://lore.kernel.org/r/20240201211620.GA650432@bhelgaas > > > > As I mentioned in my second Feb 1 response > > (https://lore.kernel.org/r/20240201222245.GA650725@bhelgaas), the > > usual plan envisioned by the PCI Firmware spec is that an OS can > > use > > a > > PCIe feature if the platform has granted the OS ownership via _OSC > > and > > a device advertises the feature via a Capability in config space. > > > > My main concern with the v2 patch > > ( > > https://lore.kernel.org/r/20231127211729.42668-1-nirmal.patel@linux.intel.com > > ) > > is that it overrides _OSC for native_pcie_hotplug, but not for any > > of > > the other features (AER, PME, LTR, DPC, etc.) > > > > I think it's hard to read the specs and conclude that PCIe hotplug > > is > > a special case, and I think we're likely to have similar issues > > with > > other features in the future. > > > > But if you think this is the best solution, I'm OK with merging it. Hi Bjorn, We tried some other ways to pass the information about all of the flags but I couldn't retrieve it in guest OS and VMD hardware doesn't have empty registers to write this information. So as of now we have this solution which only overwrites Hotplug flag. If you can accept it that would be great. Thanks nirmal. > I sincerely apologize for late responses. I just found out that my > evolution mail client is automatically sending linux-pci emails to > junk > since January 2024. > > At the moment Hotplug is an exception for us, but I do share your > concern about other flags. We have done lot of testing and so far > patch > v2 is the best solution we have. > > Thanks > nirmal > > Bjorn
Hi Nirmal, On Thu, Mar 7, 2024 at 6:20 AM Nirmal Patel <nirmal.patel@linux.intel.com> wrote: > > On Tue, 2024-02-13 at 10:47 -0700, Nirmal Patel wrote: > > On Wed, 2024-02-07 at 12:55 -0600, Bjorn Helgaas wrote: > > > On Tue, Feb 06, 2024 at 05:27:29PM -0700, Nirmal Patel wrote: > > > > ... > > > > Did you have a chance to look at my response on January 16th to > > > > your > > > > questions? I tried to summarize all of the potential problems and > > > > issues with different fixes. Please let me know if it is easier > > > > if > > > > I > > > > resend the explanation. Thanks. > > > > > > I did see your Jan 16 response, thanks. > > > > > > I had more questions after reading it, but they're more about > > > understanding the topology seen by the host and the guest: > > > Jan 16: > > > https://lore.kernel.org/r/20240117004933.GA108810@bhelgaas > > > Feb 1: > > > https://lore.kernel.org/r/20240201211620.GA650432@bhelgaas > > > > > > As I mentioned in my second Feb 1 response > > > (https://lore.kernel.org/r/20240201222245.GA650725@bhelgaas), the > > > usual plan envisioned by the PCI Firmware spec is that an OS can > > > use > > > a > > > PCIe feature if the platform has granted the OS ownership via _OSC > > > and > > > a device advertises the feature via a Capability in config space. > > > > > > My main concern with the v2 patch > > > ( > > > https://lore.kernel.org/r/20231127211729.42668-1-nirmal.patel@linux.intel.com > > > ) > > > is that it overrides _OSC for native_pcie_hotplug, but not for any > > > of > > > the other features (AER, PME, LTR, DPC, etc.) > > > > > > I think it's hard to read the specs and conclude that PCIe hotplug > > > is > > > a special case, and I think we're likely to have similar issues > > > with > > > other features in the future. > > > > > > But if you think this is the best solution, I'm OK with merging it. > Hi Bjorn, > > We tried some other ways to pass the information about all of the flags > but I couldn't retrieve it in guest OS and VMD hardware doesn't have > empty registers to write this information. So as of now we have this > solution which only overwrites Hotplug flag. If you can accept it that > would be great. My original commit "PCI: vmd: Honor ACPI _OSC on PCIe features" was a logically conclusion based on VMD ports are just apertures. Apparently there are more nuances than that, and people outside of Intel can't possibly know. And yes VMD creates lots of headaches during hardware enablement. So is it possible to document the right way to use _OSC, as Bjorn suggested? Kai-Heng > > In the host OS, this overrides whatever was negotiated via _OSC, I > > guess on the principle that _OSC doesn't apply because the host BIOS > > doesn't know about the Root Ports below the VMD. (I'm not sure it's > > 100% resolved that _OSC doesn't apply to them, so we should mention > > that assumption here.) > _OSC still controls every flag including Hotplug. I have observed that > slot capabilities register has hotplug enabled only when platform has > enabled the hotplug. So technically not overriding it in the host. > It overrides in guest because _OSC is passing wrong/different > information than the _OSC information in Host. > Also like I mentioned, slot capabilities registers are not changed in > Guest because vmd is passthrough. > > > > In the guest OS, this still overrides whatever was negotiated via > > _OSC, but presumably the guest BIOS *does* know about the Root Ports, > > so I don't think the same principle about overriding _OSC applies > > there. 
> > > > I think it's still a problem that we handle > > host_bridge->native_pcie_hotplug differently than all the other > > features negotiated via _OSC. We should at least explain this > > somehow. > Right now this is the only way to know in Guest OS if platform has > enabled Hotplug or not. We have many customers complaining about > Hotplug being broken in Guest. So just trying to honor _OSC but also > fixing its limitation in Guest. > > Thanks > nirmal. > > > I sincerely apologize for late responses. I just found out that my > > evolution mail client is automatically sending linux-pci emails to > > junk > > since January 2024. > > > > At the moment Hotplug is an exception for us, but I do share your > > concern about other flags. We have done lot of testing and so far > > patch > > v2 is the best solution we have. > > > > Thanks > > nirmal > > > Bjorn >
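For context, the copying Kai-Heng's commit 04b12ef163d1 introduced is essentially the vmd_copy_host_bridge_flags() helper in drivers/pci/controller/vmd.c; the sketch below is a simplified rendering from memory rather than an exact quote of any particular kernel version:

/* Mirror the ACPI host bridge's _OSC results onto the synthetic host
 * bridge VMD creates for its own domain.  If the platform's _OSC (or a
 * guest BIOS reporting all zeros) withheld a feature, it stays withheld
 * below VMD too -- which is what breaks hotplug in the guest case
 * discussed in this thread.
 */
static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
                                       struct pci_host_bridge *vmd_bridge)
{
        vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
        vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
        vmd_bridge->native_aer = root_bridge->native_aer;
        vmd_bridge->native_pme = root_bridge->native_pme;
        vmd_bridge->native_ltr = root_bridge->native_ltr;
        vmd_bridge->native_dpc = root_bridge->native_dpc;
}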
On Thu, 7 Mar 2024 14:44:23 +0800 Kai-Heng Feng <kai.heng.feng@canonical.com> wrote: > Hi Nirmal, > > On Thu, Mar 7, 2024 at 6:20 AM Nirmal Patel > <nirmal.patel@linux.intel.com> wrote: > > > > On Tue, 2024-02-13 at 10:47 -0700, Nirmal Patel wrote: > > > On Wed, 2024-02-07 at 12:55 -0600, Bjorn Helgaas wrote: > > > > On Tue, Feb 06, 2024 at 05:27:29PM -0700, Nirmal Patel wrote: > > > > > ... > > > > > Did you have a chance to look at my response on January 16th > > > > > to your > > > > > questions? I tried to summarize all of the potential problems > > > > > and issues with different fixes. Please let me know if it is > > > > > easier if > > > > > I > > > > > resend the explanation. Thanks. > > > > > > > > I did see your Jan 16 response, thanks. > > > > > > > > I had more questions after reading it, but they're more about > > > > understanding the topology seen by the host and the guest: > > > > Jan 16: > > > > https://lore.kernel.org/r/20240117004933.GA108810@bhelgaas > > > > Feb 1: > > > > https://lore.kernel.org/r/20240201211620.GA650432@bhelgaas > > > > > > > > As I mentioned in my second Feb 1 response > > > > (https://lore.kernel.org/r/20240201222245.GA650725@bhelgaas), > > > > the usual plan envisioned by the PCI Firmware spec is that an > > > > OS can use > > > > a > > > > PCIe feature if the platform has granted the OS ownership via > > > > _OSC and > > > > a device advertises the feature via a Capability in config > > > > space. > > > > > > > > My main concern with the v2 patch > > > > ( > > > > https://lore.kernel.org/r/20231127211729.42668-1-nirmal.patel@linux.intel.com > > > > ) > > > > is that it overrides _OSC for native_pcie_hotplug, but not for > > > > any of > > > > the other features (AER, PME, LTR, DPC, etc.) > > > > > > > > I think it's hard to read the specs and conclude that PCIe > > > > hotplug is > > > > a special case, and I think we're likely to have similar issues > > > > with > > > > other features in the future. > > > > > > > > But if you think this is the best solution, I'm OK with merging > > > > it. > > Hi Bjorn, > > > > We tried some other ways to pass the information about all of the > > flags but I couldn't retrieve it in guest OS and VMD hardware > > doesn't have empty registers to write this information. So as of > > now we have this solution which only overwrites Hotplug flag. If > > you can accept it that would be great. > > My original commit "PCI: vmd: Honor ACPI _OSC on PCIe features" was a > logically conclusion based on VMD ports are just apertures. > Apparently there are more nuances than that, and people outside of > Intel can't possibly know. And yes VMD creates lots of headaches > during hardware enablement. > > So is it possible to document the right way to use _OSC, as Bjorn > suggested? > > Kai-Heng Well we are stuck here with two issues which are kind of chicken and egg problem. 1) VMD respects _OSC; which works excellent in Host but _OSC gives wrong information in Guest OS which ends up disabling Hotplug, AER, DPC, etc. We can live with AER, DPC disabled in VM but not the Hotplug. 2) VMD takes ownership of all the flags. For this to work VMD needs to know these settings from BIOS. VMD hardware doesn't have empty register where VMD UEFI driver can write this information. So honoring user settings for AER, Hotplug, etc from BIOS is not possible. Where do you want me to add documentation? Will more information in the comment section of the Sltcap code change help? 
Architecture wise VMD must own all of these flags but we have a hardware limitation to successfully pass the necessary information to the Host OS and the Guest OS. Thanks nirmal > > > > In the host OS, this overrides whatever was negotiated via _OSC, I > > > guess on the principle that _OSC doesn't apply because the host > > > BIOS doesn't know about the Root Ports below the VMD. (I'm not > > > sure it's 100% resolved that _OSC doesn't apply to them, so we > > > should mention that assumption here.) > > _OSC still controls every flag including Hotplug. I have observed > > that slot capabilities register has hotplug enabled only when > > platform has enabled the hotplug. So technically not overriding it > > in the host. It overrides in guest because _OSC is passing > > wrong/different information than the _OSC information in Host. > > Also like I mentioned, slot capabilities registers are not changed > > in Guest because vmd is passthrough. > > > > > > In the guest OS, this still overrides whatever was negotiated via > > > _OSC, but presumably the guest BIOS *does* know about the Root > > > Ports, so I don't think the same principle about overriding _OSC > > > applies there. > > > > > > I think it's still a problem that we handle > > > host_bridge->native_pcie_hotplug differently than all the other > > > features negotiated via _OSC. We should at least explain this > > > somehow. > > Right now this is the only way to know in Guest OS if platform has > > enabled Hotplug or not. We have many customers complaining about > > Hotplug being broken in Guest. So just trying to honor _OSC but also > > fixing its limitation in Guest. > > > > Thanks > > nirmal. > > > > > I sincerely apologize for late responses. I just found out that my > > > evolution mail client is automatically sending linux-pci emails to > > > junk > > > since January 2024. > > > > > > At the moment Hotplug is an exception for us, but I do share your > > > concern about other flags. We have done lot of testing and so far > > > patch > > > v2 is the best solution we have. > > > > > > Thanks > > > nirmal > > > > Bjorn > >
On Fri, Mar 8, 2024 at 8:09 AM Nirmal Patel <nirmal.patel@linux.intel.com> wrote: > > On Thu, 7 Mar 2024 14:44:23 +0800 > Kai-Heng Feng <kai.heng.feng@canonical.com> wrote: > > > Hi Nirmal, > > > > On Thu, Mar 7, 2024 at 6:20 AM Nirmal Patel > > <nirmal.patel@linux.intel.com> wrote: > > > > > > On Tue, 2024-02-13 at 10:47 -0700, Nirmal Patel wrote: > > > > On Wed, 2024-02-07 at 12:55 -0600, Bjorn Helgaas wrote: > > > > > On Tue, Feb 06, 2024 at 05:27:29PM -0700, Nirmal Patel wrote: > > > > > > ... > > > > > > Did you have a chance to look at my response on January 16th > > > > > > to your > > > > > > questions? I tried to summarize all of the potential problems > > > > > > and issues with different fixes. Please let me know if it is > > > > > > easier if > > > > > > I > > > > > > resend the explanation. Thanks. > > > > > > > > > > I did see your Jan 16 response, thanks. > > > > > > > > > > I had more questions after reading it, but they're more about > > > > > understanding the topology seen by the host and the guest: > > > > > Jan 16: > > > > > https://lore.kernel.org/r/20240117004933.GA108810@bhelgaas > > > > > Feb 1: > > > > > https://lore.kernel.org/r/20240201211620.GA650432@bhelgaas > > > > > > > > > > As I mentioned in my second Feb 1 response > > > > > (https://lore.kernel.org/r/20240201222245.GA650725@bhelgaas), > > > > > the usual plan envisioned by the PCI Firmware spec is that an > > > > > OS can use > > > > > a > > > > > PCIe feature if the platform has granted the OS ownership via > > > > > _OSC and > > > > > a device advertises the feature via a Capability in config > > > > > space. > > > > > > > > > > My main concern with the v2 patch > > > > > ( > > > > > https://lore.kernel.org/r/20231127211729.42668-1-nirmal.patel@linux.intel.com > > > > > ) > > > > > is that it overrides _OSC for native_pcie_hotplug, but not for > > > > > any of > > > > > the other features (AER, PME, LTR, DPC, etc.) > > > > > > > > > > I think it's hard to read the specs and conclude that PCIe > > > > > hotplug is > > > > > a special case, and I think we're likely to have similar issues > > > > > with > > > > > other features in the future. > > > > > > > > > > But if you think this is the best solution, I'm OK with merging > > > > > it. > > > Hi Bjorn, > > > > > > We tried some other ways to pass the information about all of the > > > flags but I couldn't retrieve it in guest OS and VMD hardware > > > doesn't have empty registers to write this information. So as of > > > now we have this solution which only overwrites Hotplug flag. If > > > you can accept it that would be great. > > > > My original commit "PCI: vmd: Honor ACPI _OSC on PCIe features" was a > > logically conclusion based on VMD ports are just apertures. > > Apparently there are more nuances than that, and people outside of > > Intel can't possibly know. And yes VMD creates lots of headaches > > during hardware enablement. > > > > So is it possible to document the right way to use _OSC, as Bjorn > > suggested? > > > > Kai-Heng > Well we are stuck here with two issues which are kind of chicken and egg > problem. > 1) VMD respects _OSC; which works excellent in Host but _OSC gives > wrong information in Guest OS which ends up disabling Hotplug, AER, > DPC, etc. We can live with AER, DPC disabled in VM but not the Hotplug. > > 2) VMD takes ownership of all the flags. For this to work VMD needs to > know these settings from BIOS. VMD hardware doesn't have empty register > where VMD UEFI driver can write this information. 
So honoring user > settings for AER, Hotplug, etc from BIOS is not possible. > > Where do you want me to add documentation? Will more information in > the comment section of the Sltcap code change help? > > Architecture wise VMD must own all of these flags but we have a > hardware limitation to successfully pass the necessary information to > the Host OS and the Guest OS. If there's an official document on intel.com, it can make many things clearer and easier. States what VMD does and what VMD expect OS to do can be really helpful. Basically put what you wrote in an official document. Kai-Heng > > Thanks > nirmal > > > > > > In the host OS, this overrides whatever was negotiated via _OSC, I > > > > guess on the principle that _OSC doesn't apply because the host > > > > BIOS doesn't know about the Root Ports below the VMD. (I'm not > > > > sure it's 100% resolved that _OSC doesn't apply to them, so we > > > > should mention that assumption here.) > > > _OSC still controls every flag including Hotplug. I have observed > > > that slot capabilities register has hotplug enabled only when > > > platform has enabled the hotplug. So technically not overriding it > > > in the host. It overrides in guest because _OSC is passing > > > wrong/different information than the _OSC information in Host. > > > Also like I mentioned, slot capabilities registers are not changed > > > in Guest because vmd is passthrough. > > > > > > > > In the guest OS, this still overrides whatever was negotiated via > > > > _OSC, but presumably the guest BIOS *does* know about the Root > > > > Ports, so I don't think the same principle about overriding _OSC > > > > applies there. > > > > > > > > I think it's still a problem that we handle > > > > host_bridge->native_pcie_hotplug differently than all the other > > > > features negotiated via _OSC. We should at least explain this > > > > somehow. > > > Right now this is the only way to know in Guest OS if platform has > > > enabled Hotplug or not. We have many customers complaining about > > > Hotplug being broken in Guest. So just trying to honor _OSC but also > > > fixing its limitation in Guest. > > > > > > Thanks > > > nirmal. > > > > > > > I sincerely apologize for late responses. I just found out that my > > > > evolution mail client is automatically sending linux-pci emails to > > > > junk > > > > since January 2024. > > > > > > > > At the moment Hotplug is an exception for us, but I do share your > > > > concern about other flags. We have done lot of testing and so far > > > > patch > > > > v2 is the best solution we have. > > > > > > > > Thanks > > > > nirmal > > > > > Bjorn > > > >
On Fri, 15 Mar 2024 09:29:32 +0800 Kai-Heng Feng <kai.heng.feng@canonical.com> wrote: > On Fri, Mar 8, 2024 at 8:09 AM Nirmal Patel > <nirmal.patel@linux.intel.com> wrote: > > > > On Thu, 7 Mar 2024 14:44:23 +0800 > > Kai-Heng Feng <kai.heng.feng@canonical.com> wrote: > > > > > Hi Nirmal, > > > > > > On Thu, Mar 7, 2024 at 6:20 AM Nirmal Patel > > > <nirmal.patel@linux.intel.com> wrote: > > > > > > > > On Tue, 2024-02-13 at 10:47 -0700, Nirmal Patel wrote: > > > > > On Wed, 2024-02-07 at 12:55 -0600, Bjorn Helgaas wrote: > > > > > > On Tue, Feb 06, 2024 at 05:27:29PM -0700, Nirmal Patel > > > > > > wrote: > > > > > > > ... > > > > > > > Did you have a chance to look at my response on January > > > > > > > 16th to your > > > > > > > questions? I tried to summarize all of the potential > > > > > > > problems and issues with different fixes. Please let me > > > > > > > know if it is easier if > > > > > > > I > > > > > > > resend the explanation. Thanks. > > > > > > > > > > > > I did see your Jan 16 response, thanks. > > > > > > > > > > > > I had more questions after reading it, but they're more > > > > > > about understanding the topology seen by the host and the > > > > > > guest: Jan 16: > > > > > > https://lore.kernel.org/r/20240117004933.GA108810@bhelgaas > > > > > > Feb 1: > > > > > > https://lore.kernel.org/r/20240201211620.GA650432@bhelgaas > > > > > > > > > > > > As I mentioned in my second Feb 1 response > > > > > > (https://lore.kernel.org/r/20240201222245.GA650725@bhelgaas), > > > > > > the usual plan envisioned by the PCI Firmware spec is that > > > > > > an OS can use > > > > > > a > > > > > > PCIe feature if the platform has granted the OS ownership > > > > > > via _OSC and > > > > > > a device advertises the feature via a Capability in config > > > > > > space. > > > > > > > > > > > > My main concern with the v2 patch > > > > > > ( > > > > > > https://lore.kernel.org/r/20231127211729.42668-1-nirmal.patel@linux.intel.com > > > > > > ) > > > > > > is that it overrides _OSC for native_pcie_hotplug, but not > > > > > > for any of > > > > > > the other features (AER, PME, LTR, DPC, etc.) > > > > > > > > > > > > I think it's hard to read the specs and conclude that PCIe > > > > > > hotplug is > > > > > > a special case, and I think we're likely to have similar > > > > > > issues with > > > > > > other features in the future. > > > > > > > > > > > > But if you think this is the best solution, I'm OK with > > > > > > merging it. > > > > Hi Bjorn, > > > > > > > > We tried some other ways to pass the information about all of > > > > the flags but I couldn't retrieve it in guest OS and VMD > > > > hardware doesn't have empty registers to write this > > > > information. So as of now we have this solution which only > > > > overwrites Hotplug flag. If you can accept it that would be > > > > great. > > > > > > My original commit "PCI: vmd: Honor ACPI _OSC on PCIe features" > > > was a logically conclusion based on VMD ports are just apertures. > > > Apparently there are more nuances than that, and people outside of > > > Intel can't possibly know. And yes VMD creates lots of headaches > > > during hardware enablement. > > > > > > So is it possible to document the right way to use _OSC, as Bjorn > > > suggested? > > > > > > Kai-Heng > > Well we are stuck here with two issues which are kind of chicken > > and egg problem. > > 1) VMD respects _OSC; which works excellent in Host but _OSC gives > > wrong information in Guest OS which ends up disabling Hotplug, AER, > > DPC, etc. 
We can live with AER, DPC disabled in VM but not the > > Hotplug. > > > > 2) VMD takes ownership of all the flags. For this to work VMD needs > > to know these settings from BIOS. VMD hardware doesn't have empty > > register where VMD UEFI driver can write this information. So > > honoring user settings for AER, Hotplug, etc from BIOS is not > > possible. > > > > Where do you want me to add documentation? Will more information in > > the comment section of the Sltcap code change help? > > > > Architecture wise VMD must own all of these flags but we have a > > hardware limitation to successfully pass the necessary information > > to the Host OS and the Guest OS. > > If there's an official document on intel.com, it can make many things > clearer and easier. > States what VMD does and what VMD expect OS to do can be really > helpful. Basically put what you wrote in an official document. > > Kai-Heng Hi Kai-Heng/ Bjorn, Thanks for the suggestion. I can certainly find official VMD architecture document and add that required information to Documentation/PCI/controller/vmd.rst. Will that be okay? I also need your some help/suggestion on following alternate solution. We have been looking at VMD HW registers to find some empty registers. Cache Line Size register offset OCh is not being used by VMD. This is the explanation in PCI spec 5.0 section 7.5.1.1.7: "This read-write register is implemented for legacy compatibility purposes but has no effect on any PCI Express device behavior." Can these registers be used for passing _OSC settings from BIOS to VMD OS driver? These 8 bits are more than enough for UEFI VMD driver to store all _OSC flags and VMD OS driver can read it during OS boot up. This will solve all of our issues. Thanks nirmal > > > > > Thanks > > nirmal > > > > > > > > In the host OS, this overrides whatever was negotiated via > > > > > _OSC, I guess on the principle that _OSC doesn't apply > > > > > because the host BIOS doesn't know about the Root Ports below > > > > > the VMD. (I'm not sure it's 100% resolved that _OSC doesn't > > > > > apply to them, so we should mention that assumption here.) > > > > _OSC still controls every flag including Hotplug. I have > > > > observed that slot capabilities register has hotplug enabled > > > > only when platform has enabled the hotplug. So technically not > > > > overriding it in the host. It overrides in guest because _OSC > > > > is passing wrong/different information than the _OSC > > > > information in Host. Also like I mentioned, slot capabilities > > > > registers are not changed in Guest because vmd is passthrough. > > > > > > > > > > In the guest OS, this still overrides whatever was negotiated > > > > > via _OSC, but presumably the guest BIOS *does* know about the > > > > > Root Ports, so I don't think the same principle about > > > > > overriding _OSC applies there. > > > > > > > > > > I think it's still a problem that we handle > > > > > host_bridge->native_pcie_hotplug differently than all the > > > > > other features negotiated via _OSC. We should at least > > > > > explain this somehow. > > > > Right now this is the only way to know in Guest OS if platform > > > > has enabled Hotplug or not. We have many customers complaining > > > > about Hotplug being broken in Guest. So just trying to honor > > > > _OSC but also fixing its limitation in Guest. > > > > > > > > Thanks > > > > nirmal. > > > > > > > > > I sincerely apologize for late responses. 
I just found out > > > > > that my evolution mail client is automatically sending > > > > > linux-pci emails to junk > > > > > since January 2024. > > > > > > > > > > At the moment Hotplug is an exception for us, but I do share > > > > > your concern about other flags. We have done lot of testing > > > > > and so far patch > > > > > v2 is the best solution we have. > > > > > > > > > > Thanks > > > > > nirmal > > > > > > Bjorn > > > > > >
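A minimal sketch of the hand-off Nirmal is proposing, assuming a purely hypothetical bit layout -- the VMD_BIOS_FLAG_* names and the helper below do not exist anywhere today and only illustrate how the UEFI driver and the OS driver could agree on the byte at offset 0Ch:

/* Hypothetical encoding: the UEFI VMD driver stashes the platform's
 * feature-ownership decisions in the VMD endpoint's Cache Line Size
 * register (offset 0Ch, which PCIe r5.0 sec 7.5.1.1.7 says has no
 * effect on PCIe device behavior), and the OS driver reads it back.
 */
#define VMD_BIOS_FLAG_HOTPLUG   BIT(0)  /* OS may own PCIe native hotplug */
#define VMD_BIOS_FLAG_AER       BIT(1)  /* OS may own AER */
#define VMD_BIOS_FLAG_PME       BIT(2)  /* OS may own PME */
#define VMD_BIOS_FLAG_DPC       BIT(3)  /* OS may own DPC */

static void vmd_read_bios_flags(struct vmd_dev *vmd,
                                struct pci_host_bridge *vmd_bridge)
{
        u8 flags;

        /* Read from the VMD endpoint itself, not the devices behind it. */
        pci_read_config_byte(vmd->dev, PCI_CACHE_LINE_SIZE, &flags);

        vmd_bridge->native_pcie_hotplug = !!(flags & VMD_BIOS_FLAG_HOTPLUG);
        vmd_bridge->native_aer = !!(flags & VMD_BIOS_FLAG_AER);
        vmd_bridge->native_pme = !!(flags & VMD_BIOS_FLAG_PME);
        vmd_bridge->native_dpc = !!(flags & VMD_BIOS_FLAG_DPC);
}

As Bjorn notes in his reply, any value stashed this way would have to be latched before generic PCI code gets a chance to rewrite the register.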
On Fri, Mar 22, 2024 at 01:57:00PM -0700, Nirmal Patel wrote: > On Fri, 15 Mar 2024 09:29:32 +0800 > Kai-Heng Feng <kai.heng.feng@canonical.com> wrote: > ... > > If there's an official document on intel.com, it can make many things > > clearer and easier. > > States what VMD does and what VMD expect OS to do can be really > > helpful. Basically put what you wrote in an official document. > > Thanks for the suggestion. I can certainly find official VMD > architecture document and add that required information to > Documentation/PCI/controller/vmd.rst. Will that be okay? I'd definitely be interested in whatever you can add to illuminate these issues. > I also need your some help/suggestion on following alternate solution. > We have been looking at VMD HW registers to find some empty registers. > Cache Line Size register offset OCh is not being used by VMD. This is > the explanation in PCI spec 5.0 section 7.5.1.1.7: > "This read-write register is implemented for legacy compatibility > purposes but has no effect on any PCI Express device behavior." > Can these registers be used for passing _OSC settings from BIOS to VMD > OS driver? > > These 8 bits are more than enough for UEFI VMD driver to store all _OSC > flags and VMD OS driver can read it during OS boot up. This will solve > all of our issues. Interesting idea. I think you'd have to do some work to separate out the conventional PCI devices, where PCI_CACHE_LINE_SIZE is still relevant, to make sure nothing breaks. But I think we overwrite it in some cases even for PCIe devices where it's pointless, and it would be nice to clean that up. Bjorn
On 3/22/2024 2:37 PM, Bjorn Helgaas wrote: > On Fri, Mar 22, 2024 at 01:57:00PM -0700, Nirmal Patel wrote: >> On Fri, 15 Mar 2024 09:29:32 +0800 >> Kai-Heng Feng <kai.heng.feng@canonical.com> wrote: >> ... > >>> If there's an official document on intel.com, it can make many things >>> clearer and easier. >>> States what VMD does and what VMD expect OS to do can be really >>> helpful. Basically put what you wrote in an official document. >> >> Thanks for the suggestion. I can certainly find official VMD >> architecture document and add that required information to >> Documentation/PCI/controller/vmd.rst. Will that be okay? > > I'd definitely be interested in whatever you can add to illuminate > these issues. > >> I also need your some help/suggestion on following alternate solution. >> We have been looking at VMD HW registers to find some empty registers. >> Cache Line Size register offset OCh is not being used by VMD. This is >> the explanation in PCI spec 5.0 section 7.5.1.1.7: >> "This read-write register is implemented for legacy compatibility >> purposes but has no effect on any PCI Express device behavior." >> Can these registers be used for passing _OSC settings from BIOS to VMD >> OS driver? >> >> These 8 bits are more than enough for UEFI VMD driver to store all _OSC >> flags and VMD OS driver can read it during OS boot up. This will solve >> all of our issues. > > Interesting idea. I think you'd have to do some work to separate out > the conventional PCI devices, where PCI_CACHE_LINE_SIZE is still > relevant, to make sure nothing breaks. But I think we overwrite it in > some cases even for PCIe devices where it's pointless, and it would be > nice to clean that up. > I think the suggestion here is to use the VMD devices Cache Line Size register, not the other PCI devices. In that case we don't have to worry about conventional PCI devices because we aren't touching them. Paul
On Fri, Mar 22, 2024 at 03:43:27PM -0700, Paul M Stillwell Jr wrote: > On 3/22/2024 2:37 PM, Bjorn Helgaas wrote: > > On Fri, Mar 22, 2024 at 01:57:00PM -0700, Nirmal Patel wrote: > > > On Fri, 15 Mar 2024 09:29:32 +0800 > > > Kai-Heng Feng <kai.heng.feng@canonical.com> wrote: > > > ... > > > > > > If there's an official document on intel.com, it can make many things > > > > clearer and easier. > > > > States what VMD does and what VMD expect OS to do can be really > > > > helpful. Basically put what you wrote in an official document. > > > > > > Thanks for the suggestion. I can certainly find official VMD > > > architecture document and add that required information to > > > Documentation/PCI/controller/vmd.rst. Will that be okay? > > > > I'd definitely be interested in whatever you can add to illuminate > > these issues. > > > > > I also need your some help/suggestion on following alternate solution. > > > We have been looking at VMD HW registers to find some empty registers. > > > Cache Line Size register offset OCh is not being used by VMD. This is > > > the explanation in PCI spec 5.0 section 7.5.1.1.7: > > > "This read-write register is implemented for legacy compatibility > > > purposes but has no effect on any PCI Express device behavior." > > > Can these registers be used for passing _OSC settings from BIOS to VMD > > > OS driver? > > > > > > These 8 bits are more than enough for UEFI VMD driver to store all _OSC > > > flags and VMD OS driver can read it during OS boot up. This will solve > > > all of our issues. > > > > Interesting idea. I think you'd have to do some work to separate out > > the conventional PCI devices, where PCI_CACHE_LINE_SIZE is still > > relevant, to make sure nothing breaks. But I think we overwrite it in > > some cases even for PCIe devices where it's pointless, and it would be > > nice to clean that up. > > I think the suggestion here is to use the VMD devices Cache Line Size > register, not the other PCI devices. In that case we don't have to worry > about conventional PCI devices because we aren't touching them. Yes, but there is generic code that writes PCI_CACHE_LINE_SIZE for every device in some cases. If we wrote the VMD PCI_CACHE_LINE_SIZE, it would obliterate whatever you want to pass there. Bjorn
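The generic path Bjorn is pointing at is pci_set_cacheline_size() in drivers/pci/pci.c (reached from pci_set_mwi() and from several host controller drivers); condensed from memory, so treat the details as approximate:

int pci_set_cacheline_size(struct pci_dev *dev)
{
        u8 cacheline_size;

        if (!pci_cache_line_size)
                return -EINVAL;

        /* Keep the current value if it is already a valid multiple. */
        pci_read_config_byte(dev, PCI_CACHE_LINE_SIZE, &cacheline_size);
        if (cacheline_size >= pci_cache_line_size &&
            (cacheline_size % pci_cache_line_size) == 0)
                return 0;

        /* Otherwise overwrite it -- clobbering anything firmware stashed there. */
        pci_write_config_byte(dev, PCI_CACHE_LINE_SIZE, pci_cache_line_size);
        pci_read_config_byte(dev, PCI_CACHE_LINE_SIZE, &cacheline_size);
        if (cacheline_size == pci_cache_line_size)
                return 0;

        pci_info(dev, "cache line size of %d is not supported\n",
                 pci_cache_line_size << 2);
        return -EINVAL;
}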
On 3/22/2024 4:36 PM, Bjorn Helgaas wrote: > On Fri, Mar 22, 2024 at 03:43:27PM -0700, Paul M Stillwell Jr wrote: >> On 3/22/2024 2:37 PM, Bjorn Helgaas wrote: >>> On Fri, Mar 22, 2024 at 01:57:00PM -0700, Nirmal Patel wrote: >>>> On Fri, 15 Mar 2024 09:29:32 +0800 >>>> Kai-Heng Feng <kai.heng.feng@canonical.com> wrote: >>>> ... >>> >>>>> If there's an official document on intel.com, it can make many things >>>>> clearer and easier. >>>>> States what VMD does and what VMD expect OS to do can be really >>>>> helpful. Basically put what you wrote in an official document. >>>> >>>> Thanks for the suggestion. I can certainly find official VMD >>>> architecture document and add that required information to >>>> Documentation/PCI/controller/vmd.rst. Will that be okay? >>> >>> I'd definitely be interested in whatever you can add to illuminate >>> these issues. >>> >>>> I also need your some help/suggestion on following alternate solution. >>>> We have been looking at VMD HW registers to find some empty registers. >>>> Cache Line Size register offset OCh is not being used by VMD. This is >>>> the explanation in PCI spec 5.0 section 7.5.1.1.7: >>>> "This read-write register is implemented for legacy compatibility >>>> purposes but has no effect on any PCI Express device behavior." >>>> Can these registers be used for passing _OSC settings from BIOS to VMD >>>> OS driver? >>>> >>>> These 8 bits are more than enough for UEFI VMD driver to store all _OSC >>>> flags and VMD OS driver can read it during OS boot up. This will solve >>>> all of our issues. >>> >>> Interesting idea. I think you'd have to do some work to separate out >>> the conventional PCI devices, where PCI_CACHE_LINE_SIZE is still >>> relevant, to make sure nothing breaks. But I think we overwrite it in >>> some cases even for PCIe devices where it's pointless, and it would be >>> nice to clean that up. >> >> I think the suggestion here is to use the VMD devices Cache Line Size >> register, not the other PCI devices. In that case we don't have to worry >> about conventional PCI devices because we aren't touching them. > > Yes, but there is generic code that writes PCI_CACHE_LINE_SIZE for > every device in some cases. If we wrote the VMD PCI_CACHE_LINE_SIZE, > it would obliterate whatever you want to pass there. > Ah, got it. Thanks for clarifying. Paul
On Fri, 22 Mar 2024 18:36:37 -0500 Bjorn Helgaas <helgaas@kernel.org> wrote: > On Fri, Mar 22, 2024 at 03:43:27PM -0700, Paul M Stillwell Jr wrote: > > On 3/22/2024 2:37 PM, Bjorn Helgaas wrote: > > > On Fri, Mar 22, 2024 at 01:57:00PM -0700, Nirmal Patel wrote: > > > > On Fri, 15 Mar 2024 09:29:32 +0800 > > > > Kai-Heng Feng <kai.heng.feng@canonical.com> wrote: > > > > ... > > > > > > > > If there's an official document on intel.com, it can make > > > > > many things clearer and easier. > > > > > States what VMD does and what VMD expect OS to do can be > > > > > really helpful. Basically put what you wrote in an official > > > > > document. > > > > > > > > Thanks for the suggestion. I can certainly find official VMD > > > > architecture document and add that required information to > > > > Documentation/PCI/controller/vmd.rst. Will that be okay? > > > > > > I'd definitely be interested in whatever you can add to illuminate > > > these issues. > > > > > > > I also need your some help/suggestion on following alternate > > > > solution. We have been looking at VMD HW registers to find some > > > > empty registers. Cache Line Size register offset OCh is not > > > > being used by VMD. This is the explanation in PCI spec 5.0 > > > > section 7.5.1.1.7: "This read-write register is implemented for > > > > legacy compatibility purposes but has no effect on any PCI > > > > Express device behavior." Can these registers be used for > > > > passing _OSC settings from BIOS to VMD OS driver? > > > > > > > > These 8 bits are more than enough for UEFI VMD driver to store > > > > all _OSC flags and VMD OS driver can read it during OS boot up. > > > > This will solve all of our issues. > > > > > > Interesting idea. I think you'd have to do some work to separate > > > out the conventional PCI devices, where PCI_CACHE_LINE_SIZE is > > > still relevant, to make sure nothing breaks. But I think we > > > overwrite it in some cases even for PCIe devices where it's > > > pointless, and it would be nice to clean that up. > > > > I think the suggestion here is to use the VMD devices Cache Line > > Size register, not the other PCI devices. In that case we don't > > have to worry about conventional PCI devices because we aren't > > touching them. > > Yes, but there is generic code that writes PCI_CACHE_LINE_SIZE for > every device in some cases. If we wrote the VMD PCI_CACHE_LINE_SIZE, > it would obliterate whatever you want to pass there. > > Bjorn Our initial testing showed no change in value by overwrite from pci. The registers didn't change in Host as well as in Guest OS when some data was written from BIOS. I will perform more testing and also make sure to write every register just in case. Thanks for the response. -nirmal
On Tue, Mar 26, 2024 at 8:17 AM Nirmal Patel <nirmal.patel@linux.intel.com> wrote: > > On Fri, 22 Mar 2024 18:36:37 -0500 > Bjorn Helgaas <helgaas@kernel.org> wrote: > > > On Fri, Mar 22, 2024 at 03:43:27PM -0700, Paul M Stillwell Jr wrote: > > > On 3/22/2024 2:37 PM, Bjorn Helgaas wrote: > > > > On Fri, Mar 22, 2024 at 01:57:00PM -0700, Nirmal Patel wrote: > > > > > On Fri, 15 Mar 2024 09:29:32 +0800 > > > > > Kai-Heng Feng <kai.heng.feng@canonical.com> wrote: > > > > > ... > > > > > > > > > > If there's an official document on intel.com, it can make > > > > > > many things clearer and easier. > > > > > > States what VMD does and what VMD expect OS to do can be > > > > > > really helpful. Basically put what you wrote in an official > > > > > > document. > > > > > > > > > > Thanks for the suggestion. I can certainly find official VMD > > > > > architecture document and add that required information to > > > > > Documentation/PCI/controller/vmd.rst. Will that be okay? > > > > > > > > I'd definitely be interested in whatever you can add to illuminate > > > > these issues. > > > > > > > > > I also need your some help/suggestion on following alternate > > > > > solution. We have been looking at VMD HW registers to find some > > > > > empty registers. Cache Line Size register offset OCh is not > > > > > being used by VMD. This is the explanation in PCI spec 5.0 > > > > > section 7.5.1.1.7: "This read-write register is implemented for > > > > > legacy compatibility purposes but has no effect on any PCI > > > > > Express device behavior." Can these registers be used for > > > > > passing _OSC settings from BIOS to VMD OS driver? > > > > > > > > > > These 8 bits are more than enough for UEFI VMD driver to store > > > > > all _OSC flags and VMD OS driver can read it during OS boot up. > > > > > This will solve all of our issues. > > > > > > > > Interesting idea. I think you'd have to do some work to separate > > > > out the conventional PCI devices, where PCI_CACHE_LINE_SIZE is > > > > still relevant, to make sure nothing breaks. But I think we > > > > overwrite it in some cases even for PCIe devices where it's > > > > pointless, and it would be nice to clean that up. > > > > > > I think the suggestion here is to use the VMD devices Cache Line > > > Size register, not the other PCI devices. In that case we don't > > > have to worry about conventional PCI devices because we aren't > > > touching them. > > > > Yes, but there is generic code that writes PCI_CACHE_LINE_SIZE for > > every device in some cases. If we wrote the VMD PCI_CACHE_LINE_SIZE, > > it would obliterate whatever you want to pass there. > > > > Bjorn > Our initial testing showed no change in value by overwrite from pci. The > registers didn't change in Host as well as in Guest OS when some data > was written from BIOS. I will perform more testing and also make sure > to write every register just in case. If the VMD hardware is designed in this way and there's an official document states that "VMD ports should follow _OSC expect for hotplugging" then IMO there's no need to find alternative. Kai-Heng > Thanks for the response. > > -nirmal >
On 3/25/2024 6:59 PM, Kai-Heng Feng wrote: > On Tue, Mar 26, 2024 at 8:17 AM Nirmal Patel > <nirmal.patel@linux.intel.com> wrote: >> >> On Fri, 22 Mar 2024 18:36:37 -0500 Bjorn Helgaas >> <helgaas@kernel.org> wrote: >> >>> On Fri, Mar 22, 2024 at 03:43:27PM -0700, Paul M Stillwell Jr >>> wrote: >>>> On 3/22/2024 2:37 PM, Bjorn Helgaas wrote: >>>>> On Fri, Mar 22, 2024 at 01:57:00PM -0700, Nirmal Patel >>>>> wrote: >>>>>> On Fri, 15 Mar 2024 09:29:32 +0800 Kai-Heng Feng >>>>>> <kai.heng.feng@canonical.com> wrote: ... >>>>> >>>>>>> If there's an official document on intel.com, it can >>>>>>> make many things clearer and easier. States what VMD does >>>>>>> and what VMD expect OS to do can be really helpful. >>>>>>> Basically put what you wrote in an official document. >>>>>> >>>>>> Thanks for the suggestion. I can certainly find official >>>>>> VMD architecture document and add that required information >>>>>> to Documentation/PCI/controller/vmd.rst. Will that be >>>>>> okay? >>>>> >>>>> I'd definitely be interested in whatever you can add to >>>>> illuminate these issues. >>>>> >>>>>> I also need your some help/suggestion on following >>>>>> alternate solution. We have been looking at VMD HW >>>>>> registers to find some empty registers. Cache Line Size >>>>>> register offset OCh is not being used by VMD. This is the >>>>>> explanation in PCI spec 5.0 section 7.5.1.1.7: "This >>>>>> read-write register is implemented for legacy compatibility >>>>>> purposes but has no effect on any PCI Express device >>>>>> behavior." Can these registers be used for passing _OSC >>>>>> settings from BIOS to VMD OS driver? >>>>>> >>>>>> These 8 bits are more than enough for UEFI VMD driver to >>>>>> store all _OSC flags and VMD OS driver can read it during >>>>>> OS boot up. This will solve all of our issues. >>>>> >>>>> Interesting idea. I think you'd have to do some work to >>>>> separate out the conventional PCI devices, where >>>>> PCI_CACHE_LINE_SIZE is still relevant, to make sure nothing >>>>> breaks. But I think we overwrite it in some cases even for >>>>> PCIe devices where it's pointless, and it would be nice to >>>>> clean that up. >>>> >>>> I think the suggestion here is to use the VMD devices Cache >>>> Line Size register, not the other PCI devices. In that case we >>>> don't have to worry about conventional PCI devices because we >>>> aren't touching them. >>> >>> Yes, but there is generic code that writes PCI_CACHE_LINE_SIZE >>> for every device in some cases. If we wrote the VMD >>> PCI_CACHE_LINE_SIZE, it would obliterate whatever you want to >>> pass there. >>> >>> Bjorn >> Our initial testing showed no change in value by overwrite from >> pci. The registers didn't change in Host as well as in Guest OS >> when some data was written from BIOS. I will perform more testing >> and also make sure to write every register just in case. > > If the VMD hardware is designed in this way and there's an official > document states that "VMD ports should follow _OSC expect for > hotplugging" then IMO there's no need to find alternative. > There isn't any official documentation for this, it's just the way VMD is designed :). VMD assumes that everything behind it is hotpluggable so the bits don't *really* matter. In other words, VMD supports hotplug and you can't turn that off so hotplug is always on regardless of what the bits say. I believe prior to 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features") VMD ignored the flags (which was ok, but led to the issue that the patch corrected). 
I think that patch is fine except for the hotplug bits because the hypervisor BIOS isn't filling them in correctly (if I understand the problem correctly). As I mentioned earlier the VMD design is such that VMD is going to handle all the hotplug events so the bits in the root bridge for hotplug are irrelevant WRT VMD. Paul > Kai-Heng > >> Thanks for the response. >> >> -nirmal >>
On 3/26/2024 8:51 AM, Paul M Stillwell Jr wrote: > On 3/25/2024 6:59 PM, Kai-Heng Feng wrote: >> On Tue, Mar 26, 2024 at 8:17 AM Nirmal Patel >> <nirmal.patel@linux.intel.com> wrote: >>> >>> On Fri, 22 Mar 2024 18:36:37 -0500 Bjorn Helgaas >>> <helgaas@kernel.org> wrote: >>> >>>> On Fri, Mar 22, 2024 at 03:43:27PM -0700, Paul M Stillwell Jr >>>> wrote: >>>>> On 3/22/2024 2:37 PM, Bjorn Helgaas wrote: >>>>>> On Fri, Mar 22, 2024 at 01:57:00PM -0700, Nirmal Patel >>>>>> wrote: >>>>>>> On Fri, 15 Mar 2024 09:29:32 +0800 Kai-Heng Feng >>>>>>> <kai.heng.feng@canonical.com> wrote: ... >>>>>> >>>>>>>> If there's an official document on intel.com, it can >>>>>>>> make many things clearer and easier. States what VMD does >>>>>>>> and what VMD expect OS to do can be really helpful. >>>>>>>> Basically put what you wrote in an official document. >>>>>>> >>>>>>> Thanks for the suggestion. I can certainly find official >>>>>>> VMD architecture document and add that required information >>>>>>> to Documentation/PCI/controller/vmd.rst. Will that be >>>>>>> okay? >>>>>> >>>>>> I'd definitely be interested in whatever you can add to >>>>>> illuminate these issues. >>>>>> >>>>>>> I also need your some help/suggestion on following >>>>>>> alternate solution. We have been looking at VMD HW >>>>>>> registers to find some empty registers. Cache Line Size >>>>>>> register offset OCh is not being used by VMD. This is the >>>>>>> explanation in PCI spec 5.0 section 7.5.1.1.7: "This >>>>>>> read-write register is implemented for legacy compatibility >>>>>>> purposes but has no effect on any PCI Express device >>>>>>> behavior." Can these registers be used for passing _OSC >>>>>>> settings from BIOS to VMD OS driver? >>>>>>> >>>>>>> These 8 bits are more than enough for UEFI VMD driver to >>>>>>> store all _OSC flags and VMD OS driver can read it during >>>>>>> OS boot up. This will solve all of our issues. >>>>>> >>>>>> Interesting idea. I think you'd have to do some work to >>>>>> separate out the conventional PCI devices, where >>>>>> PCI_CACHE_LINE_SIZE is still relevant, to make sure nothing >>>>>> breaks. But I think we overwrite it in some cases even for >>>>>> PCIe devices where it's pointless, and it would be nice to >>>>>> clean that up. >>>>> >>>>> I think the suggestion here is to use the VMD devices Cache >>>>> Line Size register, not the other PCI devices. In that case we >>>>> don't have to worry about conventional PCI devices because we >>>>> aren't touching them. >>>> >>>> Yes, but there is generic code that writes PCI_CACHE_LINE_SIZE >>>> for every device in some cases. If we wrote the VMD >>>> PCI_CACHE_LINE_SIZE, it would obliterate whatever you want to >>>> pass there. >>>> >>>> Bjorn >>> Our initial testing showed no change in value by overwrite from >>> pci. The registers didn't change in Host as well as in Guest OS >>> when some data was written from BIOS. I will perform more testing >>> and also make sure to write every register just in case. >> >> If the VMD hardware is designed in this way and there's an official >> document states that "VMD ports should follow _OSC expect for >> hotplugging" then IMO there's no need to find alternative. >> > > There isn't any official documentation for this, it's just the way VMD > is designed :). VMD assumes that everything behind it is hotpluggable so > the bits don't *really* matter. In other words, VMD supports hotplug and > you can't turn that off so hotplug is always on regardless of what the > bits say. 
> > I believe prior to 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe > features") VMD ignored the flags (which was ok, but led to the issue > that the patch corrected). I think that patch is fine except for the > hotplug bits because the hypervisor BIOS isn't filling them in correctly > (if I understand the problem correctly). As I mentioned earlier the VMD > design is such that VMD is going to handle all the hotplug events so the > bits in the root bridge for hotplug are irrelevant WRT VMD. > I should have been clearer about one aspect of this response: the hypervisor BIOS sets all the _OSC bits to zero, not just the hotplug bits. The hotplug bits are simply the ones that hurt us: with them zeroed, hotplug is disabled when VMD is used in a VM, and our customers expect to always be able to hot-plug devices owned by VMD. > Paul > >> Kai-Heng >> >>> Thanks for the response. >>> >>> -nirmal >>> > >
On Tue, Mar 26, 2024 at 08:51:45AM -0700, Paul M Stillwell Jr wrote: > On 3/25/2024 6:59 PM, Kai-Heng Feng wrote: > > On Tue, Mar 26, 2024 at 8:17 AM Nirmal Patel > > <nirmal.patel@linux.intel.com> wrote: > > > > > > On Fri, 22 Mar 2024 18:36:37 -0500 Bjorn Helgaas > > > <helgaas@kernel.org> wrote: > > > > > > > On Fri, Mar 22, 2024 at 03:43:27PM -0700, Paul M Stillwell Jr > > > > wrote: > > > > > On 3/22/2024 2:37 PM, Bjorn Helgaas wrote: > > > > > > On Fri, Mar 22, 2024 at 01:57:00PM -0700, Nirmal Patel > > > > > > wrote: > > > > > > > On Fri, 15 Mar 2024 09:29:32 +0800 Kai-Heng Feng > > > > > > > <kai.heng.feng@canonical.com> wrote: ... > > > > > > > > > > > > > > If there's an official document on intel.com, it can > > > > > > > > make many things clearer and easier. States what VMD does > > > > > > > > and what VMD expect OS to do can be really helpful. > > > > > > > > Basically put what you wrote in an official document. > > > > > > > > > > > > > > Thanks for the suggestion. I can certainly find official > > > > > > > VMD architecture document and add that required information > > > > > > > to Documentation/PCI/controller/vmd.rst. Will that be > > > > > > > okay? > > > > > > > > > > > > I'd definitely be interested in whatever you can add to > > > > > > illuminate these issues. > > > > > > > > > > > > > I also need your some help/suggestion on following > > > > > > > alternate solution. We have been looking at VMD HW > > > > > > > registers to find some empty registers. Cache Line Size > > > > > > > register offset OCh is not being used by VMD. This is the > > > > > > > explanation in PCI spec 5.0 section 7.5.1.1.7: "This > > > > > > > read-write register is implemented for legacy compatibility > > > > > > > purposes but has no effect on any PCI Express device > > > > > > > behavior." Can these registers be used for passing _OSC > > > > > > > settings from BIOS to VMD OS driver? > > > > > > > > > > > > > > These 8 bits are more than enough for UEFI VMD driver to > > > > > > > store all _OSC flags and VMD OS driver can read it during > > > > > > > OS boot up. This will solve all of our issues. > > > > > > > > > > > > Interesting idea. I think you'd have to do some work to > > > > > > separate out the conventional PCI devices, where > > > > > > PCI_CACHE_LINE_SIZE is still relevant, to make sure nothing > > > > > > breaks. But I think we overwrite it in some cases even for > > > > > > PCIe devices where it's pointless, and it would be nice to > > > > > > clean that up. > > > > > > > > > > I think the suggestion here is to use the VMD devices Cache > > > > > Line Size register, not the other PCI devices. In that case we > > > > > don't have to worry about conventional PCI devices because we > > > > > aren't touching them. > > > > > > > > Yes, but there is generic code that writes PCI_CACHE_LINE_SIZE > > > > for every device in some cases. If we wrote the VMD > > > > PCI_CACHE_LINE_SIZE, it would obliterate whatever you want to > > > > pass there. > > > > > > > Our initial testing showed no change in value by overwrite from > > > pci. The registers didn't change in Host as well as in Guest OS > > > when some data was written from BIOS. I will perform more testing > > > and also make sure to write every register just in case. > > > > If the VMD hardware is designed in this way and there's an official > > document states that "VMD ports should follow _OSC expect for > > hotplugging" then IMO there's no need to find alternative. 
> > There isn't any official documentation for this, it's just the way VMD > is designed :). VMD assumes that everything behind it is hotpluggable so > the bits don't *really* matter. In other words, VMD supports hotplug and > you can't turn that off so hotplug is always on regardless of what the > bits say. This whole discussion is about who *owns* the feature, not whether the hardware supports the feature. The general rule is "if _OSC covers the feature, the platform owns the feature and the OS shouldn't touch it unless the OS requests and is granted control of it." VMD wants special treatment, and we'd like a simple description of what that treatment is. Right now it sounds like you want the OS to own *hotplug* below VMD regardless of what _OSC says. But I still have no idea about other features like AER, etc., so we're kind of in no man's land about how to handle them. Bjorn
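For reference, the ownership Bjorn describes is what the ACPI core records on the host bridge when the root bus is created -- roughly the following, abridged from acpi_pci_root_create() in drivers/acpi/pci_root.c (exact form varies by kernel version):

        /* Features remain platform-owned unless _OSC granted the OS control. */
        if (!(root->osc_control_set & OSC_PCI_EXPRESS_NATIVE_HP_CONTROL))
                host_bridge->native_pcie_hotplug = 0;
        if (!(root->osc_control_set & OSC_PCI_SHPC_NATIVE_HP_CONTROL))
                host_bridge->native_shpc_hotplug = 0;
        if (!(root->osc_control_set & OSC_PCI_EXPRESS_AER_CONTROL))
                host_bridge->native_aer = 0;
        if (!(root->osc_control_set & OSC_PCI_EXPRESS_PME_CONTROL))
                host_bridge->native_pme = 0;
        if (!(root->osc_control_set & OSC_PCI_EXPRESS_LTR_CONTROL))
                host_bridge->native_ltr = 0;
        if (!(root->osc_control_set & OSC_PCI_EXPRESS_DPC_CONTROL))
                host_bridge->native_dpc = 0;

This is why a guest whose _OSC grants nothing ends up with every native_* flag cleared on the bridge above VMD.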
On 3/26/2024 2:08 PM, Bjorn Helgaas wrote: > On Tue, Mar 26, 2024 at 08:51:45AM -0700, Paul M Stillwell Jr wrote: >> On 3/25/2024 6:59 PM, Kai-Heng Feng wrote: >>> On Tue, Mar 26, 2024 at 8:17 AM Nirmal Patel >>> <nirmal.patel@linux.intel.com> wrote: >>>> >>>> On Fri, 22 Mar 2024 18:36:37 -0500 Bjorn Helgaas >>>> <helgaas@kernel.org> wrote: >>>> >>>>> On Fri, Mar 22, 2024 at 03:43:27PM -0700, Paul M Stillwell Jr >>>>> wrote: >>>>>> On 3/22/2024 2:37 PM, Bjorn Helgaas wrote: >>>>>>> On Fri, Mar 22, 2024 at 01:57:00PM -0700, Nirmal Patel >>>>>>> wrote: >>>>>>>> On Fri, 15 Mar 2024 09:29:32 +0800 Kai-Heng Feng >>>>>>>> <kai.heng.feng@canonical.com> wrote: ... >>>>>>> >>>>>>>>> If there's an official document on intel.com, it can >>>>>>>>> make many things clearer and easier. States what VMD does >>>>>>>>> and what VMD expect OS to do can be really helpful. >>>>>>>>> Basically put what you wrote in an official document. >>>>>>>> >>>>>>>> Thanks for the suggestion. I can certainly find official >>>>>>>> VMD architecture document and add that required information >>>>>>>> to Documentation/PCI/controller/vmd.rst. Will that be >>>>>>>> okay? >>>>>>> >>>>>>> I'd definitely be interested in whatever you can add to >>>>>>> illuminate these issues. >>>>>>> >>>>>>>> I also need your some help/suggestion on following >>>>>>>> alternate solution. We have been looking at VMD HW >>>>>>>> registers to find some empty registers. Cache Line Size >>>>>>>> register offset OCh is not being used by VMD. This is the >>>>>>>> explanation in PCI spec 5.0 section 7.5.1.1.7: "This >>>>>>>> read-write register is implemented for legacy compatibility >>>>>>>> purposes but has no effect on any PCI Express device >>>>>>>> behavior." Can these registers be used for passing _OSC >>>>>>>> settings from BIOS to VMD OS driver? >>>>>>>> >>>>>>>> These 8 bits are more than enough for UEFI VMD driver to >>>>>>>> store all _OSC flags and VMD OS driver can read it during >>>>>>>> OS boot up. This will solve all of our issues. >>>>>>> >>>>>>> Interesting idea. I think you'd have to do some work to >>>>>>> separate out the conventional PCI devices, where >>>>>>> PCI_CACHE_LINE_SIZE is still relevant, to make sure nothing >>>>>>> breaks. But I think we overwrite it in some cases even for >>>>>>> PCIe devices where it's pointless, and it would be nice to >>>>>>> clean that up. >>>>>> >>>>>> I think the suggestion here is to use the VMD devices Cache >>>>>> Line Size register, not the other PCI devices. In that case we >>>>>> don't have to worry about conventional PCI devices because we >>>>>> aren't touching them. >>>>> >>>>> Yes, but there is generic code that writes PCI_CACHE_LINE_SIZE >>>>> for every device in some cases. If we wrote the VMD >>>>> PCI_CACHE_LINE_SIZE, it would obliterate whatever you want to >>>>> pass there. >>>>> >>>> Our initial testing showed no change in value by overwrite from >>>> pci. The registers didn't change in Host as well as in Guest OS >>>> when some data was written from BIOS. I will perform more testing >>>> and also make sure to write every register just in case. >>> >>> If the VMD hardware is designed in this way and there's an official >>> document states that "VMD ports should follow _OSC expect for >>> hotplugging" then IMO there's no need to find alternative. >> >> There isn't any official documentation for this, it's just the way VMD >> is designed :). VMD assumes that everything behind it is hotpluggable so >> the bits don't *really* matter. 
In other words, VMD supports hotplug and >> you can't turn that off so hotplug is always on regardless of what the >> bits say. > > This whole discussion is about who *owns* the feature, not whether the > hardware supports the feature. > I'm not disagreeing about who owns the feature :) I'm trying to find a solution when the data that the driver gets is wrong. > The general rule is "if _OSC covers the feature, the platform owns the > feature and the OS shouldn't touch it unless the OS requests and is > granted control of it." > The code is fine in a bare metal system, but completely broken in a hypervisor because all the _OSC bits are set to 0. So I am trying to find a solution where the hotplug bits can be set correctly in this situation. > VMD wants special treatment, and we'd like a simple description of > what that treatment is. Right now it sounds like you want the OS to > own *hotplug* below VMD regardless of what _OSC says. > What I want is for hotplug to be enabled in a hypervisor :) The slot capability register has the correct bits set, but it doesn't make sense (to me) to use a register in a root port to set the bridge settings. That's why this patch just removed the hotplug bits from the code. Everything is set up correctly (as far as I can tell) in both bare metal and hypervisors if we omit those 2 lines. Does anyone have a better suggestion on how to handle the case where the _OSC bits are incorrect? Paul > But I still have no idea about other features like AER, etc., so we're > kind of in no man's land about how to handle them. > > Bjorn
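The SltCap-derived flag Paul refers to is set during enumeration; a simplified sketch of set_pcie_hotplug_bridge() in drivers/pci/probe.c (details may differ between kernel versions):

static void set_pcie_hotplug_bridge(struct pci_dev *pdev)
{
        u32 reg32;

        /* The Hot-Plug Capable bit in Slot Capabilities reflects the BIOS
         * setting on the VMD root port and, because the VMD device is
         * passed through whole, is still visible unchanged in a guest.
         */
        pcie_capability_read_dword(pdev, PCI_EXP_SLTCAP, &reg32);
        if (reg32 & PCI_EXP_SLTCAP_HPC)
                pdev->is_hotplug_bridge = 1;
}

The v2 patch below keys off exactly this flag via child->self->is_hotplug_bridge.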
diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 769eedeb8802..e39eaef5549a 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -720,6 +720,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 	resource_size_t membar2_offset = 0x2000;
 	struct pci_bus *child;
 	struct pci_dev *dev;
+	struct pci_host_bridge *vmd_bridge;
 	int ret;
 
 	/*
@@ -886,8 +887,16 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 	 * and will fail pcie_bus_configure_settings() early. It can instead be
 	 * run on each of the real root ports.
 	 */
-	list_for_each_entry(child, &vmd->bus->children, node)
+	vmd_bridge = to_pci_host_bridge(vmd->bus->bridge);
+	list_for_each_entry(child, &vmd->bus->children, node) {
 		pcie_bus_configure_settings(child);
+		/*
+		 * When Hotplug is enabled on vmd root-port, enable it on vmd
+		 * bridge.
+		 */
+		if (child->self->is_hotplug_bridge)
+			vmd_bridge->native_pcie_hotplug = 1;
+	}
 
 	pci_bus_add_devices(vmd->bus);
Currently VMD copies root bridge setting to enable Hotplug on its rootports. This mechanism works fine for Host OS and no issue has been observed. However in case of VM, all the HyperVisors don't pass the Hotplug setting to the guest BIOS which results in assigning default values and disabling Hotplug capability in the guest which have been observed by many OEMs.

VMD Hotplug can be enabled or disabled based on the VMD rootports' Hotplug configuration in BIOS. is_hotplug_bridge is set on each VMD rootport based on Hotplug capable bit in SltCap in probe.c. Check is_hotplug_bridge and enable or disable native_pcie_hotplug based on that value.

This patch will make sure that Hotplug is enabled properly in Host as well as in VM while honoring _OSC settings as well as VMD hotplug setting.

Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
---
v1->v2: Updating commit message.
---
 drivers/pci/controller/vmd.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)