diff mbox series

PCI: vmd: VMD to control Hotplug on its rootports

Message ID 20230705172038.844706-1-nirmal.patel@linux.intel.com (mailing list archive)
State Changes Requested
Delegated to: Bjorn Helgaas
Headers show
Series PCI: vmd: VMD to control Hotplug on its rootports | expand

Commit Message

Nirmal Patel July 5, 2023, 5:20 p.m. UTC
The hotplug functionality is broken in various combinations of guest
OSes i.e. RHEL, SlES and hypervisors i.e. KVM and ESXI.

VMD enabled on Intel ADL cpus caused interrupt storm for smasung
drives due to AER being enabled on VMD controlled root ports.
The patch 04b12ef163d10e348db664900ae7f611b83c7a0e
("PCI: vmd: Honor APCI _OSC on PCIe features.") was added to the VMD
driver to correct the issue based on the following assumption:
	“Since VMD is an aperture to regular PCIe root ports, honor ACPI
	_OSC to disable PCIe features accordingly to resolve the issue.”
	Link: https://lore.kernel.org/r/20211203031541.1428904-1-kai.heng.feng@canonical.com

VMD as a PCIe device is an end point(type 0), not a PCIe aperture
(pcie bridge). In fact VMD is a type 0 raid controller(class code).
When BIOS boots, all root ports under VMD is inaccessible by BIOS, and
as such, they maintain their power on default states. The VMD UEFI DXE
driver loads and configure all devices under VMD. This is how AER,
power management and hotplug gets enabled in UEFI, since the BIOS pci
driver cannot access the root ports.

The patch worked around the interrupt storm by assigning the native
ACPI states to the  root ports under VMD. It assigns AER, hotplug,
PME, etc. These have been restored back to the power on default state
in guest OS, which says the root port hot plug enable is default OFF.
At most, the work around should have only assigned AER state.
An additional patch should be added to exclude hot plug from the
original patch.
This will cause hot plug to start working again in the guest, as the
settings implemented by the UEFI VMD DXE driver will remain in effect
in Linux.

Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
---
 drivers/pci/controller/vmd.c | 2 --
 1 file changed, 2 deletions(-)

Comments

Bjorn Helgaas July 13, 2023, 4:58 p.m. UTC | #1
[+cc Jonathan, Lorenzo, Krzysztof, Rob (from MAINTAINERS)]

Can you make the subject line say what the patch does?  Repeating
"VMD" is probably unnecessary.

On Wed, Jul 05, 2023 at 01:20:38PM -0400, Nirmal Patel wrote:
> The hotplug functionality is broken in various combinations of guest
> OSes i.e. RHEL, SlES and hypervisors i.e. KVM and ESXI.

"SLES"

> VMD enabled on Intel ADL cpus caused interrupt storm for smasung
> drives due to AER being enabled on VMD controlled root ports.

"Samsung"

Enabling AER should not cause an interrupt storm.  There must be
something else going on in addition.  Are you saying the Samsung
drives have some AER-related defect, like generating Correctable
Errors when they shouldn't?  It doesn't sound like "Intel ADL" would
be relevant here.

It's not clear to me if this is directly related to *hotplug* or if
it's an AER issue that may happen because of hotplug and possible for
other reasons.

> The patch 04b12ef163d10e348db664900ae7f611b83c7a0e

12 char SHA1 is sufficient.

> ("PCI: vmd: Honor APCI _OSC on PCIe features.") was added to the VMD
> driver to correct the issue based on the following assumption:
> 	“Since VMD is an aperture to regular PCIe root ports, honor ACPI
> 	_OSC to disable PCIe features accordingly to resolve the issue.”
> 	Link: https://lore.kernel.org/r/20211203031541.1428904-1-kai.heng.feng@canonical.com
> 
> VMD as a PCIe device is an end point(type 0), not a PCIe aperture
> (pcie bridge). In fact VMD is a type 0 raid controller(class code).
> When BIOS boots, all root ports under VMD is inaccessible by BIOS, and
> as such, they maintain their power on default states. The VMD UEFI DXE
> driver loads and configure all devices under VMD. This is how AER,
> power management and hotplug gets enabled in UEFI, since the BIOS pci
> driver cannot access the root ports.

s/pcie/PCIe/
s/pci/PCI/
s/raid/RAID/

> The patch worked around the interrupt storm by assigning the native

I assume "the patch" means 04b12ef163d1.  Sometimes people write "the
patch does X" in the commit log for the current patch, so it's nice to
be specific to avoid confusion.

> ACPI states to the  root ports under VMD. It assigns AER, hotplug,
> PME, etc. These have been restored back to the power on default state
> in guest OS, which says the root port hot plug enable is default OFF.
> At most, the work around should have only assigned AER state.
> An additional patch should be added to exclude hot plug from the
> original patch.

Add blank line between paragraphs.

> This will cause hot plug to start working again in the guest, as the
> settings implemented by the UEFI VMD DXE driver will remain in effect
> in Linux.

This is a lot of description that doesn't seem to say what the actual
underlying issue is.  It's basically "if we treat hotplug differently
from other negotiated features, things work better."  And it seems
like it depends on what the UEFI driver did, and we should try to
avoid a dependency there.

If the issue is too many Correctable Errors being reported via AER, we
have that problem regardless of VMD, and we should handle it by
rate-limiting and/or suppressing them completely for particularly
offensive devices.  We have some issues like this that are still
outstanding:

[1] https://lore.kernel.org/linux-pci/DM6PR04MB6473197DBD89FF4643CC391F8BC19@DM6PR04MB6473.namprd04.prod.outlook.com/
[2] https://lore.kernel.org/linux-pci/20230606035256.2886098-2-grundler@chromium.org/

> Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> ---
>  drivers/pci/controller/vmd.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> index 769eedeb8802..52c2461b4761 100644
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -701,8 +701,6 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
>  static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
>  				       struct pci_host_bridge *vmd_bridge)
>  {
> -	vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
> -	vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
>  	vmd_bridge->native_aer = root_bridge->native_aer;
>  	vmd_bridge->native_pme = root_bridge->native_pme;
>  	vmd_bridge->native_ltr = root_bridge->native_ltr;
> -- 
> 2.31.1
>
Jonathan Derrick July 13, 2023, 7:26 p.m. UTC | #2
On 7/13/2023 10:58 AM, Bjorn Helgaas wrote:
> [+cc Jonathan, Lorenzo, Krzysztof, Rob (from MAINTAINERS)]
> 
> Can you make the subject line say what the patch does?  Repeating
> "VMD" is probably unnecessary.
> 
> On Wed, Jul 05, 2023 at 01:20:38PM -0400, Nirmal Patel wrote:
>> The hotplug functionality is broken in various combinations of guest
>> OSes i.e. RHEL, SlES and hypervisors i.e. KVM and ESXI.
> 
> "SLES"
> 
>> VMD enabled on Intel ADL cpus caused interrupt storm for smasung
>> drives due to AER being enabled on VMD controlled root ports.
> 
> "Samsung"
> 
> Enabling AER should not cause an interrupt storm.  There must be
> something else going on in addition.  Are you saying the Samsung
> drives have some AER-related defect, like generating Correctable
> Errors when they shouldn't?  It doesn't sound like "Intel ADL" would
> be relevant here.
> 
> It's not clear to me if this is directly related to *hotplug* or if
> it's an AER issue that may happen because of hotplug and possible for
> other reasons.
> 
>> The patch 04b12ef163d10e348db664900ae7f611b83c7a0e
> 
> 12 char SHA1 is sufficient.
> 
>> ("PCI: vmd: Honor APCI _OSC on PCIe features.") was added to the VMD
>> driver to correct the issue based on the following assumption:
>> 	“Since VMD is an aperture to regular PCIe root ports, honor ACPI
>> 	_OSC to disable PCIe features accordingly to resolve the issue.”
>> 	Link: https://lore.kernel.org/r/20211203031541.1428904-1-kai.heng.feng@canonical.com
>>
>> VMD as a PCIe device is an end point(type 0), not a PCIe aperture
>> (pcie bridge). In fact VMD is a type 0 raid controller(class code).
>> When BIOS boots, all root ports under VMD is inaccessible by BIOS, and
>> as such, they maintain their power on default states. The VMD UEFI DXE
>> driver loads and configure all devices under VMD. This is how AER,
>> power management and hotplug gets enabled in UEFI, since the BIOS pci
>> driver cannot access the root ports.
> 
> s/pcie/PCIe/
> s/pci/PCI/
> s/raid/RAID/
> 
>> The patch worked around the interrupt storm by assigning the native
> 
> I assume "the patch" means 04b12ef163d1.  Sometimes people write "the
> patch does X" in the commit log for the current patch, so it's nice to
> be specific to avoid confusion.
> 
>> ACPI states to the  root ports under VMD. It assigns AER, hotplug,
>> PME, etc. These have been restored back to the power on default state
>> in guest OS, which says the root port hot plug enable is default OFF.
>> At most, the work around should have only assigned AER state.
>> An additional patch should be added to exclude hot plug from the
>> original patch.
> 
> Add blank line between paragraphs.
> 
>> This will cause hot plug to start working again in the guest, as the
>> settings implemented by the UEFI VMD DXE driver will remain in effect
>> in Linux.
So hotplug will work in the guest per UEFI VMD DXE driver, but AER will 
be determined by the host native_aer state?

> 
> This is a lot of description that doesn't seem to say what the actual
> underlying issue is.  It's basically "if we treat hotplug differently
> from other negotiated features, things work better."  And it seems
> like it depends on what the UEFI driver did, and we should try to
> avoid a dependency there.
> 
> If the issue is too many Correctable Errors being reported via AER, we
> have that problem regardless of VMD, and we should handle it by
> rate-limiting and/or suppressing them completely for particularly
> offensive devices.  We have some issues like this that are still
> outstanding:
I'm curious if Non-VMD root ports on the system see the same CE storm.
Either way, rate-limiting per device seems like a good idea.

> 
> [1] https://lore.kernel.org/linux-pci/DM6PR04MB6473197DBD89FF4643CC391F8BC19@DM6PR04MB6473.namprd04.prod.outlook.com/
> [2] https://lore.kernel.org/linux-pci/20230606035256.2886098-2-grundler@chromium.org/
> 
>> Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
>> ---
>>   drivers/pci/controller/vmd.c | 2 --
>>   1 file changed, 2 deletions(-)
>>
>> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
>> index 769eedeb8802..52c2461b4761 100644
>> --- a/drivers/pci/controller/vmd.c
>> +++ b/drivers/pci/controller/vmd.c
>> @@ -701,8 +701,6 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
>>   static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
>>   				       struct pci_host_bridge *vmd_bridge)
>>   {
>> -	vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
>> -	vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
>>   	vmd_bridge->native_aer = root_bridge->native_aer;
>>   	vmd_bridge->native_pme = root_bridge->native_pme;
>>   	vmd_bridge->native_ltr = root_bridge->native_ltr;
>> -- 
>> 2.31.1
>>
Nirmal Patel July 19, 2023, 2 a.m. UTC | #3
On 7/13/2023 9:58 AM, Bjorn Helgaas wrote:
> [+cc Jonathan, Lorenzo, Krzysztof, Rob (from MAINTAINERS)]
>
> Can you make the subject line say what the patch does?  Repeating
> "VMD" is probably unnecessary.
>
> On Wed, Jul 05, 2023 at 01:20:38PM -0400, Nirmal Patel wrote:
>> The hotplug functionality is broken in various combinations of guest
>> OSes i.e. RHEL, SlES and hypervisors i.e. KVM and ESXI.
> "SLES"

Will fix it.

>
>> VMD enabled on Intel ADL cpus caused interrupt storm for smasung
>> drives due to AER being enabled on VMD controlled root ports.
> "Samsung"
Will fix it.
>
> Enabling AER should not cause an interrupt storm.  There must be
> something else going on in addition.  Are you saying the Samsung
> drives have some AER-related defect, like generating Correctable
> Errors when they shouldn't?  It doesn't sound like "Intel ADL" would
> be relevant here.
>
> It's not clear to me if this is directly related to *hotplug* or if
> it's an AER issue that may happen because of hotplug and possible for
> other reasons.
>
>> The patch 04b12ef163d10e348db664900ae7f611b83c7a0e
> 12 char SHA1 is sufficient.
Will fix it.
>
>> ("PCI: vmd: Honor APCI _OSC on PCIe features.") was added to the VMD
>> driver to correct the issue based on the following assumption:
>> 	“Since VMD is an aperture to regular PCIe root ports, honor ACPI
>> 	_OSC to disable PCIe features accordingly to resolve the issue.”
>> 	Link: https://lore.kernel.org/r/20211203031541.1428904-1-kai.heng.feng@canonical.com
>>
>> VMD as a PCIe device is an end point(type 0), not a PCIe aperture
>> (pcie bridge). In fact VMD is a type 0 raid controller(class code).
>> When BIOS boots, all root ports under VMD is inaccessible by BIOS, and
>> as such, they maintain their power on default states. The VMD UEFI DXE
>> driver loads and configure all devices under VMD. This is how AER,
>> power management and hotplug gets enabled in UEFI, since the BIOS pci
>> driver cannot access the root ports.
> s/pcie/PCIe/
> s/pci/PCI/
> s/raid/RAID/
Will fix it.
>
>> The patch worked around the interrupt storm by assigning the native
> I assume "the patch" means 04b12ef163d1.  Sometimes people write "the
> patch does X" in the commit log for the current patch, so it's nice to
> be specific to avoid confusion.
>
>> ACPI states to the  root ports under VMD. It assigns AER, hotplug,
>> PME, etc. These have been restored back to the power on default state
>> in guest OS, which says the root port hot plug enable is default OFF.
>> At most, the work around should have only assigned AER state.
>> An additional patch should be added to exclude hot plug from the
>> original patch.
> Add blank line between paragraphs.
Will fix it.
>
>> This will cause hot plug to start working again in the guest, as the
>> settings implemented by the UEFI VMD DXE driver will remain in effect
>> in Linux.
> This is a lot of description that doesn't seem to say what the actual
> underlying issue is.  It's basically "if we treat hotplug differently
> from other negotiated features, things work better."  And it seems
> like it depends on what the UEFI driver did, and we should try to
> avoid a dependency there.
>
> If the issue is too many Correctable Errors being reported via AER, we
> have that problem regardless of VMD, and we should handle it by
> rate-limiting and/or suppressing them completely for particularly
> offensive devices.  We have some issues like this that are still
> outstanding:
>
> [1] https://lore.kernel.org/linux-pci/DM6PR04MB6473197DBD89FF4643CC391F8BC19@DM6PR04MB6473.namprd04.prod.outlook.com/
> [2] https://lore.kernel.org/linux-pci/20230606035256.2886098-2-grundler@chromium.org/
Sorry for the long description. I think your understanding of treating
hotplug differently from other features is correct. The hotplug issue
is seen only in guest because guest BIOS doesn't know about such
settings nor have an implementation. Ideally, when hotplug is enabled
in Host BIOS, VMD root ports in guest should obtain same values with
the help of direct assign feature. The patch (04b12ef163d1) overwrites
these values and assigns default values(i.e 0h for hotplug) resulting
in hotplug disabled.

Regarding AER, I think we should leave other settings as it is. During
the initialization, AER is enabled on VMD root-port without knowing if
the platform firmware has a handler. Thanks to the patch(04b12ef163d1),
this mistake was corrected.

>> Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
>> ---
>>  drivers/pci/controller/vmd.c | 2 --
>>  1 file changed, 2 deletions(-)
>>
>> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
>> index 769eedeb8802..52c2461b4761 100644
>> --- a/drivers/pci/controller/vmd.c
>> +++ b/drivers/pci/controller/vmd.c
>> @@ -701,8 +701,6 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
>>  static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
>>  				       struct pci_host_bridge *vmd_bridge)
>>  {
>> -	vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
>> -	vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
>>  	vmd_bridge->native_aer = root_bridge->native_aer;
>>  	vmd_bridge->native_pme = root_bridge->native_pme;
>>  	vmd_bridge->native_ltr = root_bridge->native_ltr;
>> -- 
>> 2.31.1
>>
Nirmal Patel July 24, 2023, 9:59 p.m. UTC | #4
On 7/13/2023 12:26 PM, Jonathan Derrick wrote:
>
>
> On 7/13/2023 10:58 AM, Bjorn Helgaas wrote:
>> [+cc Jonathan, Lorenzo, Krzysztof, Rob (from MAINTAINERS)]
>>
>> Can you make the subject line say what the patch does?  Repeating
>> "VMD" is probably unnecessary.
>>
>> On Wed, Jul 05, 2023 at 01:20:38PM -0400, Nirmal Patel wrote:
>>> The hotplug functionality is broken in various combinations of guest
>>> OSes i.e. RHEL, SlES and hypervisors i.e. KVM and ESXI.
>>
>> "SLES"
>>
>>> VMD enabled on Intel ADL cpus caused interrupt storm for smasung
>>> drives due to AER being enabled on VMD controlled root ports.
>>
>> "Samsung"
>>
>> Enabling AER should not cause an interrupt storm.  There must be
>> something else going on in addition.  Are you saying the Samsung
>> drives have some AER-related defect, like generating Correctable
>> Errors when they shouldn't?  It doesn't sound like "Intel ADL" would
>> be relevant here.
>>
>> It's not clear to me if this is directly related to *hotplug* or if
>> it's an AER issue that may happen because of hotplug and possible for
>> other reasons.
>>
>>> The patch 04b12ef163d10e348db664900ae7f611b83c7a0e
>>
>> 12 char SHA1 is sufficient.
>>
>>> ("PCI: vmd: Honor APCI _OSC on PCIe features.") was added to the VMD
>>> driver to correct the issue based on the following assumption:
>>>     “Since VMD is an aperture to regular PCIe root ports, honor ACPI
>>>     _OSC to disable PCIe features accordingly to resolve the issue.”
>>>     Link: https://lore.kernel.org/r/20211203031541.1428904-1-kai.heng.feng@canonical.com
>>>
>>> VMD as a PCIe device is an end point(type 0), not a PCIe aperture
>>> (pcie bridge). In fact VMD is a type 0 raid controller(class code).
>>> When BIOS boots, all root ports under VMD is inaccessible by BIOS, and
>>> as such, they maintain their power on default states. The VMD UEFI DXE
>>> driver loads and configure all devices under VMD. This is how AER,
>>> power management and hotplug gets enabled in UEFI, since the BIOS pci
>>> driver cannot access the root ports.
>>
>> s/pcie/PCIe/
>> s/pci/PCI/
>> s/raid/RAID/
>>
>>> The patch worked around the interrupt storm by assigning the native
>>
>> I assume "the patch" means 04b12ef163d1.  Sometimes people write "the
>> patch does X" in the commit log for the current patch, so it's nice to
>> be specific to avoid confusion.
>>
>>> ACPI states to the  root ports under VMD. It assigns AER, hotplug,
>>> PME, etc. These have been restored back to the power on default state
>>> in guest OS, which says the root port hot plug enable is default OFF.
>>> At most, the work around should have only assigned AER state.
>>> An additional patch should be added to exclude hot plug from the
>>> original patch.
>>
>> Add blank line between paragraphs.
>>
>>> This will cause hot plug to start working again in the guest, as the
>>> settings implemented by the UEFI VMD DXE driver will remain in effect
>>> in Linux.
> So hotplug will work in the guest per UEFI VMD DXE driver, but AER will be determined by the host native_aer state?

Yes. Guest doesn't have UEFI driver to configure rootports as in case of Host.

>
>>
>> This is a lot of description that doesn't seem to say what the actual
>> underlying issue is.  It's basically "if we treat hotplug differently
>> from other negotiated features, things work better."  And it seems
>> like it depends on what the UEFI driver did, and we should try to
>> avoid a dependency there.
>>
>> If the issue is too many Correctable Errors being reported via AER, we
>> have that problem regardless of VMD, and we should handle it by
>> rate-limiting and/or suppressing them completely for particularly
>> offensive devices.  We have some issues like this that are still
>> outstanding:
> I'm curious if Non-VMD root ports on the system see the same CE storm.
> Either way, rate-limiting per device seems like a good idea.

AER is not the issue here and we can leave them as it is.
I added unnecessary description which caused more confusion.
My apologies.

>
>>
>> [1] https://lore.kernel.org/linux-pci/DM6PR04MB6473197DBD89FF4643CC391F8BC19@DM6PR04MB6473.namprd04.prod.outlook.com/
>> [2] https://lore.kernel.org/linux-pci/20230606035256.2886098-2-grundler@chromium.org/
>>
>>> Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
>>> ---
>>>   drivers/pci/controller/vmd.c | 2 --
>>>   1 file changed, 2 deletions(-)
>>>
>>> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
>>> index 769eedeb8802..52c2461b4761 100644
>>> --- a/drivers/pci/controller/vmd.c
>>> +++ b/drivers/pci/controller/vmd.c
>>> @@ -701,8 +701,6 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
>>>   static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
>>>                          struct pci_host_bridge *vmd_bridge)
>>>   {
>>> -    vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
>>> -    vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
>>>       vmd_bridge->native_aer = root_bridge->native_aer;
>>>       vmd_bridge->native_pme = root_bridge->native_pme;
>>>       vmd_bridge->native_ltr = root_bridge->native_ltr;
>>> -- 
>>> 2.31.1
>>>
diff mbox series

Patch

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 769eedeb8802..52c2461b4761 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -701,8 +701,6 @@  static int vmd_alloc_irqs(struct vmd_dev *vmd)
 static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
 				       struct pci_host_bridge *vmd_bridge)
 {
-	vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
-	vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
 	vmd_bridge->native_aer = root_bridge->native_aer;
 	vmd_bridge->native_pme = root_bridge->native_pme;
 	vmd_bridge->native_ltr = root_bridge->native_ltr;