diff mbox series

[v2,1/1] PCI: vmd: Avoid acceidental enablement of window when zeroing config space of VMD root ports

Message ID 20230111092911.8039-1-adrianhuang0701@gmail.com (mailing list archive)
State Changes Requested
Headers show
Series [v2,1/1] PCI: vmd: Avoid acceidental enablement of window when zeroing config space of VMD root ports | expand

Commit Message

Adrian Huang Jan. 11, 2023, 9:29 a.m. UTC
From: Adrian Huang <ahuang12@lenovo.com>

Commit 6aab5622296b ("PCI: vmd: Clean up domain before enumeration")
clears PCI configuration space of VMD root ports. However, the host OS
cannot boot successfully with the following error message:

  vmd 0000:64:05.5: PCI host bridge to bus 10000:00
  ...
  vmd 0000:64:05.5: Bound to PCI domain 10000
  ...
  DMAR: VT-d detected Invalidation Queue Error: Reason f
  DMAR: VT-d detected Invalidation Time-out Error: SID ffff
  DMAR: VT-d detected Invalidation Completion Error: SID ffff
  DMAR: QI HEAD: UNKNOWN qw0 = 0x0, qw1 = 0x0
  DMAR: QI PRIOR: UNKNOWN qw0 = 0x0, qw1 = 0x0
  DMAR: Invalidation Time-out Error (ITE) cleared

The root cause is that memset_io() clears prefetchable memory base/limit
registers and prefetchable base/limit 32 bits registers sequentially. This
might enable prefetchable memory if the device disables prefetchable memory
originally. Here is an example (before memset_io()):

  PCI configuration space for 10000:00:00.0:
  86 80 30 20 06 00 10 00 04 00 04 06 00 00 01 00
  00 00 00 00 00 00 00 00 00 01 01 00 00 00 00 20
  00 00 00 00 01 00 01 00 ff ff ff ff 75 05 00 00
  00 00 00 00 40 00 00 00 00 00 00 00 00 01 02 00

So, prefetchable memory is ffffffff00000000-575000fffff, which is disabled.
Here is the quote from section 7.5.1.3.9 of PCI Express Base 6.0 spec:

  The Prefetchable Memory Limit register must be programmed to a smaller
  value than the Prefetchable Memory Base register if there is no
  prefetchable memory on the secondary side of the bridge.

When memset_io() clears prefetchable base 32 bits register, the
prefetchable memory becomes 0000000000000000-575000fffff, which is enabled.
This behavior (accidental enablement of window) causes that config accesses
get routed to the wrong place, and the access content of PCI configuration
space of VMD root ports is 0xff after invoking memset_io() in
vmd_domain_reset():

  10000:00:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev ff) (prog-if ff)
          !!! Unknown header type 7f
  00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  ...
  f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

  10000:00:01.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port B (rev ff) (prog-if ff)
          !!! Unknown header type 7f
  00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  ...
  f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

To fix the issue, prefetchable limit upper 32 bits register needs to be
cleared firstly. This also adheres to the implementation of
pci_setup_bridge_mmio_pref(). Please see the function for detail.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216644
Fixes: 6aab5622296b ("PCI: vmd: Clean up domain before enumeration")
Cc: Nirmal Patel <nirmal.patel@linux.intel.com>
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Reviewed-by: Jon Derrick <jonathan.derrick@linux.dev>
---
Changes since v1:
  - Changed subject per Bjorn's suggestion

 drivers/pci/controller/vmd.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Bjorn Helgaas Jan. 11, 2023, 3:58 p.m. UTC | #1
s/acceidental/accidental/ in subject

On Wed, Jan 11, 2023 at 05:29:11PM +0800, Adrian Huang wrote:
> From: Adrian Huang <ahuang12@lenovo.com>
> 
> Commit 6aab5622296b ("PCI: vmd: Clean up domain before enumeration")
> clears PCI configuration space of VMD root ports. However, the host OS
> cannot boot successfully with the following error message:
> 
>   vmd 0000:64:05.5: PCI host bridge to bus 10000:00
>   ...
>   vmd 0000:64:05.5: Bound to PCI domain 10000
>   ...
>   DMAR: VT-d detected Invalidation Queue Error: Reason f
>   DMAR: VT-d detected Invalidation Time-out Error: SID ffff
>   DMAR: VT-d detected Invalidation Completion Error: SID ffff
>   DMAR: QI HEAD: UNKNOWN qw0 = 0x0, qw1 = 0x0
>   DMAR: QI PRIOR: UNKNOWN qw0 = 0x0, qw1 = 0x0
>   DMAR: Invalidation Time-out Error (ITE) cleared
> 
> The root cause is that memset_io() clears prefetchable memory base/limit
> registers and prefetchable base/limit 32 bits registers sequentially. This
> might enable prefetchable memory if the device disables prefetchable memory
> originally. Here is an example (before memset_io()):
> 
>   PCI configuration space for 10000:00:00.0:
>   86 80 30 20 06 00 10 00 04 00 04 06 00 00 01 00
>   00 00 00 00 00 00 00 00 00 01 01 00 00 00 00 20
>   00 00 00 00 01 00 01 00 ff ff ff ff 75 05 00 00
>   00 00 00 00 40 00 00 00 00 00 00 00 00 01 02 00
> 
> So, prefetchable memory is ffffffff00000000-575000fffff, which is disabled.
> Here is the quote from section 7.5.1.3.9 of PCI Express Base 6.0 spec:
> 
>   The Prefetchable Memory Limit register must be programmed to a smaller
>   value than the Prefetchable Memory Base register if there is no
>   prefetchable memory on the secondary side of the bridge.
> 
> When memset_io() clears prefetchable base 32 bits register, the
> prefetchable memory becomes 0000000000000000-575000fffff, which is enabled.
> This behavior (accidental enablement of window) causes that config accesses
> get routed to the wrong place, and the access content of PCI configuration
> space of VMD root ports is 0xff after invoking memset_io() in
> vmd_domain_reset():

I was thinking the problem was only between clearing
PCI_PREF_MEMORY_BASE and PCI_PREF_BASE_UPPER32, but that would be a
pretty small window, and you're seeing a lot of config accesses going
wrong.  Why is that?  Is there enumeration that races with this domain
reset?

I guess the same problem occurs with PCI_IO_BASE/PCI_IO_BASE_UPPER16,
but maybe there's no concurrent I/O port access?

>   10000:00:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev ff) (prog-if ff)
>           !!! Unknown header type 7f
>   00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>   ...
>   f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 
>   10000:00:01.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port B (rev ff) (prog-if ff)
>           !!! Unknown header type 7f
>   00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>   ...
>   f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 
> To fix the issue, prefetchable limit upper 32 bits register needs to be
> cleared firstly. This also adheres to the implementation of
> pci_setup_bridge_mmio_pref(). Please see the function for detail.
> 
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216644
> Fixes: 6aab5622296b ("PCI: vmd: Clean up domain before enumeration")
> Cc: Nirmal Patel <nirmal.patel@linux.intel.com>
> Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
> Reviewed-by: Jon Derrick <jonathan.derrick@linux.dev>
> ---
> Changes since v1:
>   - Changed subject per Bjorn's suggestion
> 
>  drivers/pci/controller/vmd.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> index 769eedeb8802..e520aec55b68 100644
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -526,6 +526,9 @@ static void vmd_domain_reset(struct vmd_dev *vmd)
>  				     PCI_CLASS_BRIDGE_PCI))
>  					continue;
>  
> +				/* Clear the upper 32 bits of PREF limit. */
> +				memset_io(base + PCI_PREF_LIMIT_UPPER32, 0, 4);
> +
>  				memset_io(base + PCI_IO_BASE, 0,
>  					  PCI_ROM_ADDRESS1 - PCI_IO_BASE);
>  			}
> -- 
> 2.31.1
>
Nirmal Patel Jan. 13, 2023, 6:13 p.m. UTC | #2
On 1/11/2023 2:29 AM, Adrian Huang wrote:
> From: Adrian Huang <ahuang12@lenovo.com>
>
> Commit 6aab5622296b ("PCI: vmd: Clean up domain before enumeration")
> clears PCI configuration space of VMD root ports. However, the host OS
> cannot boot successfully with the following error message:
>
>   vmd 0000:64:05.5: PCI host bridge to bus 10000:00
>   ...
>   vmd 0000:64:05.5: Bound to PCI domain 10000
>   ...
>   DMAR: VT-d detected Invalidation Queue Error: Reason f
>   DMAR: VT-d detected Invalidation Time-out Error: SID ffff
>   DMAR: VT-d detected Invalidation Completion Error: SID ffff
>   DMAR: QI HEAD: UNKNOWN qw0 = 0x0, qw1 = 0x0
>   DMAR: QI PRIOR: UNKNOWN qw0 = 0x0, qw1 = 0x0
>   DMAR: Invalidation Time-out Error (ITE) cleared
>
> The root cause is that memset_io() clears prefetchable memory base/limit
> registers and prefetchable base/limit 32 bits registers sequentially. This
> might enable prefetchable memory if the device disables prefetchable memory
> originally. Here is an example (before memset_io()):
>
>   PCI configuration space for 10000:00:00.0:
>   86 80 30 20 06 00 10 00 04 00 04 06 00 00 01 00
>   00 00 00 00 00 00 00 00 00 01 01 00 00 00 00 20
>   00 00 00 00 01 00 01 00 ff ff ff ff 75 05 00 00
>   00 00 00 00 40 00 00 00 00 00 00 00 00 01 02 00
>
> So, prefetchable memory is ffffffff00000000-575000fffff, which is disabled.
> Here is the quote from section 7.5.1.3.9 of PCI Express Base 6.0 spec:
>
>   The Prefetchable Memory Limit register must be programmed to a smaller
>   value than the Prefetchable Memory Base register if there is no
>   prefetchable memory on the secondary side of the bridge.
>
> When memset_io() clears prefetchable base 32 bits register, the
> prefetchable memory becomes 0000000000000000-575000fffff, which is enabled.
> This behavior (accidental enablement of window) causes that config accesses
> get routed to the wrong place, and the access content of PCI configuration
> space of VMD root ports is 0xff after invoking memset_io() in
> vmd_domain_reset():
>
>   10000:00:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev ff) (prog-if ff)
>           !!! Unknown header type 7f
>   00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>   ...
>   f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>
>   10000:00:01.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port B (rev ff) (prog-if ff)
>           !!! Unknown header type 7f
>   00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>   ...
>   f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>
> To fix the issue, prefetchable limit upper 32 bits register needs to be
> cleared firstly. This also adheres to the implementation of
> pci_setup_bridge_mmio_pref(). Please see the function for detail.
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216644
> Fixes: 6aab5622296b ("PCI: vmd: Clean up domain before enumeration")
> Cc: Nirmal Patel <nirmal.patel@linux.intel.com>
> Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
> Reviewed-by: Jon Derrick <jonathan.derrick@linux.dev>
> ---
> Changes since v1:
>   - Changed subject per Bjorn's suggestion
>
>  drivers/pci/controller/vmd.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> index 769eedeb8802..e520aec55b68 100644
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -526,6 +526,9 @@ static void vmd_domain_reset(struct vmd_dev *vmd)
>  				     PCI_CLASS_BRIDGE_PCI))
>  					continue;
>  
> +				/* Clear the upper 32 bits of PREF limit. */
> +				memset_io(base + PCI_PREF_LIMIT_UPPER32, 0, 4);
> +
>  				memset_io(base + PCI_IO_BASE, 0,
>  					  PCI_ROM_ADDRESS1 - PCI_IO_BASE);
>  			}

Reviewed-by: Nirmal Patel <nirmal.patel@linux.intel.com>
Adrian Huang Jan. 17, 2023, 9:50 a.m. UTC | #3
On Wed, Jan 11, 2023 at 11:58 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> s/acceidental/accidental/ in subject

Thanks. Sorry for the fat-finger error.

> > When memset_io() clears prefetchable base 32 bits register, the
> > prefetchable memory becomes 0000000000000000-575000fffff, which is enabled.
> > This behavior (accidental enablement of window) causes that config accesses
> > get routed to the wrong place, and the access content of PCI configuration
> > space of VMD root ports is 0xff after invoking memset_io() in
> > vmd_domain_reset():
>
> I was thinking the problem was only between clearing
> PCI_PREF_MEMORY_BASE and PCI_PREF_BASE_UPPER32, but that would be a
> pretty small window, and you're seeing a lot of config accesses going
> wrong.  Why is that?  Is there enumeration that races with this domain
> reset?

Well, I didn't see the races. The problem is that: memset_io() uses
enhanced REP STOSB, fast-string operation or legacy method (see
arch/x86/lib/memset_64.S) to *sequentially* clear the memory location
from lower memory location to higher one. When clearing at
PCI_PREF_BASE_UPPER32, the prefetchable memory window is accidentally
enabled. The subsequent accesses (each read returns 0xff, and each
write does not take any effect) cannot be made correctly. In this
case, clearing at PCI_PREF_LIMIT_UPPER32 cannot take any effect. So,
we're unable to configure VMD devices anymore for subsequent writes.

Here are the test codes for emulating memset_io(), and they are 4-byte
writes. The issue can be reproduced:

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 769eedeb8802..e27e2a0f68e0 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -526,8 +526,15 @@ static void vmd_domain_reset(struct vmd_dev *vmd)
                                     PCI_CLASS_BRIDGE_PCI))
                                        continue;

-                               memset_io(base + PCI_IO_BASE, 0,
-                                         PCI_ROM_ADDRESS1 - PCI_IO_BASE);
+                               writel(0, base + PCI_IO_BASE);
+                               writel(0, base + PCI_MEMORY_BASE);
+                               writel(0, base + PCI_PREF_MEMORY_BASE);
+
+                               writel(0, base + PCI_PREF_BASE_UPPER32);
+                               writel(0, base + PCI_PREF_LIMIT_UPPER32);
+
+                               writel(0, base + PCI_IO_BASE_UPPER16);
+                               writel(0, base + PCI_CAPABILITY_LIST);
                        }
                }
        }


The following test codes can fix the issue (clear
PCI_PREF_LIMIT_UPPER32 firstly):

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 769eedeb8802..b9140e081793 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -526,8 +526,15 @@ static void vmd_domain_reset(struct vmd_dev *vmd)
                                     PCI_CLASS_BRIDGE_PCI))
                                        continue;

-                               memset_io(base + PCI_IO_BASE, 0,
-                                         PCI_ROM_ADDRESS1 - PCI_IO_BASE);
+                               writel(0, base + PCI_IO_BASE);
+                               writel(0, base + PCI_MEMORY_BASE);
+                               writel(0, base + PCI_PREF_MEMORY_BASE);
+
+                               writel(0, base + PCI_PREF_LIMIT_UPPER32);
+                               writel(0, base + PCI_PREF_BASE_UPPER32);
+
+                               writel(0, base + PCI_IO_BASE_UPPER16);
+                               writel(0, base + PCI_CAPABILITY_LIST);


> I guess the same problem occurs with PCI_IO_BASE/PCI_IO_BASE_UPPER16,
> but maybe there's no concurrent I/O port access?

For the general case, how about the following code snippets?

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 769eedeb8802..e29e33f7b70e 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -499,6 +499,19 @@ static inline void vmd_acpi_begin(void) { }
 static inline void vmd_acpi_end(void) { }
 #endif /* CONFIG_ACPI */

+static inline void vmd_clear_cfg_space(char __iomem *base, int start, int end)
+{
+       int i;
+
+       /*
+        * Clear PCI configuration space from higher memory location
+        * to lower one. This prevents from enabling IO window or
+        * prefetchable memory window if it is disabled initially.
+        */
+       for (i = end - 1; i >= start; i--)
+               writeb(0, base + i);
+}
+
 static void vmd_domain_reset(struct vmd_dev *vmd)
 {
        u16 bus, max_buses = resource_size(&vmd->resources[0]);
@@ -526,8 +539,8 @@ static void vmd_domain_reset(struct vmd_dev *vmd)
                                     PCI_CLASS_BRIDGE_PCI))
                                        continue;

-                               memset_io(base + PCI_IO_BASE, 0,
-                                         PCI_ROM_ADDRESS1 - PCI_IO_BASE);
+                               vmd_clear_cfg_space(base, PCI_IO_BASE,
+                                                       PCI_ROM_ADDRESS1);
                        }
                }
        }


-- Adrian
Bjorn Helgaas Jan. 17, 2023, 6:15 p.m. UTC | #4
On Tue, Jan 17, 2023 at 05:50:05PM +0800, Huang Adrian wrote:
> On Wed, Jan 11, 2023 at 11:58 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > When memset_io() clears prefetchable base 32 bits register, the
> > > prefetchable memory becomes 0000000000000000-575000fffff, which
> > > is enabled.  This behavior (accidental enablement of window)
> > > causes that config accesses get routed to the wrong place, and
> > > the access content of PCI configuration space of VMD root ports
> > > is 0xff after invoking memset_io() in vmd_domain_reset():
> >
> > I was thinking the problem was only between clearing
> > PCI_PREF_MEMORY_BASE and PCI_PREF_BASE_UPPER32, but that would be
> > a pretty small window, and you're seeing a lot of config accesses
> > going wrong.  Why is that?  Is there enumeration that races with
> > this domain reset?
> 
> Well, I didn't see the races. The problem is that: memset_io() uses
> enhanced REP STOSB, fast-string operation or legacy method (see
> arch/x86/lib/memset_64.S) to *sequentially* clear the memory
> location from lower memory location to higher one.

Obviously we can't *ever* clear both PCI_PREF_MEMORY_BASE and
PCI_PREF_BASE_UPPER32 atomically, whether it's via memset_io(),
writel(), or whatever.  I understand that.

> When clearing at PCI_PREF_BASE_UPPER32, the prefetchable memory
> window is accidentally enabled. The subsequent accesses (each read
> returns 0xff, and each write does not take any effect) cannot be
> made correctly. In this case, clearing at PCI_PREF_LIMIT_UPPER32
> cannot take any effect. So, we're unable to configure VMD devices
> anymore for subsequent writes.

I understand the mechanism that temporarily enables the window.  But I
don't understand the part about "clearing PCI_PREF_LIMIT_UPPER32
*cannot* take any effect."

Is it impossible to clear PCI_PREF_LIMIT_UPPER32 while the window is
enabled?  Given the normal PCI rules, I don't understand why that
would be.

This sequence is problematic because the window is accidentally
enabled:

  1)  clear PCI_PREF_MEMORY_BASE
  2)  <window is enabled here>
  3)  clear PCI_PREF_BASE_UPPER32

and the following sequence works as desired:

  clear PCI_PREF_BASE_UPPER32
  clear PCI_PREF_MEMORY_BASE

The interval between 1) and 3) above should be short: there are only a
few config writes between them.

But you're seeing DMAR VT-d config reads that fail.  Why are those
happening at the same time as VMD enumeration?  And apparently you can
also run lspci and see *those* config reads fail.

There has to be more going on here than a window that is accidentally
enabled for a few milliseconds.  *That* is my question.

Bjorn
diff mbox series

Patch

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 769eedeb8802..e520aec55b68 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -526,6 +526,9 @@  static void vmd_domain_reset(struct vmd_dev *vmd)
 				     PCI_CLASS_BRIDGE_PCI))
 					continue;
 
+				/* Clear the upper 32 bits of PREF limit. */
+				memset_io(base + PCI_PREF_LIMIT_UPPER32, 0, 4);
+
 				memset_io(base + PCI_IO_BASE, 0,
 					  PCI_ROM_ADDRESS1 - PCI_IO_BASE);
 			}