
PCI: add prefetch quirk to work around Asus/Nvidia suspend issues

Message ID 20180831073057.14626-1-drake@endlessm.com (mailing list archive)
State Superseded, archived

Commit Message

Daniel Drake Aug. 31, 2018, 7:30 a.m. UTC
On over 40 Intel-based Asus products, the nvidia GPU becomes unusable
after S3 suspend/resume. The affected products include multiple
generations of nvidia GPUs and Intel SoCs. After resume, nouveau logs
many errors such as:

    fifo: fault 00 [READ] at 0000005555555000 engine 00 [GR] client 04 [HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
    DRM: failed to idle channel 0 [DRM]

Similarly, the nvidia proprietary driver also fails after resume
(black screen, 100% CPU usage in Xorg process). We shipped a sample
to Nvidia for diagnosis, and their response indicated that it's a
problem with the parent PCI bridge (on the Intel SoC), not the GPU.

We found a workaround: on resume, rewrite the Intel PCI bridge
'Prefetchable Base Upper 32 Bits' register. In the cases that I checked,
this register has value 0 and we just have to rewrite that value.

It's very strange that rewriting the exact same register value
makes a difference, but it definitely makes the issue go away.
It's not just acting as some kind of memory barrier, because rewriting
other bridge registers does not work around the issue. There's something
magic in this particular register.
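
For illustration only (this is not the code from the patch): a minimal
userspace sketch of the "read the register, write the same value back"
step. The fake_bridge structure and the cfg_read32()/cfg_write32() helpers
are hypothetical stand-ins for real config space accessors; in the kernel
the equivalent accesses would presumably go through
pci_read_config_dword()/pci_write_config_dword(). The only hardware fact
assumed is that 'Prefetchable Base Upper 32 Bits' sits at offset 0x28 of a
type 1 (bridge) header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offset of 'Prefetchable Base Upper 32 Bits' in a type 1 (bridge) header. */
#define PREF_BASE_UPPER32 0x28

/* Hypothetical stand-in for the first 64 bytes of a bridge's config space. */
struct fake_bridge {
	uint8_t cfg[64];
	unsigned int writes;	/* number of config writes issued so far */
};

/* Read a 32-bit value from the simulated config space. */
static uint32_t cfg_read32(const struct fake_bridge *b, unsigned int off)
{
	uint32_t v;

	assert(off + sizeof(v) <= sizeof(b->cfg));
	memcpy(&v, &b->cfg[off], sizeof(v));
	return v;
}

/* Write a 32-bit value to the simulated config space and count the write. */
static void cfg_write32(struct fake_bridge *b, unsigned int off, uint32_t v)
{
	assert(off + sizeof(v) <= sizeof(b->cfg));
	memcpy(&b->cfg[off], &v, sizeof(v));
	b->writes++;
}

/*
 * Resume-time workaround as described above: the value is not changed,
 * the register is simply written again with whatever it already holds.
 */
static void rewrite_pref_base_upper32(struct fake_bridge *b)
{
	uint32_t val = cfg_read32(b, PREF_BASE_UPPER32);

	cfg_write32(b, PREF_BASE_UPPER32, val);
}
```

The point of the sketch is that the register contents are identical before
and after the call; only the write transaction itself is new.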

We examined our database of Asus hardware and identified 43 products
that we believe are affected. Checking the nvidia GPU parent PCI bridge
on each one, in total 5 Intel PCI bridges need quirking as below.
The quirk will run on bridges even where no nvidia GPU is connected,
but it should be harmless, and we at least limit it to only running
on Asus products.
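
As a rough sketch of that matching logic (again userspace C for
illustration, not the patch itself): the five vendor:device pairs are the
Intel bridge IDs that appear in the product list in the notes, and the
vendor string is the sys_vendor value from the same list. The bridge_id
structure and bridge_needs_quirk() are hypothetical names; how the real
quirk performs the DMI check is not shown here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct bridge_id {
	uint16_t vendor;
	uint16_t device;
};

/* The five Intel bridges seen as the GPU's parent in the product list. */
static const struct bridge_id quirked_bridges[] = {
	{ 0x8086, 0x1901 },
	{ 0x8086, 0x9d10 },
	{ 0x8086, 0x5ad8 },
	{ 0x8086, 0x31d8 },
	{ 0x8086, 0x9dbc },
};

/*
 * Return true if the quirk should run: the system vendor must be Asus
 * and the bridge must be one of the listed Intel devices. Whether an
 * nvidia GPU sits behind the bridge is deliberately not checked.
 */
static bool bridge_needs_quirk(const char *sys_vendor,
			       uint16_t vendor, uint16_t device)
{
	size_t i;

	if (!sys_vendor || strcmp(sys_vendor, "ASUSTeK COMPUTER INC.") != 0)
		return false;

	for (i = 0; i < sizeof(quirked_bridges) / sizeof(quirked_bridges[0]); i++) {
		if (quirked_bridges[i].vendor == vendor &&
		    quirked_bridges[i].device == device)
			return true;
	}
	return false;
}
```

With this shape, a non-Intel bridge such as 1022:15d3, or a listed bridge
on a non-Asus system, would be skipped.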

This fix was tested on all the affected models that we have on hand
(X542UQ, UX533FD, X530UN, V272UN).

Signed-off-by: Daniel Drake <drake@endlessm.com>
---

Notes:
    If anyone has ideas for why writing this register makes a difference, or
    suggestions for other approaches then I'm all ears...
    
    Here is some basic info on the 43 products believed to be affected:
    basic DMI data, nvidia GPU PCI info, parent PCI bridge info.
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: FX502VD
    product_name: FX502VD
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev ff) (prog-if ff)
    	!!! Unknown header type 7f
    00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: FX570UD
    product_name: ASUS Gaming FX570UD
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1f40]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: GL553VD
    product_name: GL553VD
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:15e0]
    00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: GL553VD
    product_name: GL553VD
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:15e0]
    00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: GL753VD
    product_name: GL753VD
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1590]
    00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: GL753VD
    product_name: GL753VD
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1590]
    00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: K401UQK
    product_name: K401UQK
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:14b0]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: P1440UF
    product_name: ASUSPRO P1440UF
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:174d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1f10]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: P2440UQ
    product_name: P2440UQ
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:13ce]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: P2540NV
    product_name: P2540NV
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134f] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:17f0]
    00:13.0 PCI bridge [0604]: Intel Corporation Device [8086:5ad8] (rev fb) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: P2540NV
    product_name: P2540NV
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134f] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:17f0]
    00:13.0 PCI bridge [0604]: Intel Corporation Device [8086:5ad8] (rev fb) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: P2540UV
    product_name: P2540UV
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134f] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:132e]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: P4540UQ
    product_name: P4540UQ
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1650]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: UX331UN
    product_name: UX331UN
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1d12] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:15de]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: UX410UQK
    product_name: UX410UQK
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:138e]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: UX430UQ
    product_name: UX430UQ
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:139e]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: UX430UQ
    product_name: UX430UQ
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:139e]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: UX533FD
    product_name: ZenBook UX533FD_UX533FD
    02:00.0 3D controller [0302]: NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. GP107M [GeForce GTX 1050 Mobile] [1043:14a1]
    00:1c.4 PCI bridge [0604]: Intel Corporation Device [8086:9dbc] (rev f0) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: V221ID
    product_name: V221ID
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134f] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:15f0]
    00:13.0 PCI bridge [0604]: Intel Corporation Device [8086:5ad8] (rev fb) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: V272UN
    product_name: Vivo AIO 27 V272UN
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1d10] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:17be]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X430UN
    product_name: VivoBook S14 X430UN
    01:00.0 3D controller [0302]: NVIDIA Corporation GP108M [GeForce MX150] [10de:1d10] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. GP108M [GeForce MX150] [1043:199e]
    00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X441MB
    product_name: X441MB
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:174e] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:171e]
    00:13.0 PCI bridge [0604]: Intel Corporation Device [8086:31d8] (rev f3) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X456UF
    product_name: X456UF
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1346] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:245a]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X510UQ
    product_name: X510UQ
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:145e]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X530UN
    product_name: VivoBook S15 X530UN
    01:00.0 3D controller [0302]: NVIDIA Corporation GP108M [GeForce MX150] [10de:1d10] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. GP108M [GeForce MX150] [1043:18ce]
    00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X541UV
    product_name: X541UV
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134f] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:11ee]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X542UN
    product_name: X542UN
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1d10] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1b10]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X542UQ
    product_name: X542UQ
    01:00.0 3D controller [0302]: NVIDIA Corporation GM108M [GeForce 940MX] [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. GM108M [GeForce 940MX] [1043:142e]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X555UB
    product_name: X555UB
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1347] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:246a]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X555UQ
    product_name: X555UQ
    01:00.0 3D controller [0302]: NVIDIA Corporation GM108M [GeForce 940MX] [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. GM108M [GeForce 940MX] [1043:246a]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X556URK
    product_name: X556URK
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134e] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1490]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X570ZD
    product_name: VivoBook_ASUS Laptop X570ZD
    01:00.0 3D controller [0302]: NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. GP107M [GeForce GTX 1050 Mobile] [1043:11d1]
    00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:15d3] (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X580GD
    product_name: VivoBook_ASUSLaptop X580GD_X580GD
    01:00.0 3D controller [0302]: NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. GP107M [GeForce GTX 1050 Mobile] [1043:1fc0]
    00:01.0 PCI bridge [0604]: Intel Corporation Skylake PCIe Controller (x16) [8086:1901] (rev 07) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X580VD
    product_name: X580VD
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1a10]
    00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X580VD
    product_name: X580VD
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev ff) (prog-if ff)
    	!!! Unknown header type 7f
    00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X705FD
    product_name: VivoBook Pro 17 X705FD_X705FD
    02:00.0 3D controller [0302]: NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. GP107M [GeForce GTX 1050 Mobile] [1043:1431]
    00:1c.4 PCI bridge [0604]: Intel Corporation Device [8086:9dbc] (rev f0) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X705UD
    product_name: X705UD
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1b30]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X705UQ
    product_name: X705UQ
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:148e]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X751NV
    product_name: X751NV
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134f] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:13be]
    00:13.0 PCI bridge [0604]: Intel Corporation Device [8086:5ad8] (rev fb) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: Z240IE
    product_name: Z240IE
    01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1c8d] (rev a1) (prog-if 00 [VGA controller])
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1750]
    00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: ZN220IC-K
    product_name: ZN220IC-K
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134e] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:117e]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: ZN241IC
    product_name: ZN241IC
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1900]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: ZN270IE
    product_name: ZN270IE
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1720]
    00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])

 drivers/pci/quirks.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

Comments

Bjorn Helgaas Aug. 31, 2018, 7:12 p.m. UTC | #1
[+cc Intel folks]

On Fri, Aug 31, 2018 at 03:30:57PM +0800, Daniel Drake wrote:
> On over 40 Intel-based Asus products, the nvidia GPU becomes unusable
> after S3 suspend/resume. The affected products include multiple
> generations of nvidia GPUs and Intel SoCs. After resume, nouveau logs
> many errors such as:
> 
>     fifo: fault 00 [READ] at 0000005555555000 engine 00 [GR] client 04 [HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
>     DRM: failed to idle channel 0 [DRM]
> 
> Similarly, the nvidia proprietary driver also fails after resume
> (black screen, 100% CPU usage in Xorg process). We shipped a sample
> to Nvidia for diagnosis, and their response indicated that it's a
> problem with the parent PCI bridge (on the Intel SoC), not the GPU.
> 
> We found a workaround: on resume, rewrite the Intel PCI bridge
> 'Prefetchable Base Upper 32 Bits' register. In the cases that I checked,
> this register has value 0 and we just have to rewrite that value.
> 
> It's very strange that rewriting the exact same register value
> makes a difference, but it definitely makes the issue go away.
> It's not just acting as some kind of memory barrier, because rewriting
> other bridge registers does not work around the issue. There's something
> magic in this particular register.

If true, this sounds like some sort of erratum, so it would be good to
get some input from Intel, and I cc'd a few Intel folks.

It's interesting that all the systems below are from Asus.  That makes
me think there's some BIOS or SMM connection, e.g., SMM traps the
register write and does something magic.

Does this problem happen after a full system suspend/resume, or does
it happen after runtime suspend of only the GPU?  Or runtime suspend
of only the GPU and the upstream bridge?

Can we tell whether Windows rewrites this register unconditionally at
resume-time?  If so, it may be more robust for Linux to do the same.
The whole thing is black magic, which I hate, but if it's our only
choice, it may be better to have this applied everywhere so we don't
keep stubbing our toes on new systems that require the quirk.

> We examined our database of Asus hardware and identified 43 products
> that we believe are affected. Checking the nvidia GPU parent PCI bridge
> on each one, in total 5 Intel PCI bridges need quirking as below.
> The quirk will run on bridges even where no nvidia GPU is connected,
> but it should be harmless, and we at least limit it to only running
> on Asus products.
> 
> This fix was tested on all the affected models that we have on hand
> (X542UQ, UX533FD, X530UN, V272UN).
> 
> Signed-off-by: Daniel Drake <drake@endlessm.com>
> ---
> 
> Notes:
>     If anyone has ideas for why writing this register makes a difference, or
>     suggestions for other approaches then I'm all ears...
>     
>     Here is some basic info of the 43 products believed to be affected:
>     basic DMI data, nvidia GPU PCI info, parent PCI bridge info.

Can you attach the list below to a kernel.org bugzilla and include the
URL in your changelog?

>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: FX502VD
>     product_name: FX502VD
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev ff) (prog-if ff)
>     	!!! Unknown header type 7f
>     00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: FX570UD
>     product_name: ASUS Gaming FX570UD
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev a1)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:1f40]
>     00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: GL553VD
>     product_name: GL553VD
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev a1)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:15e0]
>     00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: GL553VD
>     product_name: GL553VD
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev a1)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:15e0]
>     00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: GL753VD
>     product_name: GL753VD
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev a1)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:1590]
>     00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: GL753VD
>     product_name: GL753VD
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev a1)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:1590]
>     00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: K401UQK
>     product_name: K401UQK
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:14b0]
>     00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: P1440UF
>     product_name: ASUSPRO P1440UF
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:174d] (rev a2)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:1f10]
>     00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: P2440UQ
>     product_name: P2440UQ
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:13ce]
>     00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: P2540NV
>     product_name: P2540NV
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134f] (rev a2)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:17f0]
>     00:13.0 PCI bridge [0604]: Intel Corporation Device [8086:5ad8] (rev fb) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: P2540NV
>     product_name: P2540NV
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134f] (rev a2)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:17f0]
>     00:13.0 PCI bridge [0604]: Intel Corporation Device [8086:5ad8] (rev fb) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: P2540UV
>     product_name: P2540UV
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134f] (rev a2)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:132e]
>     00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: P4540UQ
>     product_name: P4540UQ
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:1650]
>     00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: UX331UN
>     product_name: UX331UN
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1d12] (rev a1)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:15de]
>     00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: UX410UQK
>     product_name: UX410UQK
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:138e]
>     00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: UX430UQ
>     product_name: UX430UQ
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:139e]
>     00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: UX430UQ
>     product_name: UX430UQ
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:139e]
>     00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: UX533FD
>     product_name: ZenBook UX533FD_UX533FD
>     02:00.0 3D controller [0302]: NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] [10de:1c8d] (rev a1)
>     	Subsystem: ASUSTeK Computer Inc. GP107M [GeForce GTX 1050 Mobile] [1043:14a1]
>     00:1c.4 PCI bridge [0604]: Intel Corporation Device [8086:9dbc] (rev f0) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: V221ID
>     product_name: V221ID
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134f] (rev a2)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:15f0]
>     00:13.0 PCI bridge [0604]: Intel Corporation Device [8086:5ad8] (rev fb) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: V272UN
>     product_name: Vivo AIO 27 V272UN
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1d10] (rev a1)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:17be]
>     00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: X430UN
>     product_name: VivoBook S14 X430UN
>     01:00.0 3D controller [0302]: NVIDIA Corporation GP108M [GeForce MX150] [10de:1d10] (rev a1)
>     	Subsystem: ASUSTeK Computer Inc. GP108M [GeForce MX150] [1043:199e]
>     00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: X441MB
>     product_name: X441MB
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:174e] (rev a2)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:171e]
>     00:13.0 PCI bridge [0604]: Intel Corporation Device [8086:31d8] (rev f3) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: X456UF
>     product_name: X456UF
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1346] (rev a2)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:245a]
>     00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: X510UQ
>     product_name: X510UQ
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:145e]
>     00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: X530UN
>     product_name: VivoBook S15 X530UN
>     01:00.0 3D controller [0302]: NVIDIA Corporation GP108M [GeForce MX150] [10de:1d10] (rev a1)
>     	Subsystem: ASUSTeK Computer Inc. GP108M [GeForce MX150] [1043:18ce]
>     00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: X541UV
>     product_name: X541UV
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134f] (rev a2)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:11ee]
>     00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: X542UN
>     product_name: X542UN
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1d10] (rev a1)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:1b10]
>     00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: X542UQ
>     product_name: X542UQ
>     01:00.0 3D controller [0302]: NVIDIA Corporation GM108M [GeForce 940MX] [10de:134d] (rev a2)
>     	Subsystem: ASUSTeK Computer Inc. GM108M [GeForce 940MX] [1043:142e]
>     00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: X555UB
>     product_name: X555UB
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1347] (rev a2)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:246a]
>     00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: X555UQ
>     product_name: X555UQ
>     01:00.0 3D controller [0302]: NVIDIA Corporation GM108M [GeForce 940MX] [10de:134d] (rev a2)
>     	Subsystem: ASUSTeK Computer Inc. GM108M [GeForce 940MX] [1043:246a]
>     00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: X556URK
>     product_name: X556URK
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134e] (rev a2)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:1490]
>     00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: X570ZD
>     product_name: VivoBook_ASUS Laptop X570ZD
>     01:00.0 3D controller [0302]: NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] [10de:1c8d] (rev a1)
>     	Subsystem: ASUSTeK Computer Inc. GP107M [GeForce GTX 1050 Mobile] [1043:11d1]
>     00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:15d3] (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: X580GD
>     product_name: VivoBook_ASUSLaptop X580GD_X580GD
>     01:00.0 3D controller [0302]: NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] [10de:1c8d] (rev a1)
>     	Subsystem: ASUSTeK Computer Inc. GP107M [GeForce GTX 1050 Mobile] [1043:1fc0]
>     00:01.0 PCI bridge [0604]: Intel Corporation Skylake PCIe Controller (x16) [8086:1901] (rev 07) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: X580VD
>     product_name: X580VD
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev a1)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:1a10]
>     00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: X580VD
>     product_name: X580VD
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev ff) (prog-if ff)
>     	!!! Unknown header type 7f
>     00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: X705FD
>     product_name: VivoBook Pro 17 X705FD_X705FD
>     02:00.0 3D controller [0302]: NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] [10de:1c8d] (rev a1)
>     	Subsystem: ASUSTeK Computer Inc. GP107M [GeForce GTX 1050 Mobile] [1043:1431]
>     00:1c.4 PCI bridge [0604]: Intel Corporation Device [8086:9dbc] (rev f0) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: X705UD
>     product_name: X705UD
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev a1)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:1b30]
>     00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: X705UQ
>     product_name: X705UQ
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:148e]
>     00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: X751NV
>     product_name: X751NV
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134f] (rev a2)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:13be]
>     00:13.0 PCI bridge [0604]: Intel Corporation Device [8086:5ad8] (rev fb) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: Z240IE
>     product_name: Z240IE
>     01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1c8d] (rev a1) (prog-if 00 [VGA controller])
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:1750]
>     00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: ZN220IC-K
>     product_name: ZN220IC-K
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134e] (rev a2)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:117e]
>     00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: ZN241IC
>     product_name: ZN241IC
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:1900]
>     00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: ZN270IE
>     product_name: ZN270IE
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
>     	Subsystem: ASUSTeK Computer Inc. Device [1043:1720]
>     00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
> 
>  drivers/pci/quirks.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index ef7143a274e0..e0d956ee459c 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5119,3 +5119,26 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8575,
>  			quirk_switchtec_ntb_dma_alias);
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8576,
>  			quirk_switchtec_ntb_dma_alias);
> +
> +/*
> + * The Nvidia GPU on many Intel-based Asus products is unusable after
> + * S3 resume. However, for unknown reasons, rewriting the value of register
> + * 'Prefetchable Base Upper 32 Bits' on the parent PCI bridge works around
> + * the issue.
> + */
> +static void quirk_asus_pci_prefetch(struct pci_dev *bridge)
> +{
> +	const char *sys_vendor = dmi_get_system_info(DMI_SYS_VENDOR);
> +	u32 value;
> +
> +	if (strcmp(sys_vendor, "ASUSTeK COMPUTER INC.") != 0)
> +		return;
> +
> +	pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32, &value);
> +	pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, value);
> +}
> +DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x1901, quirk_asus_pci_prefetch);
> +DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x31d8, quirk_asus_pci_prefetch);
> +DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x5ad8, quirk_asus_pci_prefetch);
> +DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x9d10, quirk_asus_pci_prefetch);
> +DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x9dbc, quirk_asus_pci_prefetch);
> -- 
> 2.17.1
>
kernel test robot Aug. 31, 2018, 9:47 p.m. UTC | #2
Hi Daniel,

I love your patch! Perhaps something to improve:

[auto build test WARNING on pci/next]
[also build test WARNING on v4.19-rc1 next-20180831]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Daniel-Drake/PCI-add-prefetch-quirk-to-work-around-Asus-Nvidia-suspend-issues/20180901-043245
base:   https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next
config: x86_64-randconfig-x000-201834 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   drivers/pci/quirks.c: In function 'quirk_asus_pci_prefetch':
>> drivers/pci/quirks.c:5134:6: warning: argument 1 null where non-null expected [-Wnonnull]
     if (strcmp(sys_vendor, "ASUSTeK COMPUTER INC.") != 0)
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   In file included from include/linux/uuid.h:20:0,
                    from include/linux/mod_devicetable.h:13,
                    from include/linux/pci.h:21,
                    from drivers/pci/quirks.c:18:
   include/linux/string.h:44:12: note: in a call to function 'strcmp' declared here
    extern int strcmp(const char *,const char *);
               ^~~~~~

vim +5134 drivers/pci/quirks.c

  4983	
  4984	/*
  4985	 * Microsemi Switchtec NTB uses devfn proxy IDs to move TLPs between
  4986	 * NT endpoints via the internal switch fabric. These IDs replace the
  4987	 * originating requestor ID TLPs which access host memory on peer NTB
  4988	 * ports. Therefore, all proxy IDs must be aliased to the NTB device
  4989	 * to permit access when the IOMMU is turned on.
  4990	 */
  4991	static void quirk_switchtec_ntb_dma_alias(struct pci_dev *pdev)
  4992	{
  4993		void __iomem *mmio;
  4994		struct ntb_info_regs __iomem *mmio_ntb;
  4995		struct ntb_ctrl_regs __iomem *mmio_ctrl;
  4996		struct sys_info_regs __iomem *mmio_sys_info;
  4997		u64 partition_map;
  4998		u8 partition;
  4999		int pp;
  5000	
  5001		if (pci_enable_device(pdev)) {
  5002			pci_err(pdev, "Cannot enable Switchtec device\n");
  5003			return;
  5004		}
  5005	
  5006		mmio = pci_iomap(pdev, 0, 0);
  5007		if (mmio == NULL) {
  5008			pci_disable_device(pdev);
  5009			pci_err(pdev, "Cannot iomap Switchtec device\n");
  5010			return;
  5011		}
  5012	
  5013		pci_info(pdev, "Setting Switchtec proxy ID aliases\n");
  5014	
  5015		mmio_ntb = mmio + SWITCHTEC_GAS_NTB_OFFSET;
  5016		mmio_ctrl = (void __iomem *) mmio_ntb + SWITCHTEC_NTB_REG_CTRL_OFFSET;
  5017		mmio_sys_info = mmio + SWITCHTEC_GAS_SYS_INFO_OFFSET;
  5018	
  5019		partition = ioread8(&mmio_ntb->partition_id);
  5020	
  5021		partition_map = ioread32(&mmio_ntb->ep_map);
  5022		partition_map |= ((u64) ioread32(&mmio_ntb->ep_map + 4)) << 32;
  5023		partition_map &= ~(1ULL << partition);
  5024	
  5025		for (pp = 0; pp < (sizeof(partition_map) * 8); pp++) {
  5026			struct ntb_ctrl_regs __iomem *mmio_peer_ctrl;
  5027			u32 table_sz = 0;
  5028			int te;
  5029	
  5030			if (!(partition_map & (1ULL << pp)))
  5031				continue;
  5032	
  5033			pci_dbg(pdev, "Processing partition %d\n", pp);
  5034	
  5035			mmio_peer_ctrl = &mmio_ctrl[pp];
  5036	
  5037			table_sz = ioread16(&mmio_peer_ctrl->req_id_table_size);
  5038			if (!table_sz) {
  5039				pci_warn(pdev, "Partition %d table_sz 0\n", pp);
  5040				continue;
  5041			}
  5042	
  5043			if (table_sz > 512) {
  5044				pci_warn(pdev,
  5045					 "Invalid Switchtec partition %d table_sz %d\n",
  5046					 pp, table_sz);
  5047				continue;
  5048			}
  5049	
  5050			for (te = 0; te < table_sz; te++) {
  5051				u32 rid_entry;
  5052				u8 devfn;
  5053	
  5054				rid_entry = ioread32(&mmio_peer_ctrl->req_id_table[te]);
  5055				devfn = (rid_entry >> 1) & 0xFF;
  5056				pci_dbg(pdev,
  5057					"Aliasing Partition %d Proxy ID %02x.%d\n",
  5058					pp, PCI_SLOT(devfn), PCI_FUNC(devfn));
  5059				pci_add_dma_alias(pdev, devfn);
  5060			}
  5061		}
  5062	
  5063		pci_iounmap(pdev, mmio);
  5064		pci_disable_device(pdev);
  5065	}
  5066	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8531,
  5067				quirk_switchtec_ntb_dma_alias);
  5068	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8532,
  5069				quirk_switchtec_ntb_dma_alias);
  5070	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8533,
  5071				quirk_switchtec_ntb_dma_alias);
  5072	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8534,
  5073				quirk_switchtec_ntb_dma_alias);
  5074	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8535,
  5075				quirk_switchtec_ntb_dma_alias);
  5076	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8536,
  5077				quirk_switchtec_ntb_dma_alias);
  5078	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8543,
  5079				quirk_switchtec_ntb_dma_alias);
  5080	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8544,
  5081				quirk_switchtec_ntb_dma_alias);
  5082	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8545,
  5083				quirk_switchtec_ntb_dma_alias);
  5084	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8546,
  5085				quirk_switchtec_ntb_dma_alias);
  5086	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8551,
  5087				quirk_switchtec_ntb_dma_alias);
  5088	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8552,
  5089				quirk_switchtec_ntb_dma_alias);
  5090	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8553,
  5091				quirk_switchtec_ntb_dma_alias);
  5092	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8554,
  5093				quirk_switchtec_ntb_dma_alias);
  5094	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8555,
  5095				quirk_switchtec_ntb_dma_alias);
  5096	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8556,
  5097				quirk_switchtec_ntb_dma_alias);
  5098	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8561,
  5099				quirk_switchtec_ntb_dma_alias);
  5100	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8562,
  5101				quirk_switchtec_ntb_dma_alias);
  5102	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8563,
  5103				quirk_switchtec_ntb_dma_alias);
  5104	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8564,
  5105				quirk_switchtec_ntb_dma_alias);
  5106	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8565,
  5107				quirk_switchtec_ntb_dma_alias);
  5108	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8566,
  5109				quirk_switchtec_ntb_dma_alias);
  5110	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8571,
  5111				quirk_switchtec_ntb_dma_alias);
  5112	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8572,
  5113				quirk_switchtec_ntb_dma_alias);
  5114	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8573,
  5115				quirk_switchtec_ntb_dma_alias);
  5116	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8574,
  5117				quirk_switchtec_ntb_dma_alias);
  5118	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8575,
  5119				quirk_switchtec_ntb_dma_alias);
  5120	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8576,
  5121				quirk_switchtec_ntb_dma_alias);
  5122	
  5123	/*
  5124	 * The Nvidia GPU on many Intel-based Asus products is unusable after
  5125	 * S3 resume. However, for unknown reasons, rewriting the value of register
  5126	 * 'Prefetchable Base Upper 32 Bits' on the parent PCI bridge works around
  5127	 * the issue.
  5128	 */
  5129	static void quirk_asus_pci_prefetch(struct pci_dev *bridge)
  5130	{
  5131		const char *sys_vendor = dmi_get_system_info(DMI_SYS_VENDOR);
  5132		u32 value;
  5133	
> 5134		if (strcmp(sys_vendor, "ASUSTeK COMPUTER INC.") != 0)

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
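
The -Wnonnull warning above suggests that, in this randconfig, dmi_get_system_info() can resolve to a variant that returns NULL (presumably because DMI support is disabled), so strcmp() could be handed a NULL first argument. A minimal userspace sketch of a NULL-safe comparison follows. The helper name vendor_is_asus() is hypothetical and not part of the patch; in the kernel itself, dmi_match(DMI_SYS_VENDOR, ...) may be a more natural fit, since it treats a missing string as a non-match.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): report a match only when a
 * vendor string is present and equals the Asus DMI vendor string.
 */
static bool vendor_is_asus(const char *sys_vendor)
{
	/* A NULL pointer models a system that exposes no DMI data. */
	if (sys_vendor == NULL)
		return false;

	return strcmp(sys_vendor, "ASUSTeK COMPUTER INC.") == 0;
}
```

With a guard of this shape the quirk would simply return early on systems without DMI information instead of dereferencing NULL.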
Daniel Drake Sept. 3, 2018, 8:56 a.m. UTC | #3
On Sat, Sep 1, 2018 at 3:12 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> If true, this sounds like some sort of erratum, so it would be good to
> get some input from Intel, and I cc'd a few Intel folks.

Yes, it would be great to get their input.

> It's interesting that all the systems below are from Asus.  That makes
> me think there's some BIOS or SMM connection, e.g., SMM traps the
> register write and does something magic.

Is there a way I can check whether an SMM trap is active for this address?

> Does this problem happen after a full system suspend/resume, or does
> it happen after runtime suspend of only the GPU?  Or runtime suspend
> of only the GPU and the upstream bridge?

Runtime suspend/resume works fine. The problem only happens after S3 suspend.

> Can we tell whether Windows rewrites this register unconditionally at
> resume-time?  If so, it may be more robust for Linux to do the same.
> The whole thing is black magic, which I hate, but if it's our only
> choice, it may be better to have this applied everywhere so we don't
> keep stubbing our toes on new systems that require the quirk.

Any suggestions for how to make this happen? I tried booting Windows in
virt-manager, hoping that I could then spy on PCI config space register
accesses, but I don't see an option for S3 suspend there. I'll keep
looking into this.

Thanks
Daniel
Mika Westerberg Sept. 3, 2018, 12:12 p.m. UTC | #4
On Mon, Sep 03, 2018 at 04:56:32PM +0800, Daniel Drake wrote:
> On Sat, Sep 1, 2018 at 3:12 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > If true, this sounds like some sort of erratum, so it would be good to
> > get some input from Intel, and I cc'd a few Intel folks.
> 
> Yes, it would be great to get their input.

We have seen one similar issue with LPSS devices when the BIOS assigns
device BARs above 4G (which is not the case here), and it turned out to
be a misconfigured MTRR register or something like that. It may not be
related at all, but it could be worth dumping the MTRR registers of one
of the affected systems to see whether the memory areas are listed there
(and, if so, whether the attributes are somehow wrong).
Daniel Drake Sept. 4, 2018, 1:52 a.m. UTC | #5
On Mon, Sep 3, 2018 at 8:12 PM, Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
> We have seen one similar issue with LPSS devices when BIOS assigns
> device BARs above 4G (which is not the case here) and it turned out to
> be misconfigured MTRR register or something like that. It may not be
> related at all but it could be worth a try to dump out MTRR registers of
> one of the affected systems and see if the memory areas are listed there
> (and if the attributes are somehow wrong if found).

From Asus X542UQ:

# cat /proc/mtrr
reg00: base=0x0c0000000 ( 3072MB), size= 1024MB, count=1: uncachable
reg01: base=0x0a0000000 ( 2560MB), size=  512MB, count=1: uncachable
reg02: base=0x090000000 ( 2304MB), size=  256MB, count=1: uncachable
reg03: base=0x08c000000 ( 2240MB), size=   64MB, count=1: uncachable
reg04: base=0x08b800000 ( 2232MB), size=    8MB, count=1: uncachable

# cat /sys/kernel/debug/x86/pat_memtype_list
PAT memtype list:
write-back @ 0x84a23000-0x84a24000
write-back @ 0x8ad34000-0x8ad60000
write-back @ 0x8ad5f000-0x8ad66000
write-back @ 0x8ad5f000-0x8ad60000
write-back @ 0x8ad65000-0x8ad6a000
write-back @ 0x8ad69000-0x8ad6b000
write-back @ 0x8ad6a000-0x8ad6c000
write-back @ 0x8ad6b000-0x8ad6e000
write-back @ 0x8ad9c000-0x8ad9d000
write-back @ 0x8adce000-0x8adcf000
write-back @ 0x8adcf000-0x8add0000
write-back @ 0x8adcf000-0x8add2000
write-back @ 0x8add3000-0x8add4000
write-back @ 0x8ae04000-0x8ae05000
write-back @ 0x8b208000-0x8b209000
write-combining @ 0xc0000000-0xd0000000
write-combining @ 0xd0000000-0xe0000000
write-combining @ 0xe0000000-0xe0040000
write-combining @ 0xe0040000-0xe0050000
write-combining @ 0xe0050000-0xe0051000
write-combining @ 0xe0051000-0xe0151000
write-combining @ 0xe0151000-0xe0191000
write-combining @ 0xe0191000-0xe01a1000
write-combining @ 0xe01a1000-0xe01b1000
write-combining @ 0xe01b1000-0xe01c1000
write-combining @ 0xe01c1000-0xe01c3000
write-combining @ 0xe01c3000-0xe01c5000
write-combining @ 0xe01c5000-0xe01cd000
write-combining @ 0xe01cd000-0xe01d5000
write-combining @ 0xe01d5000-0xe01dd000
write-combining @ 0xe01dd000-0xe01e5000
write-combining @ 0xe01e5000-0xe01ed000
write-combining @ 0xe01ed000-0xe01f5000
write-combining @ 0xe01f5000-0xe01fd000
write-combining @ 0xe01fd000-0xe0205000
write-combining @ 0xe0205000-0xe020d000
write-combining @ 0xe020d000-0xe0215000
uncached-minus @ 0xed000000-0xed200000
write-combining @ 0xed800000-0xee000000
uncached-minus @ 0xee000000-0xef000000
uncached-minus @ 0xef200000-0xef400000
uncached-minus @ 0xef400000-0xef401000
uncached-minus @ 0xef404000-0xef405000
uncached-minus @ 0xef510000-0xef520000
uncached-minus @ 0xef528000-0xef52c000
uncached-minus @ 0xef533000-0xef534000
uncached-minus @ 0xef533000-0xef534000
uncached-minus @ 0xef533000-0xef534000
uncached-minus @ 0xef534000-0xef535000
uncached-minus @ 0xef534000-0xef535000
uncached-minus @ 0xef534000-0xef535000
uncached-minus @ 0xef535000-0xef536000
uncached-minus @ 0xef537000-0xef538000
uncached-minus @ 0xef538000-0xef539000
uncached-minus @ 0xef538000-0xef539000
uncached-minus @ 0xef538000-0xef539000
uncached-minus @ 0xef539000-0xef53a000
uncached-minus @ 0xef539000-0xef53a000
uncached-minus @ 0xef539000-0xef53a000
uncached-minus @ 0xef53a000-0xef53b000
uncached-minus @ 0xf0000000-0xf8000000
uncached-minus @ 0xf00e0000-0xf00e1000
uncached-minus @ 0xf0100000-0xf0101000
uncached-minus @ 0xf0101000-0xf0102000
uncached-minus @ 0xfdac0000-0xfdad0000
uncached-minus @ 0xfdae0000-0xfdaf0000
uncached-minus @ 0xfdaf0000-0xfdb00000
uncached-minus @ 0xfdc43000-0xfdc44000
uncached-minus @ 0xfe000000-0xfe001000
uncached-minus @ 0xfe000000-0xfe001000
uncached-minus @ 0xfed00000-0xfed01000
uncached-minus @ 0xfed15000-0xfed16000
uncached-minus @ 0xfed40000-0xfed41000
uncached-minus @ 0xfed90000-0xfed91000
uncached-minus @ 0xfed91000-0xfed92000

Is that the info you were looking for?

Thanks
Daniel
Mika Westerberg Sept. 4, 2018, 6:43 a.m. UTC | #6
On Tue, Sep 04, 2018 at 09:52:02AM +0800, Daniel Drake wrote:
> # cat /proc/mtrr
> reg00: base=0x0c0000000 ( 3072MB), size= 1024MB, count=1: uncachable
> reg01: base=0x0a0000000 ( 2560MB), size=  512MB, count=1: uncachable
> reg02: base=0x090000000 ( 2304MB), size=  256MB, count=1: uncachable
> reg03: base=0x08c000000 ( 2240MB), size=   64MB, count=1: uncachable
> reg04: base=0x08b800000 ( 2232MB), size=    8MB, count=1: uncachable
> 
> # cat /sys/kernel/debug/x86/pat_memtype_list
> PAT memtype list:
> write-back @ 0x84a23000-0x84a24000
> write-back @ 0x8ad34000-0x8ad60000
> write-back @ 0x8ad5f000-0x8ad66000
> write-back @ 0x8ad5f000-0x8ad60000
> write-back @ 0x8ad65000-0x8ad6a000
> write-back @ 0x8ad69000-0x8ad6b000
> write-back @ 0x8ad6a000-0x8ad6c000
> write-back @ 0x8ad6b000-0x8ad6e000
> write-back @ 0x8ad9c000-0x8ad9d000
> write-back @ 0x8adce000-0x8adcf000
> write-back @ 0x8adcf000-0x8add0000
> write-back @ 0x8adcf000-0x8add2000
> write-back @ 0x8add3000-0x8add4000
> write-back @ 0x8ae04000-0x8ae05000
> write-back @ 0x8b208000-0x8b209000
> write-combining @ 0xc0000000-0xd0000000
> write-combining @ 0xd0000000-0xe0000000
> write-combining @ 0xe0000000-0xe0040000
> write-combining @ 0xe0040000-0xe0050000
> write-combining @ 0xe0050000-0xe0051000
> write-combining @ 0xe0051000-0xe0151000
> write-combining @ 0xe0151000-0xe0191000
> write-combining @ 0xe0191000-0xe01a1000
> write-combining @ 0xe01a1000-0xe01b1000
> write-combining @ 0xe01b1000-0xe01c1000
> write-combining @ 0xe01c1000-0xe01c3000
> write-combining @ 0xe01c3000-0xe01c5000
> write-combining @ 0xe01c5000-0xe01cd000
> write-combining @ 0xe01cd000-0xe01d5000
> write-combining @ 0xe01d5000-0xe01dd000
> write-combining @ 0xe01dd000-0xe01e5000
> write-combining @ 0xe01e5000-0xe01ed000
> write-combining @ 0xe01ed000-0xe01f5000
> write-combining @ 0xe01f5000-0xe01fd000
> write-combining @ 0xe01fd000-0xe0205000
> write-combining @ 0xe0205000-0xe020d000
> write-combining @ 0xe020d000-0xe0215000
> uncached-minus @ 0xed000000-0xed200000
> write-combining @ 0xed800000-0xee000000
> uncached-minus @ 0xee000000-0xef000000
> uncached-minus @ 0xef200000-0xef400000
> uncached-minus @ 0xef400000-0xef401000
> uncached-minus @ 0xef404000-0xef405000
> uncached-minus @ 0xef510000-0xef520000
> uncached-minus @ 0xef528000-0xef52c000
> uncached-minus @ 0xef533000-0xef534000
> uncached-minus @ 0xef533000-0xef534000
> uncached-minus @ 0xef533000-0xef534000
> uncached-minus @ 0xef534000-0xef535000
> uncached-minus @ 0xef534000-0xef535000
> uncached-minus @ 0xef534000-0xef535000
> uncached-minus @ 0xef535000-0xef536000
> uncached-minus @ 0xef537000-0xef538000
> uncached-minus @ 0xef538000-0xef539000
> uncached-minus @ 0xef538000-0xef539000
> uncached-minus @ 0xef538000-0xef539000
> uncached-minus @ 0xef539000-0xef53a000
> uncached-minus @ 0xef539000-0xef53a000
> uncached-minus @ 0xef539000-0xef53a000
> uncached-minus @ 0xef53a000-0xef53b000
> uncached-minus @ 0xf0000000-0xf8000000
> uncached-minus @ 0xf00e0000-0xf00e1000
> uncached-minus @ 0xf0100000-0xf0101000
> uncached-minus @ 0xf0101000-0xf0102000
> uncached-minus @ 0xfdac0000-0xfdad0000
> uncached-minus @ 0xfdae0000-0xfdaf0000
> uncached-minus @ 0xfdaf0000-0xfdb00000
> uncached-minus @ 0xfdc43000-0xfdc44000
> uncached-minus @ 0xfe000000-0xfe001000
> uncached-minus @ 0xfe000000-0xfe001000
> uncached-minus @ 0xfed00000-0xfed01000
> uncached-minus @ 0xfed15000-0xfed16000
> uncached-minus @ 0xfed40000-0xfed41000
> uncached-minus @ 0xfed90000-0xfed91000
> uncached-minus @ 0xfed91000-0xfed92000
> 
> Is that the info you were looking for?

Yes, can you check if the failing device BAR is included in any of the
above entries? If not then it is probably not related.
Daniel Drake Sept. 4, 2018, 7:07 a.m. UTC | #7
On Tue, Sep 4, 2018 at 2:43 PM, Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
> Yes, can you check if the failing device BAR is included in any of the
> above entries? If not then it is probably not related.

mtrr again for reference:
reg00: base=0x0c0000000 ( 3072MB), size= 1024MB, count=1: uncachable
reg01: base=0x0a0000000 ( 2560MB), size=  512MB, count=1: uncachable
reg02: base=0x090000000 ( 2304MB), size=  256MB, count=1: uncachable
reg03: base=0x08c000000 ( 2240MB), size=   64MB, count=1: uncachable
reg04: base=0x08b800000 ( 2232MB), size=    8MB, count=1: uncachable


The PCI bridge is:
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port (rev f1) (prog-if 00 [Normal decode])
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 122
    Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
    I/O behind bridge: 0000e000-0000efff
    Memory behind bridge: ee000000-ef0fffff
    Prefetchable memory behind bridge: 00000000d0000000-00000000e1ffffff

The memory window behind the bridge at ee000000 falls within MTRR region
reg00, which spans 0xc0000000 to 0xffffffff.
The same holds for the prefetchable memory behind the bridge.


The nvidia GPU which becomes unresponsive is:

01:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 940MX] (rev a2)
    Subsystem: ASUSTeK Computer Inc. GM108M [GeForce 940MX]
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 133
    Region 0: Memory at ee000000 (32-bit, non-prefetchable) [size=16M]
    Region 1: Memory at d0000000 (64-bit, prefetchable) [size=256M]
    Region 3: Memory at e0000000 (64-bit, prefetchable) [size=32M]
    Region 5: I/O ports at e000 [size=128]
    Expansion ROM at ef000000 [disabled] [size=512K]

Region 0, 1, 3 and the expansion ROM are all included in the mtrr region reg00.
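
The containment claim can be checked mechanically. The sketch below is a standalone userspace illustration, not kernel code: it takes the reg00 base and size from the /proc/mtrr dump and the GPU region addresses and sizes from the lspci output, and tests that each region lies inside reg00. The struct and function names are made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open physical address range [start, start + size). */
struct phys_range {
	uint64_t start;
	uint64_t size;
};

/* True when 'inner' lies entirely within 'outer'. */
static bool range_contains(struct phys_range outer, struct phys_range inner)
{
	return inner.start >= outer.start &&
	       inner.start + inner.size <= outer.start + outer.size;
}

/* MTRR reg00 as reported by /proc/mtrr: base 0xc0000000, size 1024MB. */
static const struct phys_range mtrr_reg00 = { 0xc0000000ULL, 1024ULL << 20 };

/* GPU regions as reported by lspci for 01:00.0. */
static const struct phys_range gpu_regions[] = {
	{ 0xee000000ULL, 16ULL << 20 },   /* Region 0, non-prefetchable */
	{ 0xd0000000ULL, 256ULL << 20 },  /* Region 1, prefetchable */
	{ 0xe0000000ULL, 32ULL << 20 },   /* Region 3, prefetchable */
	{ 0xef000000ULL, 512ULL << 10 },  /* Expansion ROM */
};

/* True when every listed GPU region falls inside MTRR reg00. */
static bool all_gpu_regions_in_reg00(void)
{
	size_t i;

	for (i = 0; i < sizeof(gpu_regions) / sizeof(gpu_regions[0]); i++)
		if (!range_contains(mtrr_reg00, gpu_regions[i]))
			return false;
	return true;
}
```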


The magic register that we write to work around the issue is in PCI
bridge config space - not in a BAR.
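
For context on where that register sits, here is a rough sketch of how the bridge's prefetchable window is assembled from type 1 config space registers. The offsets match the definitions in the kernel's pci_regs.h. The helper names are invented for this example, and the raw register values used in the note after the code (0xd001 and 0xe1f1) are inferred from the lspci window shown above, not read from hardware.

```c
#include <stdint.h>

/* Type 1 (bridge) header offsets, as defined in the kernel's pci_regs.h. */
#define PCI_PREF_MEMORY_BASE   0x24	/* 16-bit, address bits 31:20 */
#define PCI_PREF_MEMORY_LIMIT  0x26	/* 16-bit, address bits 31:20 */
#define PCI_PREF_BASE_UPPER32  0x28	/* 32-bit, address bits 63:32 */
#define PCI_PREF_LIMIT_UPPER32 0x2c	/* 32-bit, address bits 63:32 */

/*
 * Bits 15:4 of the 16-bit base register hold address bits 31:20;
 * the low nibble encodes the window type and is masked off.
 */
static uint64_t pref_base(uint16_t base_lo, uint32_t base_upper32)
{
	return ((uint64_t)base_upper32 << 32) |
	       ((uint64_t)(base_lo & 0xfff0) << 16);
}

/* The limit is inclusive, so the low 20 address bits are all ones. */
static uint64_t pref_limit(uint16_t limit_lo, uint32_t limit_upper32)
{
	return ((uint64_t)limit_upper32 << 32) |
	       ((uint64_t)(limit_lo & 0xfff0) << 16) | 0xfffff;
}
```

Under those assumed register values, and with both upper-32 registers at 0, the helpers reproduce the d0000000-e1ffffff window reported by lspci. The quirk rewrites only the register at offset 0x28.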

Thanks
Daniel
Mika Westerberg Sept. 4, 2018, 9:36 a.m. UTC | #8
On Tue, Sep 04, 2018 at 03:07:52PM +0800, Daniel Drake wrote:
> On Tue, Sep 4, 2018 at 2:43 PM, Mika Westerberg
> <mika.westerberg@linux.intel.com> wrote:
> > Yes, can you check if the failing device BAR is included in any of the
> > above entries? If not then it is probably not related.
> 
> mtrr again for reference:
> reg00: base=0x0c0000000 ( 3072MB), size= 1024MB, count=1: uncachable
> reg01: base=0x0a0000000 ( 2560MB), size=  512MB, count=1: uncachable
> reg02: base=0x090000000 ( 2304MB), size=  256MB, count=1: uncachable
> reg03: base=0x08c000000 ( 2240MB), size=   64MB, count=1: uncachable
> reg04: base=0x08b800000 ( 2232MB), size=    8MB, count=1: uncachable
> 
> 
> The PCI bridge is:
> 00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express
> Root Port (rev f1) (prog-if 00 [Normal decode])
>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR- FastB2B- DisINTx+
>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR- INTx-
>     Latency: 0, Cache Line Size: 64 bytes
>     Interrupt: pin A routed to IRQ 122
>     Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
>     I/O behind bridge: 0000e000-0000efff
>     Memory behind bridge: ee000000-ef0fffff
>     Prefetchable memory behind bridge: 00000000d0000000-00000000e1ffffff
> 
> The memory behind bridge at ee000000 is included in the mtrr region
> reg00 which is 0xc0000000 to 0xffffffff.
> Same for the prefetchable memory behind bridge.

Yeah and it is uncachable so it should be fine.

> The nvidia GPU which becomes unresponsive is:
> 
> 01:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 940MX] (rev a2)
>     Subsystem: ASUSTeK Computer Inc. GM108M [GeForce 940MX]
>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR- FastB2B- DisINTx+
>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR- INTx-
>     Latency: 0, Cache Line Size: 64 bytes
>     Interrupt: pin A routed to IRQ 133
>     Region 0: Memory at ee000000 (32-bit, non-prefetchable) [size=16M]
>     Region 1: Memory at d0000000 (64-bit, prefetchable) [size=256M]
>     Region 3: Memory at e0000000 (64-bit, prefetchable) [size=32M]
>     Region 5: I/O ports at e000 [size=128]
>     Expansion ROM at ef000000 [disabled] [size=512K]
> 
> Region 0, 1, 3 and the expansion ROM are all included in the mtrr region reg00.
> 
> 
> The magic register that we write to workaround the issue is in PCI
> bridge config space - not in a BAR.

OK, I just wanted to rule out MTRR misconfiguration but I guess it is
not the case here.
kernel test robot Sept. 4, 2018, 3:31 p.m. UTC | #9
Hi Daniel,

I love your patch! Perhaps something to improve:

[auto build test WARNING on pci/next]
[also build test WARNING on v4.19-rc2 next-20180831]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Daniel-Drake/PCI-add-prefetch-quirk-to-work-around-Asus-Nvidia-suspend-issues/20180901-043245
base:   https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next
config: x86_64-randconfig-s5-09031857 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 
:::::: branch date: 3 days ago
:::::: commit date: 3 days ago

All warnings (new ones prefixed by >>):

   In file included from include/linux/export.h:45:0,
                    from include/linux/linkage.h:7,
                    from include/linux/kernel.h:7,
                    from drivers/pci/quirks.c:16:
   drivers/pci/quirks.c: In function 'quirk_asus_pci_prefetch':
   drivers/pci/quirks.c:5134:6: warning: argument 1 null where non-null expected [-Wnonnull]
     if (strcmp(sys_vendor, "ASUSTeK COMPUTER INC.") != 0)
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:58:30: note: in definition of macro '__trace_if'
     if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
                                 ^~~~
>> drivers/pci/quirks.c:5134:2: note: in expansion of macro 'if'
     if (strcmp(sys_vendor, "ASUSTeK COMPUTER INC.") != 0)
     ^~
   In file included from include/linux/uuid.h:20:0,
                    from include/linux/mod_devicetable.h:13,
                    from include/linux/pci.h:21,
                    from drivers/pci/quirks.c:18:
   include/linux/string.h:44:12: note: in a call to function 'strcmp' declared here
    extern int strcmp(const char *,const char *);
               ^~~~~~

# https://github.com/0day-ci/linux/commit/eccd2a8c40e1a705a666e6fe1c52aca3f2130984
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout eccd2a8c40e1a705a666e6fe1c52aca3f2130984
vim +/if +5134 drivers/pci/quirks.c

e7aaf90f9 Bjorn Helgaas 2018-08-15  4983  
e7aaf90f9 Bjorn Helgaas 2018-08-15  4984  /*
ad281ecf1 Doug Meyer    2018-05-23  4985   * Microsemi Switchtec NTB uses devfn proxy IDs to move TLPs between
ad281ecf1 Doug Meyer    2018-05-23  4986   * NT endpoints via the internal switch fabric. These IDs replace the
ad281ecf1 Doug Meyer    2018-05-23  4987   * originating requestor ID TLPs which access host memory on peer NTB
ad281ecf1 Doug Meyer    2018-05-23  4988   * ports. Therefore, all proxy IDs must be aliased to the NTB device
ad281ecf1 Doug Meyer    2018-05-23  4989   * to permit access when the IOMMU is turned on.
ad281ecf1 Doug Meyer    2018-05-23  4990   */
ad281ecf1 Doug Meyer    2018-05-23  4991  static void quirk_switchtec_ntb_dma_alias(struct pci_dev *pdev)
ad281ecf1 Doug Meyer    2018-05-23  4992  {
ad281ecf1 Doug Meyer    2018-05-23  4993  	void __iomem *mmio;
ad281ecf1 Doug Meyer    2018-05-23  4994  	struct ntb_info_regs __iomem *mmio_ntb;
ad281ecf1 Doug Meyer    2018-05-23  4995  	struct ntb_ctrl_regs __iomem *mmio_ctrl;
ad281ecf1 Doug Meyer    2018-05-23  4996  	struct sys_info_regs __iomem *mmio_sys_info;
ad281ecf1 Doug Meyer    2018-05-23  4997  	u64 partition_map;
ad281ecf1 Doug Meyer    2018-05-23  4998  	u8 partition;
ad281ecf1 Doug Meyer    2018-05-23  4999  	int pp;
ad281ecf1 Doug Meyer    2018-05-23  5000  
ad281ecf1 Doug Meyer    2018-05-23  5001  	if (pci_enable_device(pdev)) {
ad281ecf1 Doug Meyer    2018-05-23  5002  		pci_err(pdev, "Cannot enable Switchtec device\n");
ad281ecf1 Doug Meyer    2018-05-23  5003  		return;
ad281ecf1 Doug Meyer    2018-05-23  5004  	}
ad281ecf1 Doug Meyer    2018-05-23  5005  
ad281ecf1 Doug Meyer    2018-05-23  5006  	mmio = pci_iomap(pdev, 0, 0);
ad281ecf1 Doug Meyer    2018-05-23  5007  	if (mmio == NULL) {
ad281ecf1 Doug Meyer    2018-05-23  5008  		pci_disable_device(pdev);
ad281ecf1 Doug Meyer    2018-05-23  5009  		pci_err(pdev, "Cannot iomap Switchtec device\n");
ad281ecf1 Doug Meyer    2018-05-23  5010  		return;
ad281ecf1 Doug Meyer    2018-05-23  5011  	}
ad281ecf1 Doug Meyer    2018-05-23  5012  
ad281ecf1 Doug Meyer    2018-05-23  5013  	pci_info(pdev, "Setting Switchtec proxy ID aliases\n");
ad281ecf1 Doug Meyer    2018-05-23  5014  
ad281ecf1 Doug Meyer    2018-05-23  5015  	mmio_ntb = mmio + SWITCHTEC_GAS_NTB_OFFSET;
ad281ecf1 Doug Meyer    2018-05-23  5016  	mmio_ctrl = (void __iomem *) mmio_ntb + SWITCHTEC_NTB_REG_CTRL_OFFSET;
ad281ecf1 Doug Meyer    2018-05-23  5017  	mmio_sys_info = mmio + SWITCHTEC_GAS_SYS_INFO_OFFSET;
ad281ecf1 Doug Meyer    2018-05-23  5018  
ad281ecf1 Doug Meyer    2018-05-23  5019  	partition = ioread8(&mmio_ntb->partition_id);
ad281ecf1 Doug Meyer    2018-05-23  5020  
ad281ecf1 Doug Meyer    2018-05-23  5021  	partition_map = ioread32(&mmio_ntb->ep_map);
ad281ecf1 Doug Meyer    2018-05-23  5022  	partition_map |= ((u64) ioread32(&mmio_ntb->ep_map + 4)) << 32;
ad281ecf1 Doug Meyer    2018-05-23  5023  	partition_map &= ~(1ULL << partition);
ad281ecf1 Doug Meyer    2018-05-23  5024  
ad281ecf1 Doug Meyer    2018-05-23  5025  	for (pp = 0; pp < (sizeof(partition_map) * 8); pp++) {
ad281ecf1 Doug Meyer    2018-05-23  5026  		struct ntb_ctrl_regs __iomem *mmio_peer_ctrl;
ad281ecf1 Doug Meyer    2018-05-23  5027  		u32 table_sz = 0;
ad281ecf1 Doug Meyer    2018-05-23  5028  		int te;
ad281ecf1 Doug Meyer    2018-05-23  5029  
ad281ecf1 Doug Meyer    2018-05-23  5030  		if (!(partition_map & (1ULL << pp)))
ad281ecf1 Doug Meyer    2018-05-23  5031  			continue;
ad281ecf1 Doug Meyer    2018-05-23  5032  
ad281ecf1 Doug Meyer    2018-05-23  5033  		pci_dbg(pdev, "Processing partition %d\n", pp);
ad281ecf1 Doug Meyer    2018-05-23  5034  
ad281ecf1 Doug Meyer    2018-05-23  5035  		mmio_peer_ctrl = &mmio_ctrl[pp];
ad281ecf1 Doug Meyer    2018-05-23  5036  
ad281ecf1 Doug Meyer    2018-05-23  5037  		table_sz = ioread16(&mmio_peer_ctrl->req_id_table_size);
ad281ecf1 Doug Meyer    2018-05-23  5038  		if (!table_sz) {
ad281ecf1 Doug Meyer    2018-05-23  5039  			pci_warn(pdev, "Partition %d table_sz 0\n", pp);
ad281ecf1 Doug Meyer    2018-05-23  5040  			continue;
ad281ecf1 Doug Meyer    2018-05-23  5041  		}
ad281ecf1 Doug Meyer    2018-05-23  5042  
ad281ecf1 Doug Meyer    2018-05-23  5043  		if (table_sz > 512) {
ad281ecf1 Doug Meyer    2018-05-23  5044  			pci_warn(pdev,
ad281ecf1 Doug Meyer    2018-05-23  5045  				 "Invalid Switchtec partition %d table_sz %d\n",
ad281ecf1 Doug Meyer    2018-05-23  5046  				 pp, table_sz);
ad281ecf1 Doug Meyer    2018-05-23  5047  			continue;
ad281ecf1 Doug Meyer    2018-05-23  5048  		}
ad281ecf1 Doug Meyer    2018-05-23  5049  
ad281ecf1 Doug Meyer    2018-05-23  5050  		for (te = 0; te < table_sz; te++) {
ad281ecf1 Doug Meyer    2018-05-23  5051  			u32 rid_entry;
ad281ecf1 Doug Meyer    2018-05-23  5052  			u8 devfn;
ad281ecf1 Doug Meyer    2018-05-23  5053  
ad281ecf1 Doug Meyer    2018-05-23  5054  			rid_entry = ioread32(&mmio_peer_ctrl->req_id_table[te]);
ad281ecf1 Doug Meyer    2018-05-23  5055  			devfn = (rid_entry >> 1) & 0xFF;
ad281ecf1 Doug Meyer    2018-05-23  5056  			pci_dbg(pdev,
ad281ecf1 Doug Meyer    2018-05-23  5057  				"Aliasing Partition %d Proxy ID %02x.%d\n",
ad281ecf1 Doug Meyer    2018-05-23  5058  				pp, PCI_SLOT(devfn), PCI_FUNC(devfn));
ad281ecf1 Doug Meyer    2018-05-23  5059  			pci_add_dma_alias(pdev, devfn);
ad281ecf1 Doug Meyer    2018-05-23  5060  		}
ad281ecf1 Doug Meyer    2018-05-23  5061  	}
ad281ecf1 Doug Meyer    2018-05-23  5062  
ad281ecf1 Doug Meyer    2018-05-23  5063  	pci_iounmap(pdev, mmio);
ad281ecf1 Doug Meyer    2018-05-23  5064  	pci_disable_device(pdev);
ad281ecf1 Doug Meyer    2018-05-23  5065  }
ad281ecf1 Doug Meyer    2018-05-23  5066  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8531,
ad281ecf1 Doug Meyer    2018-05-23  5067  			quirk_switchtec_ntb_dma_alias);
ad281ecf1 Doug Meyer    2018-05-23  5068  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8532,
ad281ecf1 Doug Meyer    2018-05-23  5069  			quirk_switchtec_ntb_dma_alias);
ad281ecf1 Doug Meyer    2018-05-23  5070  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8533,
ad281ecf1 Doug Meyer    2018-05-23  5071  			quirk_switchtec_ntb_dma_alias);
ad281ecf1 Doug Meyer    2018-05-23  5072  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8534,
ad281ecf1 Doug Meyer    2018-05-23  5073  			quirk_switchtec_ntb_dma_alias);
ad281ecf1 Doug Meyer    2018-05-23  5074  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8535,
ad281ecf1 Doug Meyer    2018-05-23  5075  			quirk_switchtec_ntb_dma_alias);
ad281ecf1 Doug Meyer    2018-05-23  5076  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8536,
ad281ecf1 Doug Meyer    2018-05-23  5077  			quirk_switchtec_ntb_dma_alias);
ad281ecf1 Doug Meyer    2018-05-23  5078  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8543,
ad281ecf1 Doug Meyer    2018-05-23  5079  			quirk_switchtec_ntb_dma_alias);
ad281ecf1 Doug Meyer    2018-05-23  5080  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8544,
ad281ecf1 Doug Meyer    2018-05-23  5081  			quirk_switchtec_ntb_dma_alias);
ad281ecf1 Doug Meyer    2018-05-23  5082  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8545,
ad281ecf1 Doug Meyer    2018-05-23  5083  			quirk_switchtec_ntb_dma_alias);
ad281ecf1 Doug Meyer    2018-05-23  5084  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8546,
ad281ecf1 Doug Meyer    2018-05-23  5085  			quirk_switchtec_ntb_dma_alias);
ad281ecf1 Doug Meyer    2018-05-23  5086  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8551,
ad281ecf1 Doug Meyer    2018-05-23  5087  			quirk_switchtec_ntb_dma_alias);
ad281ecf1 Doug Meyer    2018-05-23  5088  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8552,
ad281ecf1 Doug Meyer    2018-05-23  5089  			quirk_switchtec_ntb_dma_alias);
ad281ecf1 Doug Meyer    2018-05-23  5090  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8553,
ad281ecf1 Doug Meyer    2018-05-23  5091  			quirk_switchtec_ntb_dma_alias);
ad281ecf1 Doug Meyer    2018-05-23  5092  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8554,
ad281ecf1 Doug Meyer    2018-05-23  5093  			quirk_switchtec_ntb_dma_alias);
ad281ecf1 Doug Meyer    2018-05-23  5094  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8555,
ad281ecf1 Doug Meyer    2018-05-23  5095  			quirk_switchtec_ntb_dma_alias);
ad281ecf1 Doug Meyer    2018-05-23  5096  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8556,
ad281ecf1 Doug Meyer    2018-05-23  5097  			quirk_switchtec_ntb_dma_alias);
ad281ecf1 Doug Meyer    2018-05-23  5098  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8561,
ad281ecf1 Doug Meyer    2018-05-23  5099  			quirk_switchtec_ntb_dma_alias);
ad281ecf1 Doug Meyer    2018-05-23  5100  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8562,
ad281ecf1 Doug Meyer    2018-05-23  5101  			quirk_switchtec_ntb_dma_alias);
ad281ecf1 Doug Meyer    2018-05-23  5102  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8563,
ad281ecf1 Doug Meyer    2018-05-23  5103  			quirk_switchtec_ntb_dma_alias);
ad281ecf1 Doug Meyer    2018-05-23  5104  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8564,
ad281ecf1 Doug Meyer    2018-05-23  5105  			quirk_switchtec_ntb_dma_alias);
ad281ecf1 Doug Meyer    2018-05-23  5106  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8565,
ad281ecf1 Doug Meyer    2018-05-23  5107  			quirk_switchtec_ntb_dma_alias);
ad281ecf1 Doug Meyer    2018-05-23  5108  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8566,
ad281ecf1 Doug Meyer    2018-05-23  5109  			quirk_switchtec_ntb_dma_alias);
ad281ecf1 Doug Meyer    2018-05-23  5110  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8571,
ad281ecf1 Doug Meyer    2018-05-23  5111  			quirk_switchtec_ntb_dma_alias);
ad281ecf1 Doug Meyer    2018-05-23  5112  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8572,
ad281ecf1 Doug Meyer    2018-05-23  5113  			quirk_switchtec_ntb_dma_alias);
ad281ecf1 Doug Meyer    2018-05-23  5114  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8573,
ad281ecf1 Doug Meyer    2018-05-23  5115  			quirk_switchtec_ntb_dma_alias);
ad281ecf1 Doug Meyer    2018-05-23  5116  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8574,
ad281ecf1 Doug Meyer    2018-05-23  5117  			quirk_switchtec_ntb_dma_alias);
ad281ecf1 Doug Meyer    2018-05-23  5118  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8575,
ad281ecf1 Doug Meyer    2018-05-23  5119  			quirk_switchtec_ntb_dma_alias);
ad281ecf1 Doug Meyer    2018-05-23  5120  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8576,
ad281ecf1 Doug Meyer    2018-05-23  5121  			quirk_switchtec_ntb_dma_alias);
eccd2a8c4 Daniel Drake  2018-08-31  5122  
eccd2a8c4 Daniel Drake  2018-08-31  5123  /*
eccd2a8c4 Daniel Drake  2018-08-31  5124   * The Nvidia GPU on many Intel-based Asus products is unusable after
eccd2a8c4 Daniel Drake  2018-08-31  5125   * S3 resume. However, for unknown reasons, rewriting the value of register
eccd2a8c4 Daniel Drake  2018-08-31  5126   * 'Prefetchable Base Upper 32 Bits' on the parent PCI bridge works around
eccd2a8c4 Daniel Drake  2018-08-31  5127   * the issue.
eccd2a8c4 Daniel Drake  2018-08-31  5128   */
eccd2a8c4 Daniel Drake  2018-08-31  5129  static void quirk_asus_pci_prefetch(struct pci_dev *bridge)
eccd2a8c4 Daniel Drake  2018-08-31  5130  {
eccd2a8c4 Daniel Drake  2018-08-31  5131  	const char *sys_vendor = dmi_get_system_info(DMI_SYS_VENDOR);
eccd2a8c4 Daniel Drake  2018-08-31  5132  	u32 value;
eccd2a8c4 Daniel Drake  2018-08-31  5133  
eccd2a8c4 Daniel Drake  2018-08-31 @5134  	if (strcmp(sys_vendor, "ASUSTeK COMPUTER INC.") != 0)

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Daniel Drake Sept. 6, 2018, 9:02 a.m. UTC | #10
On Sat, Sep 1, 2018 at 3:12 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> Can we tell whether Windows rewrites this register unconditionally at
> resume-time?  If so, it may be more robust for Linux to do the same.
> The whole thing is black magic, which I hate, but if it's our only
> choice, it may be better to have this applied everywhere so we don't
> keep stubbing our toes on new systems that require the quirk.

Checked this by running Windows under qemu with an added PCI-to-PCI bridge (ioh3420).

$ qemu-system-x86_64 -enable-kvm -M q35,accel=kvm -m 2G -vga qxl -cpu host \
    -hda testimg.img -device ioh3420,id=rp1,bus=pcie.0,addr=1c.0,port=1 \
    -trace events=events.txt

events.txt has:
pci_cfg_read
pci_cfg_write

Logged cfg space accesses during boot:
https://gist.github.com/dsd/135fb255cb2b237567d8ea2d6bfc6917#file-boot-txt

Suspend:
https://gist.github.com/dsd/135fb255cb2b237567d8ea2d6bfc6917#file-suspend-txt

Resume:
https://gist.github.com/dsd/135fb255cb2b237567d8ea2d6bfc6917#file-resume-txt

Notably during resume, the prefetch-related registers get rewritten:
  pci_cfg_write ioh3420 28:0 @0x24 <- 0xfeb0fea0
  pci_cfg_write ioh3420 28:0 @0x28 <- 0x0
  pci_cfg_write ioh3420 28:0 @0x2c <- 0x0

This happened even though there was nothing behind the bridge.
Windows failed to resume in this test (black screen) but the traced
register writes seem indicative enough.
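
For readers unfamiliar with the layout, here is a rough sketch of how the
three traced writes map onto the bridge's prefetchable memory window. The
register offsets and type bits follow the kernel's
include/uapi/linux/pci_regs.h; the mask is redefined locally as a 16-bit
value, and decode_pref_window() and struct pref_window are hypothetical
names used only for illustration.

```c
#include <stdint.h>

/* Offsets/bits as in include/uapi/linux/pci_regs.h */
#define PCI_PREF_MEMORY_BASE      0x24  /* 16 bits */
#define PCI_PREF_MEMORY_LIMIT     0x26  /* 16 bits */
#define PCI_PREF_BASE_UPPER32     0x28
#define PCI_PREF_LIMIT_UPPER32    0x2c
#define PCI_PREF_RANGE_TYPE_MASK  0x0f
#define PCI_PREF_RANGE_TYPE_64    0x01
/* Local 16-bit form of the address mask (assumption for this sketch) */
#define PREF_RANGE_MASK16         0xfff0

/* Hypothetical container for the decoded window */
struct pref_window {
	uint64_t base;
	uint64_t limit;
	int is_64bit;
};

/*
 * Hypothetical helper: dw24 is the dword written at offset 0x24, which
 * covers both the 16-bit base (0x24) and the 16-bit limit (0x26).
 */
struct pref_window decode_pref_window(uint32_t dw24, uint32_t base_upper,
				      uint32_t limit_upper)
{
	uint16_t base_lo = dw24 & 0xffff;
	uint16_t limit_lo = dw24 >> 16;
	struct pref_window w;

	w.is_64bit = (base_lo & PCI_PREF_RANGE_TYPE_MASK) ==
		     PCI_PREF_RANGE_TYPE_64;
	/* Bits 15:4 of each register hold address bits 31:20 */
	w.base = (uint64_t)(base_lo & PREF_RANGE_MASK16) << 16;
	w.limit = ((uint64_t)(limit_lo & PREF_RANGE_MASK16) << 16) | 0xfffff;

	/* The upper-32 registers only matter for 64-bit windows */
	if (w.is_64bit) {
		w.base |= (uint64_t)base_upper << 32;
		w.limit |= (uint64_t)limit_upper << 32;
	}
	return w;
}
```

Decoding the traced values with decode_pref_window(0xfeb0fea0, 0x0, 0x0)
should give a 32-bit window with both upper-32 registers written as 0,
which matches the "rewrite the same value" behaviour described in the
commit message.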

Peter Wu confirms the same results in a similar experiment:
https://marc.info/?l=linux-pci&m=153616336225386&w=2

I'll look into creating a new patch that unconditionally reprograms
the PCI bridge prefetch stuff on resume.

Thanks
Daniel
Thomas Martitz Sept. 6, 2018, 1:35 p.m. UTC | #11
On 31.08.2018 at 09:30, Daniel Drake wrote:
> On over 40 Intel-based Asus products, the nvidia GPU becomes unusable
> after S3 suspend/resume. The affected products include multiple
> generations of nvidia GPUs and Intel SoCs. After resume, nouveau logs
> many errors such as:
>
>      fifo: fault 00 [READ] at 0000005555555000 engine 00 [GR] client 04 [HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
>      DRM: failed to idle channel 0 [DRM]
>
> Similarly, the nvidia proprietary driver also fails after resume
> (black screen, 100% CPU usage in Xorg process). We shipped a sample
> to Nvidia for diagnosis, and their response indicated that it's a
> problem with the parent PCI bridge (on the Intel SoC), not the GPU.
>
> We found a workaround: on resume, rewrite the Intel PCI bridge
> 'Prefetchable Base Upper 32 Bits' register. In the cases that I checked,
> this register has value 0 and we just have to rewrite that value.
>
> It's very strange that rewriting the exact same register value
> makes a difference, but it definitely makes the issue go away.
> It's not just acting as some kind of memory barrier, because rewriting
> other bridge registers does not work around the issue. There's something
> magic in this particular register.
>
> We examined our database of Asus hardware and identified 43 products
> that we believe are affected. Checking the nvidia GPU parent PCI bridge
> on each one, in total 5 Intel PCI bridges need quirking as below.
> The quirk will run on bridges even where no nvidia GPU is connected,
> but it should be harmless, and we at least limit it to only running
> on Asus products.
>
> This fix was tested on all the affected models that we have in hands
> (X542UQ, UX533FD, X530UN, V272UN).

Hello,

this patch helps on my HP Zbook 14u G5 which otherwise fails to resume 
the dGPU after suspend. In this case it's a radeon gpu (polaris 10). Of 
course I had to remove the check for ASUS, but made no other changes.

With this patch I can successfully run "DRI_PRIME=1 glxinfo | grep -i 
renderer" and see the radeon, as well as "DRI_PRIME=1 glxgears", after 
resuming from suspend. Attempting that without the patch makes the system
hang for a few seconds followed by lots of powerplay errors in dmesg. 
glxinfo/gears sometimes use the Intel graphics or show a blank window.

FWIW, this problem was discussed at length in bug
https://bugs.freedesktop.org/show_bug.cgi?id=105760 (it is closed only
because the original crash was fixed; the root problem remains
unfixed). I'm therefore adding Peter Wu and Alex Deucher, who have
already tried to help me with this.

I think this supports your other mail where you suggest it should be 
done unconditionally.

Thanks for the patch!

Best regards

Patch

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index ef7143a274e0..e0d956ee459c 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5119,3 +5119,26 @@  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8575,
 			quirk_switchtec_ntb_dma_alias);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8576,
 			quirk_switchtec_ntb_dma_alias);
+
+/*
+ * The Nvidia GPU on many Intel-based Asus products is unusable after
+ * S3 resume. However, for unknown reasons, rewriting the value of register
+ * 'Prefetchable Base Upper 32 Bits' on the parent PCI bridge works around
+ * the issue.
+ */
+static void quirk_asus_pci_prefetch(struct pci_dev *bridge)
+{
+	const char *sys_vendor = dmi_get_system_info(DMI_SYS_VENDOR);
+	u32 value;
+
+	if (strcmp(sys_vendor, "ASUSTeK COMPUTER INC.") != 0)
+		return;
+
+	pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32, &value);
+	pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, value);
+}
+DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x1901, quirk_asus_pci_prefetch);
+DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x31d8, quirk_asus_pci_prefetch);
+DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x5ad8, quirk_asus_pci_prefetch);
+DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x9d10, quirk_asus_pci_prefetch);
+DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x9dbc, quirk_asus_pci_prefetch);
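
The 0-day build report in this thread points at the strcmp() call on
sys_vendor. That is presumably because dmi_get_system_info() can return
NULL, for example when no DMI data is available, so the string should be
checked before it is compared. Below is a minimal userspace sketch of that
guard; vendor_is_asus() is a hypothetical helper name and is not part of
the posted patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns true only when a vendor string is
 * present and matches exactly. A NULL vendor (assumed to mean "no DMI
 * information") is treated as "not Asus" rather than passed to strcmp().
 */
bool vendor_is_asus(const char *sys_vendor)
{
	return sys_vendor != NULL &&
	       strcmp(sys_vendor, "ASUSTeK COMPUTER INC.") == 0;
}
```

In the quirk this would amount to returning early when sys_vendor is NULL.
The check becomes unnecessary if the rewrite is done unconditionally, as
discussed in the replies.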