[09/13] PCI: mediatek: Advertise lack of MSI handling

Message ID 20210225151023.3642391-10-maz@kernel.org (mailing list archive)
State Awaiting Upstream
Delegated to: Geert Uytterhoeven
Series PCI: MSI: Getting rid of msi_controller, and other cleanups

Commit Message

Marc Zyngier Feb. 25, 2021, 3:10 p.m. UTC
From: Thomas Gleixner <tglx@linutronix.de>

Some Mediatek host bridges cannot handle MSIs, which is sad.
This also results in an ugly warning at device probe time,
as the core PCI code wasn't told that MSIs were not available.

Advertise this fact to the rest of the core PCI code by
using the 'no_msi' attribute.

Reported-by: Frank Wunderlich <frank-w@public-files.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[maz: commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 drivers/pci/controller/pcie-mediatek.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Frank Wunderlich March 1, 2021, 10:43 a.m. UTC | #1
tested full series on bananapi-r2 and r64

r2 (with mt7615) looks good.

on r64 (with atheros card WLE900VX) i see this while loading ath10k driver:

[    6.525981] ath10k_pci 0000:01:00.0: enabling device (0000 -> 0002)
[    6.537810] ath10k_pci 0000:01:00.0: enabling bus mastering
[    6.543831] Unable to handle kernel paging request at virtual address ffffff4013be2a80
[    6.551890] Mem abort info:
[    6.554744]   ESR = 0x96000044
[    6.557870]   EC = 0x25: DABT (current EL), IL = 32 bits
[    6.563267]   SET = 0, FnV = 0
[    6.566396]   EA = 0, S1PTW = 0
[    6.569611] Data abort info:
[    6.572501]   ISV = 0, ISS = 0x00000044
[    6.576411]   CM = 0, WnR = 1
[    6.579450] [ffffff4013be2a80] address between user and kernel address ranges
[    6.586659] Internal error: Oops: 96000044 [#1] PREEMPT SMP
[    6.592248] Modules linked in: ath10k_pci(+) ath10k_core ath mac80211 libarc4 btmtkuart cfg80211 bluetooth ecdh_generic ecc rfkill libaes ip_tables x_tables
[    6.606329] CPU: 1 PID: 114 Comm: systemd-udevd Not tainted 5.11.0-bpi-r64-pci #3
[    6.613819] Hardware name: Bananapi BPI-R64 (DT)
[    6.618439] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
[    6.624452] pc : queued_spin_lock_slowpath+0x1e8/0x31c
[    6.629608] lr : queued_spin_lock_slowpath+0xac/0x31c
[    6.634666] sp : ffffffc010f63550
[    6.637982] x29: ffffffc010f63550 x28: 000000000000fc7e
[    6.643306] x27: ffffffc010c67410 x26: 0000000000080000
[    6.648629] x25: ffffffc010c67880 x24: ffffffc010f63810
[    6.653950] x23: 0000000000000000 x22: ffffffc010ba8860
[    6.659270] x21: ffffff803fdcc540 x20: ffffffc010a1c540
[    6.664591] x19: ffffff80016a1708 x18: 0000000000000000
[    6.669914] x17: 0000000000000000 x16: 0000000000000000
[    6.675236] x15: 000000000000000a x14: 0000000000000092
[    6.680560] x13: ffffff8006671004 x12: 0000000000000000
[    6.685883] x11: 0101010101010101 x10: ffffff8001635568
[    6.691206] x9 : 0000000000080000 x8 : ffffff8001635560
[    6.696529] x7 : 0000000000000000 x6 : ffffff803fdcc540
[    6.701849] x5 : 0000000000000002 x4 : 0000000000080000
[    6.707170] x3 : ffffff80016a170a x2 : 000000000000016a
[    6.712493] x1 : ffffff80031c6520 x0 : ffffffc010a1c560
[    6.717818] Call trace:
[    6.720276]  queued_spin_lock_slowpath+0x1e8/0x31c
[    6.725086]  do_raw_spin_lock+0x2c/0x38
[    6.728931]  _raw_spin_lock+0x24/0x34
[    6.732606]  __mutex_lock.isra.0+0xc4/0x29c
[    6.736799]  __mutex_lock_slowpath+0x14/0x20
[    6.741078]  mutex_lock+0x28/0x34
[    6.744402]  mtk_pcie_irq_domain_alloc+0x3c/0xd0
[    6.749037]  irq_domain_alloc_irqs_hierarchy+0x50/0x54
[    6.754187]  irq_domain_alloc_irqs_parent+0x18/0x2c
[    6.759073]  msi_domain_alloc+0x8c/0x12c
[    6.763007]  irq_domain_alloc_irqs_hierarchy+0x50/0x54
[    6.768154]  __irq_domain_alloc_irqs+0x114/0x344
[    6.772780]  __msi_domain_alloc_irqs+0x110/0x318
[    6.777408]  msi_domain_alloc_irqs+0x1c/0x28
[    6.781685]  pci_msi_setup_msi_irqs.isra.0+0x2c/0x44
[    6.786662]  __pci_enable_msi_range+0x230/0x320
[    6.791202]  pci_enable_msi+0x1c/0x30
[    6.794874]  ath10k_pci_probe+0x480/0x748 [ath10k_pci]
[    6.800058]  pci_device_probe+0xbc/0x14c
[    6.804014]  really_probe+0x2a0/0x470
[    6.807701]  driver_probe_device+0x12c/0x13c
[    6.811981]  device_driver_attach+0x44/0x70
[    6.816181]  __driver_attach+0x13c/0x140
[    6.820126]  bus_for_each_dev+0x70/0xc0
[    6.823971]  driver_attach+0x24/0x30
[    6.827556]  bus_add_driver+0x1a4/0x1ec
[    6.831401]  driver_register+0xb4/0xec
[    6.835168]  __pci_register_driver+0x44/0x50
[    6.839465]  ath10k_pci_init+0x28/0x1000 [ath10k_pci]
[    6.844563]  do_one_initcall+0x6c/0x188
[    6.848431]  do_init_module+0x5c/0x1e8
[    6.852205]  load_module+0x1124/0x11c8
[    6.855967]  __do_sys_finit_module+0xdc/0x100
[    6.860335]  __arm64_sys_finit_module+0x1c/0x28
[    6.864877]  el0_svc_common.constprop.0+0x124/0x198
[    6.869766]  do_el0_svc+0x48/0x78
[    6.873089]  el0_svc+0x14/0x20
[    6.876158]  el0_sync_handler+0xcc/0x154
[    6.880091]  el0_sync+0x174/0x180
[    6.883425] Code: d37c0400 51000421 8b000280 f861dac1 (f8216806)
[    6.889525] ---[ end trace 62498e1f489ea3ab ]---

i guess it's a bug in ath10k driver or my r64 board (it is a v1.1 which has missing capacitors on tx lines).
Tried with an mt7612e, this seems to work without any errors.

so for mt7622/mt7623

Tested-by: Frank Wunderlich <frank-w@public-files.de>

regards Frank
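
For reference, the syndrome values printed in the oops above can be decoded by hand. The following is a minimal userspace sketch, not kernel code: `esr_decode()` and `va_classify()` are hypothetical helper names, the field positions follow the Arm architecture definition of ESR_ELx, and the address check assumes a 39-bit VA configuration (inferred from kernel pointers such as ffffffc010f63550 in the register dump, not stated anywhere in the thread).

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of the AArch64 ESR_ELx syndrome register relevant to a data abort. */
struct esr_fields {
	unsigned int ec;   /* exception class, bits [31:26] */
	bool il;           /* instruction length, bit 25 (set = 32-bit instruction) */
	uint32_t iss;      /* instruction specific syndrome, bits [24:0] */
	bool wnr;          /* write-not-read, ISS bit 6 (data aborts only) */
	unsigned int dfsc; /* data fault status code, ISS bits [5:0] */
};

/* Hypothetical helper: split a raw ESR value into its fields. */
static struct esr_fields esr_decode(uint32_t esr)
{
	struct esr_fields f;

	f.ec = (esr >> 26) & 0x3f;
	f.il = (esr >> 25) & 1;
	f.iss = esr & 0x1ffffff;
	f.wnr = (f.iss >> 6) & 1;
	f.dfsc = f.iss & 0x3f;
	return f;
}

enum va_class { VA_USER, VA_KERNEL, VA_HOLE };

/*
 * Hypothetical helper: classify a virtual address.
 * ASSUMPTION: 39-bit virtual addresses, so user space ends at
 * 0x0000007fffffffff and kernel space starts at 0xffffff8000000000.
 */
static enum va_class va_classify(uint64_t addr)
{
	const uint64_t user_top = (UINT64_C(1) << 39) - 1;
	const uint64_t kernel_base = ~user_top;

	if (addr <= user_top)
		return VA_USER;
	if (addr >= kernel_base)
		return VA_KERNEL;
	return VA_HOLE; /* neither range: any access faults */
}
```

Applied to the values in the trace, ESR 0x96000044 decodes to EC 0x25 with IL set and ISS 0x44, where WnR is set (a write access) and the fault status code is 0x04. Under the 39-bit assumption, ffffff4013be2a80 falls into the hole between the two ranges, which is consistent with the "address between user and kernel address ranges" line.
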
Marc Zyngier March 1, 2021, 11:49 a.m. UTC | #2
Frank,

On 2021-03-01 10:43, Frank Wunderlich wrote:
> tested full series on bananapi-r2 and r64
> 
> r2 (with mt7615) looks good.
> 
> on r64 (with atheros card WLE900VX) i see this while loading ath10k 
> driver:
> 
> [...]
> 
> i guess it's a bug in ath10k driver or my r64 board (it is a v1.1
> which has missing capacitors on tx lines).

No, this definitely looks like a bug in the MTK PCIe driver,
where the mutex is either not properly initialised, corrupted,
or the wrong pointer is passed.

This r64 machine is supposed to have working MSIs, right?
Do you get the same issue without this series?

> Tried with an mt7612e, this seems to work without any errors.
> 
> so for mt7622/mt7623
> 
> Tested-by: Frank Wunderlich <frank-w@public-files.de>

We definitely need to understand the above.

Thanks,

         M.
Frank Wunderlich March 1, 2021, 12:16 p.m. UTC | #3
> Sent: Monday, 01 March 2021 at 12:49
> From: "Marc Zyngier" <maz@kernel.org>
> Frank,
> 
> On 2021-03-01 10:43, Frank Wunderlich wrote:
> > tested full series on bananapi-r2 and r64
> > 
> > r2 (with mt7615) looks good.
> > 
> > on r64 (with atheros card WLE900VX) i see this while loading ath10k 
> > driver:
> > 
> > [...]
> > 
> > i guess it's a bug in ath10k driver or my r64 board (it is a v1.1
> > which has missing capacitors on tx lines).
> 
> No, this definitely looks like a bug in the MTK PCIe driver,
> where the mutex is either not properly initialised, corrupted,
> or the wrong pointer is passed.

but why does it happen only with the ath10k card and not with the mt7612 in the same slot?

> This r64 machine is supposed to have working MSIs, right?

imho mt7622 has working MSI

> Do you get the same issue without this series?

tested 5.11.0 [1] without this series (but with your/Thomas' patch from the discussion about my old patch) and got the same trace, so this series does not break anything here.

> > Tried with an mt7612e, this seems to work without any errors.
> > 
> > so for mt7622/mt7623
> > 
> > Tested-by: Frank Wunderlich <frank-w@public-files.de>
> 
> We definitely need to understand the above.

there is a hardware bug which may cause this... afair i saw this with the card in the r64 on earlier kernel versions where other cards (like the mt7612e) work.

regards Frank

[1] https://github.com/frank-w/BPI-R2-4.14/commits/5.11-main (pci: fix MSI issue part X)
Marc Zyngier March 1, 2021, 1:31 p.m. UTC | #4
Frank,

>> > i guess it's a bug in ath10k driver or my r64 board (it is a v1.1
>> > which has missing capacitors on tx lines).
>> 
>> No, this definitely looks like a bug in the MTK PCIe driver,
>> where the mutex is either not properly initialised, corrupted,
>> or the wrong pointer is passed.
> 
> but why does it happen only with the ath10k-card and not the mt7612 in
> same slot?

Does mt7612 use MSI? What we have here is a bogus mutex in the
MTK PCIe driver, and the only way not to get there would be
to avoid using MSIs.

> 
>> This r64 machine is supposed to have working MSIs, right?
> 
> imho mt7622 have working MSI
> 
>> Do you get the same issue without this series?
> 
> tested 5.11.0 [1] without this series (but with your/thomas' patch
> from discussion about my old patch) and got same trace. so this series
> does not break anything here.

Can you retest without any additional patch on top of 5.11?
These two patches only affect platforms that do *not* have MSIs at all.

> 
>> > Tried with an mt7612e, this seems to work without any errors.
>> >
>> > so for mt7622/mt7623
>> >
>> > Tested-by: Frank Wunderlich <frank-w@public-files.de>
>> 
>> We definitely need to understand the above.
> 
> there is a hardware-bug which may cause this...afair i saw this with
> the card in r64 with earlier Kernel-versions where other cards work
> (like the mt7612e).

I don't think a HW bug affecting PCI would cause what we are seeing
here, unless it results in memory corruption.

Thanks,

         M.
Frank Wunderlich March 1, 2021, 2:06 p.m. UTC | #5
> Sent: Monday, 01 March 2021 at 14:31
> From: "Marc Zyngier" <maz@kernel.org>
>
> Frank,
> 
> >> > i guess it's a bug in ath10k driver or my r64 board (it is a v1.1
> >> > which has missing capacitors on tx lines).
> >> 
> >> No, this definitely looks like a bug in the MTK PCIe driver,
> >> where the mutex is either not properly initialised, corrupted,
> >> or the wrong pointer is passed.
> > 
> > but why does it happen only with the ath10k-card and not the mt7612 in
> > same slot?
> 
> Does mt7612 use MSI? What we have here is a bogus mutex in the
> MTK PCIe driver, and the only way not to get there would be
> to avoid using MSIs.

i guess this card/its driver does not use MSI. Did not find anything about MSI in the "datasheet" [1] or the driver [2].

> > 
> >> This r64 machine is supposed to have working MSIs, right?
> > 
> > imho mt7622 have working MSI
> > 
> >> Do you get the same issue without this series?
> > 
> > tested 5.11.0 [1] without this series (but with your/thomas' patch
> > from discussion about my old patch) and got same trace. so this series
> > does not break anything here.
> 
> Can you retest without any additional patch on top of 5.11?
> These two patches only affect platforms that do *not* have MSIs at all.

i can revert these 2, but still need the patches for mt7622 pcie support [3]... btw. i see that these are missing in my 5.11-main... with them applied i do not see the traceback (firmware is not installed...)

root@bpi-r64:~# dmesg | grep ath
[    6.450765] ath10k_pci 0000:01:00.0: assign IRQ: got 146
[    6.661752] ath10k_pci 0000:01:00.0: enabling device (0000 -> 0002)
[    6.697811] ath10k_pci 0000:01:00.0: enabling bus mastering
[    6.721293] ath10k_pci 0000:01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
[    6.921030] ath10k_pci 0000:01:00.0: Failed to find firmware-N.bin (N between 2 and 6) from ath10k/QCA988X/hw2.0: -2
[    6.931698] ath10k_pci 0000:01:00.0: could not fetch firmware files (-2)
[    6.940417] ath10k_pci 0000:01:00.0: could not probe fw (-2)

so the traceback was caused by missing changes in the mtk pcie driver that are not yet upstream; added Chuanjia Liu

> > 
> >> > Tried with an mt7612e, this seems to work without any errors.
> >> >
> >> > so for mt7622/mt7623
> >> >
> >> > Tested-by: Frank Wunderlich <frank-w@public-files.de>
> >> 
> >> We definitely need to understand the above.
> > 
> > there is a hardware-bug which may cause this...afair i saw this with
> > the card in r64 with earlier Kernel-versions where other cards work
> > (like the mt7612e).
> 
> I don't think a HW bug affecting PCI would cause what we are seeing
> here, unless it results in memory corruption.


[1] https://www.asiarf.com/shop/wifi-wlan/wifi_mini_pcie/ws2433-wifi-11ac-mini-pcie-module-manufacturer/
[2] grep -Rni 'msi' drivers/net/wireless/mediatek/mt76/mt76x2/
[3] https://patchwork.kernel.org/project/linux-mediatek/list/?series=372885
Robin Murphy March 2, 2021, 10:35 a.m. UTC | #6
On 2021-03-01 14:06, Frank Wunderlich wrote:
>> Sent: Monday, 01 March 2021 at 14:31
>> From: "Marc Zyngier" <maz@kernel.org>
>>
>> Frank,
>>
>>>>> i guess it's a bug in ath10k driver or my r64 board (it is a v1.1
>>>>> which has missing capacitors on tx lines).
>>>>
>>>> No, this definitely looks like a bug in the MTK PCIe driver,
>>>> where the mutex is either not properly initialised, corrupted,
>>>> or the wrong pointer is passed.
>>>
>>> but why does it happen only with the ath10k-card and not the mt7612 in
>>> same slot?
>>
>> Does mt7612 use MSI? What we have here is a bogus mutex in the
>> MTK PCIe driver, and the only way not to get there would be
>> to avoid using MSIs.
> 
> i guess this card/its driver does not use MSI. Did not found anything in "datasheet" [1] or driver [2] about msi

FWIW, no need to guess - `lspci -v` (as root) should tell you whether 
the card has MSI (and/or MSI-X) capability, and whether it is enabled if so.

Robin.
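
What `lspci -v` reports here comes from walking the capability list in the device's configuration space. Below is a minimal sketch of that walk, assuming the first 256 bytes of config space have already been read into a buffer (for example, as root, from the device's sysfs `config` file). `scan_msi_caps()`, `cfg_read16()` and `struct msi_caps` are hypothetical names for illustration; the offsets, capability IDs and enable bits are the standard PCI ones.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_STATUS            0x06   /* status register offset */
#define PCI_STATUS_CAP_LIST   0x10   /* status bit 4: capability list present */
#define PCI_CAPABILITY_LIST   0x34   /* offset of the first capability pointer */
#define PCI_CAP_ID_MSI        0x05
#define PCI_CAP_ID_MSIX       0x11
#define PCI_MSI_FLAGS_ENABLE  0x0001 /* MSI message control bit 0 */
#define PCI_MSIX_FLAGS_ENABLE 0x8000 /* MSI-X message control bit 15 */

/* Hypothetical result type: which capabilities exist and are enabled. */
struct msi_caps {
	bool has_msi, msi_enabled;
	bool has_msix, msix_enabled;
};

/* Read a little-endian 16-bit value from the config space buffer. */
static uint16_t cfg_read16(const uint8_t *cfg, size_t off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* Walk the capability list of a 256-byte config space snapshot. */
static struct msi_caps scan_msi_caps(const uint8_t cfg[256])
{
	struct msi_caps caps = { false, false, false, false };
	int ttl = 48; /* guard against a looping capability list */
	uint8_t pos;

	if (!(cfg_read16(cfg, PCI_STATUS) & PCI_STATUS_CAP_LIST))
		return caps;

	/* Capability pointers are dword aligned; the low two bits are reserved. */
	pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
	while (pos >= 0x40 && ttl-- > 0) {
		uint8_t id = cfg[pos];
		uint16_t ctrl = cfg_read16(cfg, (size_t)pos + 2);

		if (id == PCI_CAP_ID_MSI) {
			caps.has_msi = true;
			caps.msi_enabled = ctrl & PCI_MSI_FLAGS_ENABLE;
		} else if (id == PCI_CAP_ID_MSIX) {
			caps.has_msix = true;
			caps.msix_enabled = ctrl & PCI_MSIX_FLAGS_ENABLE;
		}
		pos = cfg[pos + 1] & 0xfc; /* next capability, 0 ends the list */
	}
	return caps;
}
```

A card that exposes an MSI capability which is never enabled would show `has_msi` set and `msi_enabled` clear, which is the case that distinguishes "driver does not use MSI" from "device has no MSI at all".
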

>>>
>>>> This r64 machine is supposed to have working MSIs, right?
>>>
>>> imho mt7622 have working MSI
>>>
>>>> Do you get the same issue without this series?
>>>
>>> tested 5.11.0 [1] without this series (but with your/thomas' patch
>>> from discussion about my old patch) and got same trace. so this series
>>> does not break anything here.
>>
>> Can you retest without any additional patch on top of 5.11?
>> These two patches only affect platforms that do *not* have MSIs at all.
> 
> i can revert these 2, but still need patches for mt7622 pcie-support [3]...btw. i see that i miss these in 5.11-main...do not see traceback with them (have firmware not installed...)
> 
> root@bpi-r64:~# dmesg | grep ath
> [    6.450765] ath10k_pci 0000:01:00.0: assign IRQ: got 146
> [    6.661752] ath10k_pci 0000:01:00.0: enabling device (0000 -> 0002)
> [    6.697811] ath10k_pci 0000:01:00.0: enabling bus mastering
> [    6.721293] ath10k_pci 0000:01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
> [    6.921030] ath10k_pci 0000:01:00.0: Failed to find firmware-N.bin (N between 2 and 6) from ath10k/QCA988X/hw2.0: -2
> [    6.931698] ath10k_pci 0000:01:00.0: could not fetch firmware files (-2)
> [    6.940417] ath10k_pci 0000:01:00.0: could not probe fw (-2)
> 
> so traceback was caused by missing changes in mtk pcie-driver not yet upstream, added Chuanjia Liu
> 
>>>
>>>>> Tried with an mt7612e, this seems to work without any errors.
>>>>>
>>>>> so for mt7622/mt7623
>>>>>
>>>>> Tested-by: Frank Wunderlich <frank-w@public-files.de>
>>>>
>>>> We definitely need to understand the above.
>>>
>>> there is a hardware-bug which may cause this...afair i saw this with
>>> the card in r64 with earlier Kernel-versions where other cards work
>>> (like the mt7612e).
>>
>> I don't think a HW bug affecting PCI would cause what we are seeing
>> here, unless it results in memory corruption.
> 
> 
> [1] https://www.asiarf.com/shop/wifi-wlan/wifi_mini_pcie/ws2433-wifi-11ac-mini-pcie-module-manufacturer/
> [2] grep -Rni 'msi' drivers/net/wireless/mediatek/mt76/mt76x2/
> [3] https://patchwork.kernel.org/project/linux-mediatek/list/?series=372885
> 

Patch

diff --git a/drivers/pci/controller/pcie-mediatek.c b/drivers/pci/controller/pcie-mediatek.c
index cf4c18f0c25a..27241e7e1eb6 100644
--- a/drivers/pci/controller/pcie-mediatek.c
+++ b/drivers/pci/controller/pcie-mediatek.c
@@ -143,6 +143,7 @@  struct mtk_pcie_port;
  * struct mtk_pcie_soc - differentiate between host generations
  * @need_fix_class_id: whether this host's class ID needed to be fixed or not
  * @need_fix_device_id: whether this host's device ID needed to be fixed or not
+ * @no_msi: Bridge has no MSI support
  * @device_id: device ID which this host need to be fixed
  * @ops: pointer to configuration access functions
  * @startup: pointer to controller setting functions
@@ -151,6 +152,7 @@  struct mtk_pcie_port;
 struct mtk_pcie_soc {
 	bool need_fix_class_id;
 	bool need_fix_device_id;
+	bool no_msi;
 	unsigned int device_id;
 	struct pci_ops *ops;
 	int (*startup)(struct mtk_pcie_port *port);
@@ -1084,6 +1086,7 @@  static int mtk_pcie_probe(struct platform_device *pdev)
 
 	host->ops = pcie->soc->ops;
 	host->sysdata = pcie;
+	host->no_msi = pcie->soc->no_msi;
 
 	err = pci_host_probe(host);
 	if (err)
@@ -1173,6 +1176,7 @@  static const struct dev_pm_ops mtk_pcie_pm_ops = {
 };
 
 static const struct mtk_pcie_soc mtk_pcie_soc_v1 = {
+	.no_msi = true,
 	.ops = &mtk_pcie_ops,
 	.startup = mtk_pcie_startup_port,
 };
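
To illustrate the idea behind the patch, here is a simplified userspace model of how a per-SoC `no_msi` flag is copied to the host bridge at probe time and then lets an MSI request fail early instead of reaching an allocator that does not exist. All `mock_` names are hypothetical stand-ins; the real check lives in the PCI core, is not part of this patch, and may be structured differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-SoC match data (struct mtk_pcie_soc). */
struct mock_soc {
	bool no_msi;
};

/* Hypothetical stand-in for the host bridge seen by the PCI core. */
struct mock_host_bridge {
	bool no_msi;
};

/* Hypothetical stand-in for an endpoint device behind that bridge. */
struct mock_device {
	const struct mock_host_bridge *bridge;
};

/* The patch marks only the v1 SoC data as lacking MSI support. */
static const struct mock_soc mock_soc_v1 = { .no_msi = true };
static const struct mock_soc mock_soc_other = { .no_msi = false };

/* Mirrors the probe hunk: propagate the SoC flag to the host bridge. */
static void mock_probe(struct mock_host_bridge *host,
		       const struct mock_soc *soc)
{
	host->no_msi = soc->no_msi;
}

/*
 * Assumed core-side behaviour: refuse MSI when the bridge advertises
 * no_msi, so the endpoint driver can fall back to legacy interrupts.
 * Returns 0 on success and -1 when MSI is unavailable.
 */
static int mock_enable_msi(const struct mock_device *dev)
{
	if (dev->bridge->no_msi)
		return -1;
	return 0;
}
```

In this model an endpoint behind a bridge probed with `mock_soc_v1` gets -1 from `mock_enable_msi()`, while one behind `mock_soc_other` proceeds to MSI setup as before.
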