Message ID: 1659526134-22978-3-git-send-email-quic_krichai@quicinc.com (mailing list archive)
State:      Not Applicable
Series:     PCI: Restrict pci transactions after pci suspend
Hi Krishna,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on helgaas-pci/next]
[also build test ERROR on next-20220803]
[cannot apply to linus/master v5.19]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:      https://github.com/intel-lab-lkp/linux/commits/Krishna-chaitanya-chundru/PCI-Restrict-pci-transactions-after-pci-suspend/20220803-193033
base:     https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next
config:   s390-randconfig-r044-20220804 (https://download.01.org/0day-ci/archive/20220804/202208041800.BLHfNzzW-lkp@intel.com/config)
compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project 26dd42705c2af0b8f6e5d6cdb32c9bd5ed9524eb)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install s390 cross compiling tool for clang build
        # apt-get install binutils-s390x-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/9e799ab52db3245911b15a8903977d99554d900d
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Krishna-chaitanya-chundru/PCI-Restrict-pci-transactions-after-pci-suspend/20220803-193033
        git checkout 9e799ab52db3245911b15a8903977d99554d900d
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=s390 SHELL=/bin/bash drivers/pci/controller/dwc/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from drivers/pci/controller/dwc/pcie-designware-host.c:11:
   In file included from include/linux/irqchip/chained_irq.h:10:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:464:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __raw_readb(PCI_IOBASE + addr);
                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:477:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:37:59: note: expanded from macro '__le16_to_cpu'
   #define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
                                                             ^
   include/uapi/linux/swab.h:102:54: note: expanded from macro '__swab16'
   #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
                                                        ^
   In file included from drivers/pci/controller/dwc/pcie-designware-host.c:11:
   In file included from include/linux/irqchip/chained_irq.h:10:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:35:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:115:54: note: expanded from macro '__swab32'
   #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
                                                        ^
   In file included from drivers/pci/controller/dwc/pcie-designware-host.c:11:
   In file included from include/linux/irqchip/chained_irq.h:10:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:501:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:511:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:521:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:609:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsb(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:617:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsw(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:625:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsl(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:634:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesb(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:643:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesw(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:652:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesl(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
>> drivers/pci/controller/dwc/pcie-designware-host.c:33:24: error: static assertion failed due to requirement '__builtin_types_compatible_p(struct pcie_port, struct dw_pcie_rp) || __builtin_types_compatible_p(struct pcie_port, void)': pointer type mismatch in container_of()
           struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
                                 ^~~~~~~~~~~~~~~~~~~~~~
   drivers/pci/controller/dwc/pcie-designware.h:327:34: note: expanded from macro 'to_dw_pcie_from_pp'
   #define to_dw_pcie_from_pp(port) container_of((port), struct dw_pcie, pp)
                                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/container_of.h:19:2: note: expanded from macro 'container_of'
           static_assert(__same_type(*(ptr), ((type *)0)->member) || \
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:77:34: note: expanded from macro 'static_assert'
   #define static_assert(expr, ...) __static_assert(expr, ##__VA_ARGS__, #expr)
                                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:78:41: note: expanded from macro '__static_assert'
   #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
                                           ^              ~~~~
   drivers/pci/controller/dwc/pcie-designware-host.c:44:24: error: static assertion failed due to requirement '__builtin_types_compatible_p(struct pcie_port, struct dw_pcie_rp) || __builtin_types_compatible_p(struct pcie_port, void)': pointer type mismatch in container_of()
           struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
                                 ^~~~~~~~~~~~~~~~~~~~~~
   drivers/pci/controller/dwc/pcie-designware.h:327:34: note: expanded from macro 'to_dw_pcie_from_pp'
   #define to_dw_pcie_from_pp(port) container_of((port), struct dw_pcie, pp)
                                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/container_of.h:19:2: note: expanded from macro 'container_of'
           static_assert(__same_type(*(ptr), ((type *)0)->member) || \
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:77:34: note: expanded from macro 'static_assert'
   #define static_assert(expr, ...) __static_assert(expr, ##__VA_ARGS__, #expr)
                                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:78:41: note: expanded from macro '__static_assert'
   #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
                                           ^              ~~~~
   12 warnings and 2 errors generated.
vim +33 drivers/pci/controller/dwc/pcie-designware-host.c

    29
    30  static void dw_msi_mask_irq(struct irq_data *d)
    31  {
    32          struct pcie_port *pp = irq_data_get_irq_chip_data(d->parent_data);
  > 33          struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
    34
    35          if (dw_pcie_link_up(pci))
    36                  pci_msi_mask_irq(d);
    37
    38          irq_chip_mask_parent(d);
    39  }
    40
Quoting Krishna chaitanya chundru (2022-08-03 04:28:53)
> If the endpoint device state is D0 and irq's are not freed, then
> kernel try to mask interrupts in system suspend path by writing
> in to the vector table (for MSIX interrupts) and config space (for MSI's).
>
> These transactions are initiated in the pm suspend after pcie clocks got
> disabled as part of platform driver pm suspend call. Due to it, these
> transactions are resulting in un-clocked access and eventually to crashes.

Why are the platform driver pm suspend calls disabling clks that early?
Can they disable clks in noirq phase, or even later, so that we don't
have to check if the device is clocking in the irq poking functions?
It's best to keep irq operations fast, so that irq control is fast given
that these functions are called from irq flow handlers.
On 8/9/2022 12:42 AM, Stephen Boyd wrote:
> Quoting Krishna chaitanya chundru (2022-08-03 04:28:53)
>> If the endpoint device state is D0 and irq's are not freed, then
>> kernel try to mask interrupts in system suspend path by writing
>> in to the vector table (for MSIX interrupts) and config space (for MSI's).
>>
>> These transactions are initiated in the pm suspend after pcie clocks got
>> disabled as part of platform driver pm suspend call. Due to it, these
>> transactions are resulting in un-clocked access and eventually to crashes.
> Why are the platform driver pm suspend calls disabling clks that early?
> Can they disable clks in noirq phase, or even later, so that we don't
> have to check if the device is clocking in the irq poking functions?
> It's best to keep irq operations fast, so that irq control is fast given
> that these functions are called from irq flow handlers.

We are already registering the PCIe PM suspend ops as noirq ops. The
MSI-X and config space access comes later than that, which is the reason
we added that check.
Quoting Krishna Chaitanya Chundru (2022-08-23 20:37:59)
>
> On 8/9/2022 12:42 AM, Stephen Boyd wrote:
> > Quoting Krishna chaitanya chundru (2022-08-03 04:28:53)
> >> If the endpoint device state is D0 and irq's are not freed, then
> >> kernel try to mask interrupts in system suspend path by writing
> >> in to the vector table (for MSIX interrupts) and config space (for MSI's).
> >>
> >> These transactions are initiated in the pm suspend after pcie clocks got
> >> disabled as part of platform driver pm suspend call. Due to it, these
> >> transactions are resulting in un-clocked access and eventually to crashes.
> > Why are the platform driver pm suspend calls disabling clks that early?
> > Can they disable clks in noirq phase, or even later, so that we don't
> > have to check if the device is clocking in the irq poking functions?
> > It's best to keep irq operations fast, so that irq control is fast given
> > that these functions are called from irq flow handlers.
>
> We are registering the pcie pm suspend ops as noirq ops only. And this
> msix and config
>
> access is coming at the later point of time that is reason we added that
> check.
>

What is accessing msix and config? Can you dump_stack() after noirq ops
are called and figure out what is trying to access the bus when it is
powered down?
On 8/24/2022 10:50 PM, Stephen Boyd wrote:
> Quoting Krishna Chaitanya Chundru (2022-08-23 20:37:59)
>> On 8/9/2022 12:42 AM, Stephen Boyd wrote:
>>> Quoting Krishna chaitanya chundru (2022-08-03 04:28:53)
>>>> If the endpoint device state is D0 and irq's are not freed, then
>>>> kernel try to mask interrupts in system suspend path by writing
>>>> in to the vector table (for MSIX interrupts) and config space (for MSI's).
>>>>
>>>> These transactions are initiated in the pm suspend after pcie clocks got
>>>> disabled as part of platform driver pm suspend call. Due to it, these
>>>> transactions are resulting in un-clocked access and eventually to crashes.
>>> Why are the platform driver pm suspend calls disabling clks that early?
>>> Can they disable clks in noirq phase, or even later, so that we don't
>>> have to check if the device is clocking in the irq poking functions?
>>> It's best to keep irq operations fast, so that irq control is fast given
>>> that these functions are called from irq flow handlers.
>> We are registering the pcie pm suspend ops as noirq ops only. And this
>> msix and config
>>
>> access is coming at the later point of time that is reason we added that
>> check.
>>
> What is accessing msix and config? Can you dump_stack() after noirq ops
> are called and figure out what is trying to access the bus when it is
> powered down?

The MSI-X and config space is being accessed to mask interrupts. The
access comes at the end of suspend, near CPU disable. We tried to
dump_stack() there, but the call stack does not come out since it is so
close to CPU disable. But we did get a dump at resume; please have a
look at it:

[   54.946268] Enabling non-boot CPUs ...
[   54.951182] CPU: 1 PID: 21 Comm: cpuhp/1 Not tainted 5.15.41 #105 43491e4414b1db8a6f59d56b617b520d92a9498e
[   54.961122] Hardware name: Qualcomm Technologies, Inc. sc7280 IDP SKU2 platform (DT)
[   54.969088] Call trace:
[   54.971612]  dump_backtrace+0x0/0x200
[   54.975399]  show_stack+0x20/0x2c
[   54.978826]  dump_stack_lvl+0x6c/0x90
[   54.982614]  dump_stack+0x18/0x38
[   54.986043]  dw_msi_unmask_irq+0x2c/0x58
[   54.990096]  irq_enable+0x58/0x90
[   54.993522]  __irq_startup+0x68/0x94
[   54.997216]  irq_startup+0xf4/0x140
[   55.000820]  irq_affinity_online_cpu+0xc8/0x154
[   55.005491]  cpuhp_invoke_callback+0x19c/0x6e4
[   55.010077]  cpuhp_thread_fun+0x11c/0x188
[   55.014216]  smpboot_thread_fn+0x1ac/0x30c
[   55.018445]  kthread+0x140/0x30c
[   55.021788]  ret_from_fork+0x10/0x20
[   55.028243] CPU1 is up

So the same path should be hit in the suspend path while disabling the
CPUs. If there is any other way to remove these calls, can you please
point us to it?

Thanks & Regards,
Krishna Chaitanya
Quoting Krishna Chaitanya Chundru (2022-08-25 06:52:43) > > On 8/24/2022 10:50 PM, Stephen Boyd wrote: > > Quoting Krishna Chaitanya Chundru (2022-08-23 20:37:59) > >> On 8/9/2022 12:42 AM, Stephen Boyd wrote: > >>> Quoting Krishna chaitanya chundru (2022-08-03 04:28:53) > >>>> If the endpoint device state is D0 and irq's are not freed, then > >>>> kernel try to mask interrupts in system suspend path by writing > >>>> in to the vector table (for MSIX interrupts) and config space (for MSI's). > >>>> > >>>> These transactions are initiated in the pm suspend after pcie clocks got > >>>> disabled as part of platform driver pm suspend call. Due to it, these > >>>> transactions are resulting in un-clocked access and eventually to crashes. > >>> Why are the platform driver pm suspend calls disabling clks that early? > >>> Can they disable clks in noirq phase, or even later, so that we don't > >>> have to check if the device is clocking in the irq poking functions? > >>> It's best to keep irq operations fast, so that irq control is fast given > >>> that these functions are called from irq flow handlers. > >> We are registering the pcie pm suspend ops as noirq ops only. And this > >> msix and config > >> > >> access is coming at the later point of time that is reason we added that > >> check. > >> > > What is accessing msix and config? Can you dump_stack() after noirq ops > > are called and figure out what is trying to access the bus when it is > > powered down? > > The msix and config space is being accessed to mask interrupts. The > access is coming at the end of the suspend > > and near CPU disable. We tried to dump the stack there but the call > stack is not coming as it is near cpu disable. That is odd that you can't get a stacktrace. > > But we got dump at resume please have look at it > > [ 54.946268] Enabling non-boot CPUs ... 
> [ 54.951182] CPU: 1 PID: 21 Comm: cpuhp/1 Not tainted 5.15.41 #105 > 43491e4414b1db8a6f59d56b617b520d92a9498e > [ 54.961122] Hardware name: Qualcomm Technologies, Inc. sc7280 IDP > SKU2 platform (DT) > [ 54.969088] Call trace: > [ 54.971612] dump_backtrace+0x0/0x200 > [ 54.975399] show_stack+0x20/0x2c > [ 54.978826] dump_stack_lvl+0x6c/0x90 > [ 54.982614] dump_stack+0x18/0x38 > [ 54.986043] dw_msi_unmask_irq+0x2c/0x58 > [ 54.990096] irq_enable+0x58/0x90 > [ 54.993522] __irq_startup+0x68/0x94 > [ 54.997216] irq_startup+0xf4/0x140 > [ 55.000820] irq_affinity_online_cpu+0xc8/0x154 > [ 55.005491] cpuhp_invoke_callback+0x19c/0x6e4 > [ 55.010077] cpuhp_thread_fun+0x11c/0x188 > [ 55.014216] smpboot_thread_fn+0x1ac/0x30c > [ 55.018445] kthread+0x140/0x30c > [ 55.021788] ret_from_fork+0x10/0x20 > [ 55.028243] CPU1 is up > > So the same stack should be called at the suspend path while disabling CPU. Sounds like you're getting hit by affinity changes while offlining CPUs during suspend (see irq_migrate_all_off_this_cpu()). That will happen after devices are suspended (all phases of suspend ops). > > If there is any other way to remove these calls can you please help us > point that way. I'm not sure. I believe genirq assumes the irqchips are always accessible. There is some support to suspend irqchips. See how the struct irq_chip::irq_suspend() function is called by syscore ops in the generic irqchip 'irq_gc_syscore_ops' hooks. Maybe you could add a syscore suspend/resume hook to disable/enable the clks and power to the PCI controller. syscore ops run after secondary CPUs are hotplugged out during suspend. Or maybe setting the IRQCHIP_MASK_ON_SUSPEND flag can be used so that on irq migration nothing writes the irq hardware because it is already masked in the hardware earlier. I think the problem is that on resume we'll restart the irq from the first CPU online event, when you don't want to do that because it is too early. 
I have another question though, which is do MSIs support wakeup? I don't see how it works if the whole bus is effectively off during suspend. If wakeup needs to be supported then I suspect the bus can't be powered down during suspend.
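[Editor's illustration] Stephen's syscore suggestion can be sketched outside the kernel. The block below is a userspace toy model of the ordering he describes, not driver code: the qcom_pcie_* names and the minimal syscore_ops list are hypothetical stand-ins for the real register_syscore_ops() machinery in drivers/base/syscore.c. The only point it demonstrates is the ordering guarantee: syscore suspend runs after secondary CPUs are offlined, so irq migration can no longer touch the controller after its clocks go down, and syscore resume re-enables the clocks before CPUs come back online.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the kernel's syscore_ops registration list. */
struct syscore_ops {
	int (*suspend)(void);
	void (*resume)(void);
	struct syscore_ops *next;
};

static struct syscore_ops *syscore_list;

static void register_syscore_ops(struct syscore_ops *ops)
{
	ops->next = syscore_list;
	syscore_list = ops;
}

/* State the hypothetical driver guards: 1 = PCIe clocks running. */
static int pcie_clks_on = 1;
/* irq migration during CPU offline only happens while this is 1. */
static int secondary_cpus_online = 1;

static int qcom_pcie_syscore_suspend(void)
{
	/* By the time syscore suspend runs, CPU hotplug is finished, so
	 * dw_msi_mask_irq()-style register writes can no longer race us. */
	assert(!secondary_cpus_online);
	pcie_clks_on = 0;
	return 0;
}

static void qcom_pcie_syscore_resume(void)
{
	pcie_clks_on = 1;
}

static struct syscore_ops qcom_pcie_syscore_ops = {
	.suspend = qcom_pcie_syscore_suspend,
	.resume  = qcom_pcie_syscore_resume,
};

/* Mimic the tail of system suspend: offline CPUs, then syscore suspend. */
static void model_system_suspend(void)
{
	secondary_cpus_online = 0;
	for (struct syscore_ops *o = syscore_list; o; o = o->next)
		if (o->suspend)
			o->suspend();
}

/* Mimic resume: syscore resume runs before CPUs are brought back up. */
static void model_system_resume(void)
{
	for (struct syscore_ops *o = syscore_list; o; o = o->next)
		if (o->resume)
			o->resume();
	secondary_cpus_online = 1;
}
```

With this ordering, the dw_msi_unmask_irq() call seen in the resume trace above would already find the clocks re-enabled, because syscore resume precedes CPU onlining.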
On Fri, Aug 26, 2022 at 03:23:00PM -0500, Stephen Boyd wrote: > Quoting Krishna Chaitanya Chundru (2022-08-25 06:52:43) > > > > On 8/24/2022 10:50 PM, Stephen Boyd wrote: > > > Quoting Krishna Chaitanya Chundru (2022-08-23 20:37:59) > > >> On 8/9/2022 12:42 AM, Stephen Boyd wrote: > > >>> Quoting Krishna chaitanya chundru (2022-08-03 04:28:53) > > >>>> If the endpoint device state is D0 and irq's are not freed, then > > >>>> kernel try to mask interrupts in system suspend path by writing > > >>>> in to the vector table (for MSIX interrupts) and config space (for MSI's). > > >>>> > > >>>> These transactions are initiated in the pm suspend after pcie clocks got > > >>>> disabled as part of platform driver pm suspend call. Due to it, these > > >>>> transactions are resulting in un-clocked access and eventually to crashes. > > >>> Why are the platform driver pm suspend calls disabling clks that early? > > >>> Can they disable clks in noirq phase, or even later, so that we don't > > >>> have to check if the device is clocking in the irq poking functions? > > >>> It's best to keep irq operations fast, so that irq control is fast given > > >>> that these functions are called from irq flow handlers. > > >> We are registering the pcie pm suspend ops as noirq ops only. And this > > >> msix and config > > >> > > >> access is coming at the later point of time that is reason we added that > > >> check. > > >> > > > What is accessing msix and config? Can you dump_stack() after noirq ops > > > are called and figure out what is trying to access the bus when it is > > > powered down? > > > > The msix and config space is being accessed to mask interrupts. The > > access is coming at the end of the suspend > > > > and near CPU disable. We tried to dump the stack there but the call > > stack is not coming as it is near cpu disable. > > That is odd that you can't get a stacktrace. > > > > > But we got dump at resume please have look at it > > > > [ 54.946268] Enabling non-boot CPUs ... 
> > [ 54.951182] CPU: 1 PID: 21 Comm: cpuhp/1 Not tainted 5.15.41 #105
> > 43491e4414b1db8a6f59d56b617b520d92a9498e
> > [ 54.961122] Hardware name: Qualcomm Technologies, Inc. sc7280 IDP
> > SKU2 platform (DT)
> > [ 54.969088] Call trace:
> > [ 54.971612] dump_backtrace+0x0/0x200
> > [ 54.975399] show_stack+0x20/0x2c
> > [ 54.978826] dump_stack_lvl+0x6c/0x90
> > [ 54.982614] dump_stack+0x18/0x38
> > [ 54.986043] dw_msi_unmask_irq+0x2c/0x58
> > [ 54.990096] irq_enable+0x58/0x90
> > [ 54.993522] __irq_startup+0x68/0x94
> > [ 54.997216] irq_startup+0xf4/0x140
> > [ 55.000820] irq_affinity_online_cpu+0xc8/0x154
> > [ 55.005491] cpuhp_invoke_callback+0x19c/0x6e4
> > [ 55.010077] cpuhp_thread_fun+0x11c/0x188
> > [ 55.014216] smpboot_thread_fn+0x1ac/0x30c
> > [ 55.018445] kthread+0x140/0x30c
> > [ 55.021788] ret_from_fork+0x10/0x20
> > [ 55.028243] CPU1 is up
> >
> > So the same stack should be called at the suspend path while disabling CPU.
>
> Sounds like you're getting hit by affinity changes while offlining CPUs
> during suspend (see irq_migrate_all_off_this_cpu()). That will happen
> after devices are suspended (all phases of suspend ops).

The affinity setting should not happen, since the DWC MSI controller
doesn't support setting IRQ affinity (hierarchical IRQ domain). In the
migrate_one_irq() function, there is a check for the existence of the
irq_set_affinity() callback, but the DWC MSI controller returns -EINVAL
in that callback. So is this the reason the migration was still
attempted?

A quick check would be to test this suspend/resume with the GIC ITS for
MSI, since it supports setting IRQ affinity and resides in a separate
domain. Chaitanya, can you try that?

> >
> > If there is any other way to remove these calls can you please help us
> > point that way.
>
> I'm not sure. I believe genirq assumes the irqchips are always
> accessible. There is some support to suspend irqchips. See how the
> struct irq_chip::irq_suspend() function is called by syscore ops in the
> generic irqchip 'irq_gc_syscore_ops' hooks. Maybe you could add a
> syscore suspend/resume hook to disable/enable the clks and power to the
> PCI controller. syscore ops run after secondary CPUs are hotplugged out
> during suspend.
>
> Or maybe setting the IRQCHIP_MASK_ON_SUSPEND flag can be used so that on
> irq migration nothing writes the irq hardware because it is already
> masked in the hardware earlier. I think the problem is that on resume
> we'll restart the irq from the first CPU online event, when you don't
> want to do that because it is too early.
>
> I have another question though, which is do MSIs support wakeup? I don't
> see how it works if the whole bus is effectively off during suspend. If
> wakeup needs to be supported then I suspect the bus can't be powered
> down during suspend.

Wake up should be handled by a dedicated side-band GPIO or an in-band PME
message.

But I still wonder how the link stays in L1/L1ss when the clocks are
disabled and the PHY is powered down. Maybe the link or PHY is powered by
a separate power domain like MX that keeps the link active?

Thanks,
Mani
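[Editor's illustration] The IRQCHIP_MASK_ON_SUSPEND idea quoted above can also be sketched in userspace. This is a toy model of the behavior Stephen describes, assuming the flag works as in the mail (the irq is masked at device-suspend time, while the controller is still clocked, so later irq work finds it masked and skips the hardware write). The MODEL_* names and functions are hypothetical simplifications, not the kernel/irq/pm.c implementation.

```c
#include <assert.h>

/* Mirrors the meaning (not necessarily the value) of the kernel flag. */
#define MODEL_IRQCHIP_MASK_ON_SUSPEND (1U << 2)

struct model_irq {
	unsigned int chip_flags;
	int masked;      /* software view of the mask state */
	int hw_writes;   /* counts register pokes reaching the chip */
};

/* Chip callback: this is the access that would hit unclocked DBI space. */
static void chip_mask(struct model_irq *irq)
{
	irq->hw_writes++;
	irq->masked = 1;
}

/* Model of suspending a device irq: with MASK_ON_SUSPEND, the mask write
 * happens early, while the PCIe clocks are still on. */
static void suspend_device_irq_model(struct model_irq *irq)
{
	if (irq->chip_flags & MODEL_IRQCHIP_MASK_ON_SUSPEND)
		chip_mask(irq);
}

/* Later irq work (e.g. migration during CPU offline) skips irqs that are
 * already masked, so nothing touches the hardware after clocks are gone. */
static void late_cpu_offline_poke(struct model_irq *irq)
{
	if (!irq->masked)
		chip_mask(irq);
}
```

As Stephen notes, the open question with this approach is the resume side: the first CPU online event would unmask the irq before the controller is powered, which this sketch does not address.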
On 8/27/2022 10:56 PM, Manivannan Sadhasivam wrote: > On Fri, Aug 26, 2022 at 03:23:00PM -0500, Stephen Boyd wrote: >> Quoting Krishna Chaitanya Chundru (2022-08-25 06:52:43) >>> On 8/24/2022 10:50 PM, Stephen Boyd wrote: >>>> Quoting Krishna Chaitanya Chundru (2022-08-23 20:37:59) >>>>> On 8/9/2022 12:42 AM, Stephen Boyd wrote: >>>>>> Quoting Krishna chaitanya chundru (2022-08-03 04:28:53) >>>>>>> If the endpoint device state is D0 and irq's are not freed, then >>>>>>> kernel try to mask interrupts in system suspend path by writing >>>>>>> in to the vector table (for MSIX interrupts) and config space (for MSI's). >>>>>>> >>>>>>> These transactions are initiated in the pm suspend after pcie clocks got >>>>>>> disabled as part of platform driver pm suspend call. Due to it, these >>>>>>> transactions are resulting in un-clocked access and eventually to crashes. >>>>>> Why are the platform driver pm suspend calls disabling clks that early? >>>>>> Can they disable clks in noirq phase, or even later, so that we don't >>>>>> have to check if the device is clocking in the irq poking functions? >>>>>> It's best to keep irq operations fast, so that irq control is fast given >>>>>> that these functions are called from irq flow handlers. >>>>> We are registering the pcie pm suspend ops as noirq ops only. And this >>>>> msix and config >>>>> >>>>> access is coming at the later point of time that is reason we added that >>>>> check. >>>>> >>>> What is accessing msix and config? Can you dump_stack() after noirq ops >>>> are called and figure out what is trying to access the bus when it is >>>> powered down? >>> The msix and config space is being accessed to mask interrupts. The >>> access is coming at the end of the suspend >>> >>> and near CPU disable. We tried to dump the stack there but the call >>> stack is not coming as it is near cpu disable. >> That is odd that you can't get a stacktrace. 
>>
>>> But we got dump at resume please have look at it
>>>
>>> [ 54.946268] Enabling non-boot CPUs ...
>>> [ 54.951182] CPU: 1 PID: 21 Comm: cpuhp/1 Not tainted 5.15.41 #105
>>> 43491e4414b1db8a6f59d56b617b520d92a9498e
>>> [ 54.961122] Hardware name: Qualcomm Technologies, Inc. sc7280 IDP
>>> SKU2 platform (DT)
>>> [ 54.969088] Call trace:
>>> [ 54.971612] dump_backtrace+0x0/0x200
>>> [ 54.975399] show_stack+0x20/0x2c
>>> [ 54.978826] dump_stack_lvl+0x6c/0x90
>>> [ 54.982614] dump_stack+0x18/0x38
>>> [ 54.986043] dw_msi_unmask_irq+0x2c/0x58
>>> [ 54.990096] irq_enable+0x58/0x90
>>> [ 54.993522] __irq_startup+0x68/0x94
>>> [ 54.997216] irq_startup+0xf4/0x140
>>> [ 55.000820] irq_affinity_online_cpu+0xc8/0x154
>>> [ 55.005491] cpuhp_invoke_callback+0x19c/0x6e4
>>> [ 55.010077] cpuhp_thread_fun+0x11c/0x188
>>> [ 55.014216] smpboot_thread_fn+0x1ac/0x30c
>>> [ 55.018445] kthread+0x140/0x30c
>>> [ 55.021788] ret_from_fork+0x10/0x20
>>> [ 55.028243] CPU1 is up
>>>
>>> So the same stack should be called at the suspend path while disabling CPU.
>> Sounds like you're getting hit by affinity changes while offlining CPUs
>> during suspend (see irq_migrate_all_off_this_cpu()). That will happen
>> after devices are suspended (all phases of suspend ops).
> The affinity setting should not happen since DWC MSI controller doesn't support
> setting IRQ affinity (hierarchial IRQ domain). In the migrate_one_irq()
> function, there is a check for the existence of the irq_set_affinity()
> callback, but the DWC MSI controller return -EINVAL in the callback. So this
> is the reason the migration was still atempted?
>
> A quick check would be to test this suspend/resume with GIC ITS for MSI since
> it supports settings IRQ affinity and resides in a separate domain.
> Chaitanya, can you try that?

Hi Mani,

I tried with the GIC ITS; there also I see the same behavior.

The only thing which helps is commenting out the affinity flags in the
following function:
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 21b3ac2a29d2..042afec1cf9d 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -487,8 +487,9 @@ static int alloc_descs(unsigned int start, unsigned int cnt, int node,

        if (affinity) {
                if (affinity->is_managed) {
-                       flags = IRQD_AFFINITY_MANAGED |
-                               IRQD_MANAGED_SHUTDOWN;
+//                     flags = IRQD_AFFINITY_MANAGED |
+//                             IRQD_MANAGED_SHUTDOWN;
+                       flags = 0;//IRQD_AFFINITY_MANAGED |
                }
                mask = &affinity->mask;
                node = cpu_to_node(cpumask_first(mask));

>>> If there is any other way to remove these calls can you please help us
>>> point that way.
>> I'm not sure. I believe genirq assumes the irqchips are always
>> accessible. There is some support to suspend irqchips. See how the
>> struct irq_chip::irq_suspend() function is called by syscore ops in the
>> generic irqchip 'irq_gc_syscore_ops' hooks. Maybe you could add a
>> syscore suspend/resume hook to disable/enable the clks and power to the
>> PCI controller. syscore ops run after secondary CPUs are hotplugged out
>> during suspend.
>>
>> Or maybe setting the IRQCHIP_MASK_ON_SUSPEND flag can be used so that on
>> irq migration nothing writes the irq hardware because it is already
>> masked in the hardware earlier. I think the problem is that on resume
>> we'll restart the irq from the first CPU online event, when you don't
>> want to do that because it is too early.
>>
>> I have another question though, which is do MSIs support wakeup? I don't
>> see how it works if the whole bus is effectively off during suspend. If
>> wakeup needs to be supported then I suspect the bus can't be powered
>> down during suspend.
> Wake up should be handled by a dedicated side-band GPIO or in-band PME message.
>
> But I still wonder how the link stays in L1/L1ss when the clocks are disabled
> and PHY is powered down. Maybe the link or phy is powered by a separate power
> domain like MX that keeps the link active?

We will come back to you on this.

> Thanks,
> Mani
>
On Mon, Aug 29, 2022 at 11:01:58PM +0530, Krishna Chaitanya Chundru wrote: > > On 8/27/2022 10:56 PM, Manivannan Sadhasivam wrote: > > On Fri, Aug 26, 2022 at 03:23:00PM -0500, Stephen Boyd wrote: > > > Quoting Krishna Chaitanya Chundru (2022-08-25 06:52:43) > > > > On 8/24/2022 10:50 PM, Stephen Boyd wrote: > > > > > Quoting Krishna Chaitanya Chundru (2022-08-23 20:37:59) > > > > > > On 8/9/2022 12:42 AM, Stephen Boyd wrote: > > > > > > > Quoting Krishna chaitanya chundru (2022-08-03 04:28:53) > > > > > > > > If the endpoint device state is D0 and irq's are not freed, then > > > > > > > > kernel try to mask interrupts in system suspend path by writing > > > > > > > > in to the vector table (for MSIX interrupts) and config space (for MSI's). > > > > > > > > > > > > > > > > These transactions are initiated in the pm suspend after pcie clocks got > > > > > > > > disabled as part of platform driver pm suspend call. Due to it, these > > > > > > > > transactions are resulting in un-clocked access and eventually to crashes. > > > > > > > Why are the platform driver pm suspend calls disabling clks that early? > > > > > > > Can they disable clks in noirq phase, or even later, so that we don't > > > > > > > have to check if the device is clocking in the irq poking functions? > > > > > > > It's best to keep irq operations fast, so that irq control is fast given > > > > > > > that these functions are called from irq flow handlers. > > > > > > We are registering the pcie pm suspend ops as noirq ops only. And this > > > > > > msix and config > > > > > > > > > > > > access is coming at the later point of time that is reason we added that > > > > > > check. > > > > > > > > > > > What is accessing msix and config? Can you dump_stack() after noirq ops > > > > > are called and figure out what is trying to access the bus when it is > > > > > powered down? > > > > The msix and config space is being accessed to mask interrupts. 
The > > > > access is coming at the end of the suspend > > > > > > > > and near CPU disable. We tried to dump the stack there but the call > > > > stack is not coming as it is near cpu disable. > > > That is odd that you can't get a stacktrace. > > > > > > > But we got dump at resume please have look at it > > > > > > > > [ 54.946268] Enabling non-boot CPUs ... > > > > [ 54.951182] CPU: 1 PID: 21 Comm: cpuhp/1 Not tainted 5.15.41 #105 > > > > 43491e4414b1db8a6f59d56b617b520d92a9498e > > > > [ 54.961122] Hardware name: Qualcomm Technologies, Inc. sc7280 IDP > > > > SKU2 platform (DT) > > > > [ 54.969088] Call trace: > > > > [ 54.971612] dump_backtrace+0x0/0x200 > > > > [ 54.975399] show_stack+0x20/0x2c > > > > [ 54.978826] dump_stack_lvl+0x6c/0x90 > > > > [ 54.982614] dump_stack+0x18/0x38 > > > > [ 54.986043] dw_msi_unmask_irq+0x2c/0x58 > > > > [ 54.990096] irq_enable+0x58/0x90 > > > > [ 54.993522] __irq_startup+0x68/0x94 > > > > [ 54.997216] irq_startup+0xf4/0x140 > > > > [ 55.000820] irq_affinity_online_cpu+0xc8/0x154 > > > > [ 55.005491] cpuhp_invoke_callback+0x19c/0x6e4 > > > > [ 55.010077] cpuhp_thread_fun+0x11c/0x188 > > > > [ 55.014216] smpboot_thread_fn+0x1ac/0x30c > > > > [ 55.018445] kthread+0x140/0x30c > > > > [ 55.021788] ret_from_fork+0x10/0x20 > > > > [ 55.028243] CPU1 is up > > > > > > > > So the same stack should be called at the suspend path while disabling CPU. > > > Sounds like you're getting hit by affinity changes while offlining CPUs > > > during suspend (see irq_migrate_all_off_this_cpu()). That will happen > > > after devices are suspended (all phases of suspend ops). > > The affinity setting should not happen since DWC MSI controller doesn't support > > setting IRQ affinity (hierarchial IRQ domain). In the migrate_one_irq() > > function, there is a check for the existence of the irq_set_affinity() > > callback, but the DWC MSI controller return -EINVAL in the callback. So this > > is the reason the migration was still atempted? 
> 
> > A quick check would be to test this suspend/resume with GIC ITS for MSI,
> > since it supports setting IRQ affinity and resides in a separate domain.
> > Chaitanya, can you try that?
> Hi Mani,
> 
> I tried with GIC ITS; there also I see the same behavior.

Okay

> The only thing which helps is commenting out the affinity flags in the
> following function:
> 
> diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
> index 21b3ac2a29d2..042afec1cf9d 100644
> --- a/kernel/irq/irqdesc.c
> +++ b/kernel/irq/irqdesc.c
> @@ -487,8 +487,9 @@ static int alloc_descs(unsigned int start, unsigned int cnt, int node,
> 
>  	if (affinity) {
>  		if (affinity->is_managed) {
> -			flags = IRQD_AFFINITY_MANAGED |
> -				IRQD_MANAGED_SHUTDOWN;
> +//			flags = IRQD_AFFINITY_MANAGED |
> +//				IRQD_MANAGED_SHUTDOWN;
> +			flags = 0; //IRQD_AFFINITY_MANAGED |
>  		}
>  		mask = &affinity->mask;
>  		node = cpu_to_node(cpumask_first(mask));
> 

The only solution I can think of is keeping the clocks related to DBI access
active, or switching to another clock source that consumes less power, if one
is available during suspend.

But limiting the DBI access using hacks doesn't look good.

Thanks,
Mani

> > > > If there is any other way to remove these calls, can you please help us
> > > > point that way.
> > > I'm not sure. I believe genirq assumes the irqchips are always
> > > accessible. There is some support to suspend irqchips. See how the
> > > struct irq_chip::irq_suspend() function is called by syscore ops in the
> > > generic irqchip 'irq_gc_syscore_ops' hooks. Maybe you could add a
> > > syscore suspend/resume hook to disable/enable the clks and power to the
> > > PCI controller. syscore ops run after secondary CPUs are hotplugged out
> > > during suspend.
> > >
> > > Or maybe setting the IRQCHIP_MASK_ON_SUSPEND flag can be used so that on
> > > irq migration nothing writes the irq hardware because it is already
> > > masked in the hardware earlier. I think the problem is that on resume
> > > we'll restart the irq from the first CPU online event, when you don't
> > > want to do that because it is too early.
> > >
> > > I have another question though, which is: do MSIs support wakeup? I don't
> > > see how it works if the whole bus is effectively off during suspend. If
> > > wakeup needs to be supported then I suspect the bus can't be powered
> > > down during suspend.
> > Wakeup should be handled by a dedicated side-band GPIO or an in-band PME
> > message.
> >
> > But I still wonder how the link stays in L1/L1ss when the clocks are
> > disabled and the PHY is powered down. Maybe the link or PHY is powered by a
> > separate power domain like MX that keeps the link active?
> We will come back to you on this.

Okay.

Thanks,
Mani

> > Thanks,
> > Mani
On 8/30/2022 5:25 PM, Manivannan Sadhasivam wrote:

<SNIP>...

>> diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
>> index 21b3ac2a29d2..042afec1cf9d 100644
>> --- a/kernel/irq/irqdesc.c
>> +++ b/kernel/irq/irqdesc.c
>> @@ -487,8 +487,9 @@ static int alloc_descs(unsigned int start, unsigned int cnt, int node,
>>
>>  	if (affinity) {
>>  		if (affinity->is_managed) {
>> -			flags = IRQD_AFFINITY_MANAGED |
>> -				IRQD_MANAGED_SHUTDOWN;
>> +//			flags = IRQD_AFFINITY_MANAGED |
>> +//				IRQD_MANAGED_SHUTDOWN;
>> +			flags = 0; //IRQD_AFFINITY_MANAGED |
>>  		}
>>  		mask = &affinity->mask;
>>  		node = cpu_to_node(cpumask_first(mask));
>>
> The only solution I can think of is keeping the clocks related to DBI access
> active or switch to another clock source that consumes less power if available
> during suspend.
>
> But limiting the DBI access using hacks doesn't look good.

Why not just define irq_startup and irq_shutdown callbacks for
dw_pcie_msi_irq_chip? Then, when the CPU is offlined and irq_shutdown() is
called for that irqchip in migrate_one_irq(), you would mask the irq and then
disable the clocks. Similarly, on CPU onlining, you would enable the clocks and
unmask the irq.

This way XO shutdown is still achieved, as you are turning off the clocks
before suspend and back on after resume.

Thanks,
Sai
diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
index 9979302..2b1e395 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -29,13 +29,23 @@ static void dw_msi_ack_irq(struct irq_data *d)
 
 static void dw_msi_mask_irq(struct irq_data *d)
 {
-	pci_msi_mask_irq(d);
+	struct pcie_port *pp = irq_data_get_irq_chip_data(d->parent_data);
+	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+
+	if (dw_pcie_link_up(pci))
+		pci_msi_mask_irq(d);
+
 	irq_chip_mask_parent(d);
 }
 
 static void dw_msi_unmask_irq(struct irq_data *d)
 {
-	pci_msi_unmask_irq(d);
+	struct pcie_port *pp = irq_data_get_irq_chip_data(d->parent_data);
+	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+
+	if (dw_pcie_link_up(pci))
+		pci_msi_unmask_irq(d);
+
 	irq_chip_unmask_parent(d);
 }
 
diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index 9dd50fc0..f7dd5dc 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -1433,11 +1433,41 @@ static int qcom_pcie_post_init_2_9_0(struct qcom_pcie *pcie)
 	return 0;
 }
 
+static u32 qcom_pcie_read_dbi(struct dw_pcie *pci, void __iomem *base,
+			      u32 reg, size_t size)
+{
+	struct qcom_pcie *pcie = to_qcom_pcie(pci);
+	u32 val;
+
+	if (pcie->is_suspended)
+		return PCIBIOS_BAD_REGISTER_NUMBER;
+
+	dw_pcie_read(base + reg, size, &val);
+	return val;
+}
+
+static void qcom_pcie_write_dbi(struct dw_pcie *pci, void __iomem *base,
+				u32 reg, size_t size, u32 val)
+{
+	struct qcom_pcie *pcie = to_qcom_pcie(pci);
+
+	if (pcie->is_suspended)
+		return;
+
+	dw_pcie_write(base + reg, size, val);
+}
+
 static int qcom_pcie_link_up(struct dw_pcie *pci)
 {
-	u16 offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP);
-	u16 val = readw(pci->dbi_base + offset + PCI_EXP_LNKSTA);
+	struct qcom_pcie *pcie = to_qcom_pcie(pci);
+	u16 offset;
+	u16 val;
+
+	if (pcie->is_suspended)
+		return false;
+
+	offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP);
+	val = readw(pci->dbi_base + offset + PCI_EXP_LNKSTA);
 
 	return !!(val & PCI_EXP_LNKSTA_DLLLA);
 }
 
@@ -1702,6 +1732,8 @@ static const struct qcom_pcie_cfg ipq6018_cfg = {
 static const struct dw_pcie_ops dw_pcie_ops = {
 	.link_up = qcom_pcie_link_up,
 	.start_link = qcom_pcie_start_link,
+	.read_dbi = qcom_pcie_read_dbi,
+	.write_dbi = qcom_pcie_write_dbi,
 };
 
 static int qcom_pcie_probe(struct platform_device *pdev)
If the endpoint device is in D0 and its IRQs have not been freed, the
kernel tries to mask interrupts in the system suspend path by writing
to the vector table (for MSI-X interrupts) and to config space (for MSIs).

These transactions are initiated during pm suspend, after the PCIe clocks
have been disabled as part of the platform driver's pm suspend call. As a
result, they become unclocked accesses and eventually lead to crashes.

So add logic in the qcom driver to restrict these unclocked accesses, and
update the logic to check the link state before masking or unmasking the
interrupts.

Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
---
 drivers/pci/controller/dwc/pcie-designware-host.c | 14 +++++++--
 drivers/pci/controller/dwc/pcie-qcom.c            | 36 +++++++++++++++++++++--
 2 files changed, 46 insertions(+), 4 deletions(-)