Message ID | 20200901144324.1071694-17-maz@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | arm/arm64: Turning IPIs into normal interrupts | expand |
Hi Marc, I love your patch! Yet something to improve: [auto build test ERROR on arm64/for-next/core] [also build test ERROR on arm/for-next v5.9-rc3 next-20200828] [cannot apply to tip/irq/core] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Marc-Zyngier/arm-arm64-Turning-IPIs-into-normal-interrupts/20200901-225407 base: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core config: arm-realview_defconfig (attached as .config) compiler: arm-linux-gnueabi-gcc (GCC) 9.3.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arm If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): arch/arm/kernel/smp.c: In function 'show_ipi_list': >> arch/arm/kernel/smp.c:537:27: error: implicit declaration of function 'kstat_irqs_cpu' [-Werror=implicit-function-declaration] 537 | seq_printf(p, "%10u ", kstat_irqs_cpu(irq, cpu)); | ^~~~~~~~~~~~~~ arch/arm/kernel/smp.c: At top level: arch/arm/kernel/smp.c:559:6: warning: no previous prototype for 'arch_irq_work_raise' [-Wmissing-prototypes] 559 | void arch_irq_work_raise(void) | ^~~~~~~~~~~~~~~~~~~ arch/arm/kernel/smp.c:774:6: warning: no previous prototype for 'panic_smp_self_stop' [-Wmissing-prototypes] 774 | void panic_smp_self_stop(void) | ^~~~~~~~~~~~~~~~~~~ arch/arm/kernel/smp.c:786:5: warning: no previous prototype for 'setup_profiling_timer' [-Wmissing-prototypes] 786 | int setup_profiling_timer(unsigned int multiplier) | ^~~~~~~~~~~~~~~~~~~~~ cc1: some warnings being treated as errors # https://github.com/0day-ci/linux/commit/780a93ca7a729bf2414f45d5322cc62f43ae4070 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Marc-Zyngier/arm-arm64-Turning-IPIs-into-normal-interrupts/20200901-225407 git checkout 780a93ca7a729bf2414f45d5322cc62f43ae4070 vim +/kstat_irqs_cpu +537 arch/arm/kernel/smp.c 527 528 void show_ipi_list(struct seq_file *p, int prec) 529 { 530 unsigned int cpu, i; 531 532 for (i = 0; i < NR_IPI; i++) { 533 unsigned int irq = irq_desc_get_irq(ipi_desc[i]); 534 seq_printf(p, "%*s%u: ", prec - 1, "IPI", i); 535 536 for_each_online_cpu(cpu) > 537 seq_printf(p, "%10u ", kstat_irqs_cpu(irq, cpu)); 538 539 seq_printf(p, " %s\n", ipi_types[i]); 540 } 541 } 542 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
On Wed, 02 Sep 2020 08:41:31 +0100, kernel test robot <lkp@intel.com> wrote: > > Hi Marc, > > I love your patch! Yet something to improve: > > [auto build test ERROR on arm64/for-next/core] > [also build test ERROR on arm/for-next v5.9-rc3 next-20200828] > [cannot apply to tip/irq/core] > [If your patch is applied to the wrong git tree, kindly drop us a note. > And when submitting patch, we suggest to use '--base' as documented in > https://git-scm.com/docs/git-format-patch] > > url: https://github.com/0day-ci/linux/commits/Marc-Zyngier/arm-arm64-Turning-IPIs-into-normal-interrupts/20200901-225407 > base: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core > config: arm-realview_defconfig (attached as .config) > compiler: arm-linux-gnueabi-gcc (GCC) 9.3.0 > reproduce (this is a W=1 build): > wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross > chmod +x ~/bin/make.cross > # save the attached .config to linux build tree > COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arm > > If you fix the issue, kindly add following tag as appropriate > Reported-by: kernel test robot <lkp@intel.com> > > All errors (new ones prefixed by >>): > > arch/arm/kernel/smp.c: In function 'show_ipi_list': > >> arch/arm/kernel/smp.c:537:27: error: implicit declaration of function 'kstat_irqs_cpu' [-Werror=implicit-function-declaration] > 537 | seq_printf(p, "%10u ", kstat_irqs_cpu(irq, cpu)); > | ^~~~~~~~~~~~~~ Now fixed. Thanks for the heads up. M.
Hi Marc, On 01/09/2020 15:43, Marc Zyngier wrote: > Let's switch the arm code to the core accounting, which already > does everything we need. > > Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> > Signed-off-by: Marc Zyngier <maz@kernel.org> > --- > arch/arm/include/asm/hardirq.h | 17 ----------------- > arch/arm/kernel/smp.c | 20 ++++---------------- > 2 files changed, 4 insertions(+), 33 deletions(-) This appears to be causing a NULL pointer dereference on beaglebone-black, it got bisected automatically several times. None of the other platforms in the KernelCI labs appears to be affected. Here's the error in the full job log, with next-20200923: https://storage.staging.kernelci.org/kernelci/staging.kernelci.org/staging-20200924.0/arm/multi_v7_defconfig/gcc-8/lab-baylibre/baseline-beaglebone-black.html#L460 and some meta-data: https://staging.kernelci.org/test/case/id/5f6bea67f724eb1b34dce584/ The full bisection report is available here: https://groups.io/g/kernelci-results-staging/message/2094 I've also run it again with a debug build to locate the problem, see below. > diff --git a/arch/arm/include/asm/hardirq.h b/arch/arm/include/asm/hardirq.h > index 7a88f160b1fb..b95848ed2bc7 100644 > --- a/arch/arm/include/asm/hardirq.h > +++ b/arch/arm/include/asm/hardirq.h > @@ -6,29 +6,12 @@ > #include <linux/threads.h> > #include <asm/irq.h> > > -/* number of IPIS _not_ including IPI_CPU_BACKTRACE */ > -#define NR_IPI 7 > - > typedef struct { > unsigned int __softirq_pending; > -#ifdef CONFIG_SMP > - unsigned int ipi_irqs[NR_IPI]; > -#endif > } ____cacheline_aligned irq_cpustat_t; > > #include <linux/irq_cpustat.h> /* Standard mappings for irq_cpustat_t above */ > > -#define __inc_irq_stat(cpu, member) __IRQ_STAT(cpu, member)++ > -#define __get_irq_stat(cpu, member) __IRQ_STAT(cpu, member) > - > -#ifdef CONFIG_SMP > -u64 smp_irq_stat_cpu(unsigned int cpu); > -#else > -#define smp_irq_stat_cpu(cpu) 0 > -#endif > - > -#define arch_irq_stat_cpu smp_irq_stat_cpu > - > #define __ARCH_IRQ_EXIT_IRQS_DISABLED 1 > > #endif /* __ASM_HARDIRQ_H */ > diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c > index d51e64955a26..aead847ac8b9 100644 > --- a/arch/arm/kernel/smp.c > +++ b/arch/arm/kernel/smp.c > @@ -65,6 +65,7 @@ enum ipi_msg_type { > IPI_CPU_STOP, > IPI_IRQ_WORK, > IPI_COMPLETION, > + NR_IPI, > /* > * CPU_BACKTRACE is special and not included in NR_IPI > * or tracable with trace_ipi_* > @@ -529,27 +530,16 @@ void show_ipi_list(struct seq_file *p, int prec) > unsigned int cpu, i; > > for (i = 0; i < NR_IPI; i++) { > + unsigned int irq = irq_desc_get_irq(ipi_desc[i]); It looks like irq_desc_get_irq() gets called with a NULL pointer (well, 0x0000001c): (gdb) l *0xc030ef38 0xc030ef38 is in show_ipi_list (../include/linux/irqdesc.h:123). 118 return container_of(data->common, struct irq_desc, irq_common_data); 119 } 120 121 static inline unsigned int irq_desc_get_irq(struct irq_desc *desc) 122 { 123 return desc->irq_data.irq; 124 } 125 126 static inline struct irq_data *irq_desc_get_irq_data(struct irq_desc *desc) 127 { Full job log: https://lava.baylibre.com/scheduler/job/142375#L727 I haven't looked any further but hopefully this should be a good enough clue to find the root cause. I don't know if you have a platform at hand to reproduce the issue, please let me know if you need some help with debugging or testing a fix. Hope this helps, Guillaume > seq_printf(p, "%*s%u: ", prec - 1, "IPI", i); > > for_each_online_cpu(cpu) > - seq_printf(p, "%10u ", > - __get_irq_stat(cpu, ipi_irqs[i])); > + seq_printf(p, "%10u ", kstat_irqs_cpu(irq, cpu)); > > seq_printf(p, " %s\n", ipi_types[i]); > } > } > > -u64 smp_irq_stat_cpu(unsigned int cpu) > -{ > - u64 sum = 0; > - int i; > - > - for (i = 0; i < NR_IPI; i++) > - sum += __get_irq_stat(cpu, ipi_irqs[i]); > - > - return sum; > -} > - > void arch_send_call_function_ipi_mask(const struct cpumask *mask) > { > smp_cross_call(mask, IPI_CALL_FUNC); > @@ -630,10 +620,8 @@ static void do_handle_IPI(int ipinr) > { > unsigned int cpu = smp_processor_id(); > > - if ((unsigned)ipinr < NR_IPI) { > + if ((unsigned)ipinr < NR_IPI) > trace_ipi_entry_rcuidle(ipi_types[ipinr]); > - __inc_irq_stat(cpu, ipi_irqs[ipinr]); > - } > > switch (ipinr) { > case IPI_WAKEUP: >
Hi Guillaume, On Thu, 24 Sep 2020 10:00:09 +0100, Guillaume Tucker <guillaume.tucker@collabora.com> wrote: > > Hi Marc, > > On 01/09/2020 15:43, Marc Zyngier wrote: > > Let's switch the arm code to the core accounting, which already > > does everything we need. > > > > Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> > > Signed-off-by: Marc Zyngier <maz@kernel.org> > > --- > > arch/arm/include/asm/hardirq.h | 17 ----------------- > > arch/arm/kernel/smp.c | 20 ++++---------------- > > 2 files changed, 4 insertions(+), 33 deletions(-) > > This appears to be causing a NULL pointer dereference on > beaglebone-black, it got bisected automatically several times. > None of the other platforms in the KernelCI labs appears to be > affected. Hmm. My bet is that because this is a UP machine running an SMP kernel, and I fell into the trap of forgetting about this 32bit configuration. I expect the following patch to fix it. Please give it a go if you can (I'm away at the moment and can't test much, and do not have any physical 32bit machine to test this on). diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c index 00327fa74b01..b4e3d336dc33 100644 --- a/arch/arm/kernel/smp.c +++ b/arch/arm/kernel/smp.c @@ -531,7 +531,12 @@ void show_ipi_list(struct seq_file *p, int prec) unsigned int cpu, i; for (i = 0; i < NR_IPI; i++) { - unsigned int irq = irq_desc_get_irq(ipi_desc[i]); + unsigned int irq; + + if (!ipi_desc[i]) + continue; + + irq = irq_desc_get_irq(ipi_desc[i]); seq_printf(p, "%*s%u: ", prec - 1, "IPI", i); for_each_online_cpu(cpu) Thanks, M.
On 24/09/2020 10:29, Marc Zyngier wrote: > Hi Guillaume, > > On Thu, 24 Sep 2020 10:00:09 +0100, > Guillaume Tucker <guillaume.tucker@collabora.com> wrote: >> >> Hi Marc, >> >> On 01/09/2020 15:43, Marc Zyngier wrote: >>> Let's switch the arm code to the core accounting, which already >>> does everything we need. >>> >>> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> >>> Signed-off-by: Marc Zyngier <maz@kernel.org> >>> --- >>> arch/arm/include/asm/hardirq.h | 17 ----------------- >>> arch/arm/kernel/smp.c | 20 ++++---------------- >>> 2 files changed, 4 insertions(+), 33 deletions(-) >> >> This appears to be causing a NULL pointer dereference on >> beaglebone-black, it got bisected automatically several times. >> None of the other platforms in the KernelCI labs appears to be >> affected. > > Hmm. My bet is that because this is a UP machine running an SMP > kernel, and I fell into the trap of forgetting about this 32bit > configuration. > > I expect the following patch to fix it. Please give it a go if you can > (I'm away at the moment and can't test much, and do not have any > physical 32bit machine to test this on). OK thanks, that worked: https://lava.baylibre.com/scheduler/job/143170 I've added this fix to the kernel branch used on staging.kernelci.org which is based on linux-next, so it will get fully verified a bit later today. Guillaume > diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c > index 00327fa74b01..b4e3d336dc33 100644 > --- a/arch/arm/kernel/smp.c > +++ b/arch/arm/kernel/smp.c > @@ -531,7 +531,12 @@ void show_ipi_list(struct seq_file *p, int prec) > unsigned int cpu, i; > > for (i = 0; i < NR_IPI; i++) { > - unsigned int irq = irq_desc_get_irq(ipi_desc[i]); > + unsigned int irq; > + > + if (!ipi_desc[i]) > + continue; > + > + irq = irq_desc_get_irq(ipi_desc[i]); > seq_printf(p, "%*s%u: ", prec - 1, "IPI", i); > > for_each_online_cpu(cpu) > > Thanks, > > M. >
Hi Guillaume, On Thu, Sep 24, 2020 at 6:01 AM Guillaume Tucker <guillaume.tucker@collabora.com> wrote: > This appears to be causing a NULL pointer dereference on > beaglebone-black, it got bisected automatically several times. > None of the other platforms in the KernelCI labs appears to be > affected. Actually imx53-qsb is also affected: https://storage.kernelci.org/next/master/next-20200924/arm/imx_v6_v7_defconfig/gcc-8/lab-pengutronix/baseline-imx53-qsrb.html kernelci marks it Boot result: PASS though. Shouldn't kernelci flag a warning or error instead? Thanks
On 24/09/2020 14:34, Fabio Estevam wrote: > Hi Guillaume, > > On Thu, Sep 24, 2020 at 6:01 AM Guillaume Tucker > <guillaume.tucker@collabora.com> wrote: > >> This appears to be causing a NULL pointer dereference on >> beaglebone-black, it got bisected automatically several times. >> None of the other platforms in the KernelCI labs appears to be >> affected. > > Actually imx53-qsb is also affected: > https://storage.kernelci.org/next/master/next-20200924/arm/imx_v6_v7_defconfig/gcc-8/lab-pengutronix/baseline-imx53-qsrb.html > > kernelci marks it Boot result: PASS though. > > Shouldn't kernelci flag a warning or error instead? Thanks for bringing this up. The status in the HTML log file is a very coarse one, in this case the board booted "fine" since it reached a login prompt. The issue was detected later when checking for errors in the kernel log. But yes you're right, the issue is also impacting imx53-qsrb indeed. I didn't spot that because it was only reported as a regression on staging.kernelci.org, whereas imx53-qsrb is in the Pengutronix lab which is not sending results there at the moment. The failures can be found on the production web dashboard though, but not as regressions: beaglebone-black: https://kernelci.org/test/case/id/5f6c7f1ab7c8c5472cbf9de9/ imx53-qsrb: https://kernelci.org/test/case/id/5f6c7ea6f89a9d0f4dbf9ddf/ I need to investigate why that is the case, knowing that the regression was detected correctly on staging which is the development KernelCI instance: https://staging.kernelci.org/test/plan/id/5f6bea67f724eb1b34dce581/ Thanks, Guillaume
Hi Marc, On 24/09/2020 14:09, Guillaume Tucker wrote: > On 24/09/2020 10:29, Marc Zyngier wrote: >> Hi Guillaume, >> >> On Thu, 24 Sep 2020 10:00:09 +0100, >> Guillaume Tucker <guillaume.tucker@collabora.com> wrote: >>> >>> Hi Marc, >>> >>> On 01/09/2020 15:43, Marc Zyngier wrote: >>>> Let's switch the arm code to the core accounting, which already >>>> does everything we need. >>>> >>>> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> >>>> Signed-off-by: Marc Zyngier <maz@kernel.org> >>>> --- >>>> arch/arm/include/asm/hardirq.h | 17 ----------------- >>>> arch/arm/kernel/smp.c | 20 ++++---------------- >>>> 2 files changed, 4 insertions(+), 33 deletions(-) >>> >>> This appears to be causing a NULL pointer dereference on >>> beaglebone-black, it got bisected automatically several times. >>> None of the other platforms in the KernelCI labs appears to be >>> affected. >> >> Hmm. My bet is that because this is a UP machine running an SMP >> kernel, and I fell into the trap of forgetting about this 32bit >> configuration. >> >> I expect the following patch to fix it. Please give it a go if you can >> (I'm away at the moment and can't test much, and do not have any >> physical 32bit machine to test this on). > > OK thanks, that worked: > > https://lava.baylibre.com/scheduler/job/143170 > > I've added this fix to the kernel branch used on > staging.kernelci.org which is based on linux-next, so it will get > fully verified a bit later today. > > Guillaume > > >> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c >> index 00327fa74b01..b4e3d336dc33 100644 >> --- a/arch/arm/kernel/smp.c >> +++ b/arch/arm/kernel/smp.c >> @@ -531,7 +531,12 @@ void show_ipi_list(struct seq_file *p, int prec) >> unsigned int cpu, i; >> >> for (i = 0; i < NR_IPI; i++) { >> - unsigned int irq = irq_desc_get_irq(ipi_desc[i]); >> + unsigned int irq; >> + >> + if (!ipi_desc[i]) >> + continue; >> + >> + irq = irq_desc_get_irq(ipi_desc[i]); >> seq_printf(p, "%*s%u: ", prec - 1, "IPI", i); >> >> for_each_online_cpu(cpu) This fix has been all tested now, with no visible side effects: https://staging.kernelci.org/test/job/kernelci/branch/staging.kernelci.org/kernel/staging-20200928.1/plan/baseline/ In the meantime, the same issue was detected (without the fix) and bisected on sun5i-a13-olinuxino-micro and landed on the same commit. A few more platforms are also impacted such as imx53-qsb as mentioned by Fabio. The commit is in your irqchip tree so I guess we should wait for you to apply the fix. If you do make a separate commit to fix the issue, please add: Reported-by: kernelci.org bot <bot@kernelci.org> and also: Tested-by: Guillaume Tucker <guillaume.tucker@collabora.com> Thanks, Guillaume
diff --git a/arch/arm/include/asm/hardirq.h b/arch/arm/include/asm/hardirq.h index 7a88f160b1fb..b95848ed2bc7 100644 --- a/arch/arm/include/asm/hardirq.h +++ b/arch/arm/include/asm/hardirq.h @@ -6,29 +6,12 @@ #include <linux/threads.h> #include <asm/irq.h> -/* number of IPIS _not_ including IPI_CPU_BACKTRACE */ -#define NR_IPI 7 - typedef struct { unsigned int __softirq_pending; -#ifdef CONFIG_SMP - unsigned int ipi_irqs[NR_IPI]; -#endif } ____cacheline_aligned irq_cpustat_t; #include <linux/irq_cpustat.h> /* Standard mappings for irq_cpustat_t above */ -#define __inc_irq_stat(cpu, member) __IRQ_STAT(cpu, member)++ -#define __get_irq_stat(cpu, member) __IRQ_STAT(cpu, member) - -#ifdef CONFIG_SMP -u64 smp_irq_stat_cpu(unsigned int cpu); -#else -#define smp_irq_stat_cpu(cpu) 0 -#endif - -#define arch_irq_stat_cpu smp_irq_stat_cpu - #define __ARCH_IRQ_EXIT_IRQS_DISABLED 1 #endif /* __ASM_HARDIRQ_H */ diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c index d51e64955a26..aead847ac8b9 100644 --- a/arch/arm/kernel/smp.c +++ b/arch/arm/kernel/smp.c @@ -65,6 +65,7 @@ enum ipi_msg_type { IPI_CPU_STOP, IPI_IRQ_WORK, IPI_COMPLETION, + NR_IPI, /* * CPU_BACKTRACE is special and not included in NR_IPI * or tracable with trace_ipi_* @@ -529,27 +530,16 @@ void show_ipi_list(struct seq_file *p, int prec) unsigned int cpu, i; for (i = 0; i < NR_IPI; i++) { + unsigned int irq = irq_desc_get_irq(ipi_desc[i]); seq_printf(p, "%*s%u: ", prec - 1, "IPI", i); for_each_online_cpu(cpu) - seq_printf(p, "%10u ", - __get_irq_stat(cpu, ipi_irqs[i])); + seq_printf(p, "%10u ", kstat_irqs_cpu(irq, cpu)); seq_printf(p, " %s\n", ipi_types[i]); } } -u64 smp_irq_stat_cpu(unsigned int cpu) -{ - u64 sum = 0; - int i; - - for (i = 0; i < NR_IPI; i++) - sum += __get_irq_stat(cpu, ipi_irqs[i]); - - return sum; -} - void arch_send_call_function_ipi_mask(const struct cpumask *mask) { smp_cross_call(mask, IPI_CALL_FUNC); @@ -630,10 +620,8 @@ static void do_handle_IPI(int ipinr) { unsigned int cpu = smp_processor_id(); - if ((unsigned)ipinr < NR_IPI) { + if ((unsigned)ipinr < NR_IPI) trace_ipi_entry_rcuidle(ipi_types[ipinr]); - __inc_irq_stat(cpu, ipi_irqs[ipinr]); - } switch (ipinr) { case IPI_WAKEUP: