Message ID | 2148724.oQlEHos3jS@vostro.rjw.lan (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Rafael, On Fri, May 20, 2016 at 9:56 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > If I'm not mistaken, the patch below should allow irq_work to make forward > progress for you, so please check if it makes any difference. .... > Index: linux-pm/drivers/cpufreq/cpufreq_governor.c > =================================================================== > --- linux-pm.orig/drivers/cpufreq/cpufreq_governor.c > +++ linux-pm/drivers/cpufreq/cpufreq_governor.c > @@ -362,6 +362,7 @@ static struct policy_dbs_info *alloc_pol > mutex_init(&policy_dbs->timer_mutex); > atomic_set(&policy_dbs->work_count, 0); > init_irq_work(&policy_dbs->irq_work, dbs_irq_work); > + policy_dbs->irq_work.flags = IRQ_WORK_LAZY; > INIT_WORK(&policy_dbs->work, dbs_work_handler); > > /* Set policy_dbs for all CPUs, online+offline */ Running a 4.6 kernel with this patch applied allows me to reboot the system, thanks. 'echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor' works fine now.
On Fri, May 20, 2016 at 10:19 PM, Fabio Estevam <festevam@gmail.com> wrote: > Running a 4.6 kernel with this patch applied allows me to reboot the > system, thanks. > > 'echo performance > > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor' works fine now. and as a bonus point suspend/resume works fine now too :-)
On Sat, May 21, 2016 at 3:22 AM, Fabio Estevam <festevam@gmail.com> wrote: > On Fri, May 20, 2016 at 10:19 PM, Fabio Estevam <festevam@gmail.com> wrote: > >> Running a 4.6 kernel with this patch applied allows me to reboot the >> system, thanks. >> >> 'echo performance > >> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor' works fine now. > > and as a bonus point suspend/resume works fine now too :-) OK, thanks for the confirmation! The patch is just a workaround for a problem in the arch code showing up on your platform, though.
On Fri, May 20, 2016 at 10:31 PM, Rafael J. Wysocki <rafael@kernel.org> wrote: > The patch is just a workaround for a problem in the arch code showing > up on your platform, though. Not sure where the problem in the arch code is located though. Ping, Shawn, Sascha, any ideas? Thanks
On Sat, May 21, 2016 at 02:56:20AM +0200, Rafael J. Wysocki wrote: > The root of the problem seems to be arch_irq_work_raise() and specifically > the __smp_cross_call function that appears to have problems. We've been through this before. The bottom line is that on ARM, there is major wide-spread understanding that we want to be able to run a kernel compiled for SMP on uniprocessor hardware. This is something that's been going for years, and has worked fine for years. Now, someone has introduced this irq work stuff. Great. But they haven't considered that people want to be able to run a SMP kernel on UP hardware which _may_ have no hardware present which is capable of raising IPIs. I don't know what the situation is with the platform concerned here, I don't know whether it uses the GIC (presumably, because it doesn't NULL-pointer ref on calling smp_cross_call(), it is using the GIC.) If so, the GIC isn't delivering the IPI because the IPI hardware is missing. Now, whether we could detect whether the GIC is IPI capable, I've no idea, I don't have such a platform I could run tests on. People don't give me hardware anymore now that arm-soc is split off from core ARM maintanence - it all goes into Linaro build farms. So I'm powerless to help here. My attitude towards this is that it's a core kernel problem: the core kernel is assuming that it can raise IPIs even on non-SMP capable hardware. Much non-SMP hardware doesn't have that ability. While folk can try to push it into arch code, my feeling is that it really needs to have a generic solution, even if it's some generic solution that architectures in this situation can plug into.
On Mon, May 23, 2016 at 8:28 PM, Russell King - ARM Linux <linux@armlinux.org.uk> wrote: > On Sat, May 21, 2016 at 02:56:20AM +0200, Rafael J. Wysocki wrote: >> The root of the problem seems to be arch_irq_work_raise() and specifically >> the __smp_cross_call function that appears to have problems. > > We've been through this before. The bottom line is that on ARM, there > is major wide-spread understanding that we want to be able to run a > kernel compiled for SMP on uniprocessor hardware. > > This is something that's been going for years, and has worked fine for > years. > > Now, someone has introduced this irq work stuff. Great. But they > haven't considered that people want to be able to run a SMP kernel > on UP hardware which _may_ have no hardware present which is capable > of raising IPIs. > > I don't know what the situation is with the platform concerned here, > I don't know whether it uses the GIC (presumably, because it doesn't > NULL-pointer ref on calling smp_cross_call(), it is using the GIC.) > If so, the GIC isn't delivering the IPI because the IPI hardware is > missing. > > Now, whether we could detect whether the GIC is IPI capable, I've > no idea, I don't have such a platform I could run tests on. People > don't give me hardware anymore now that arm-soc is split off from > core ARM maintanence - it all goes into Linaro build farms. So I'm > powerless to help here. > > My attitude towards this is that it's a core kernel problem: the > core kernel is assuming that it can raise IPIs even on non-SMP > capable hardware. Much non-SMP hardware doesn't have that ability. > While folk can try to push it into arch code, my feeling is that it > really needs to have a generic solution, even if it's some generic > solution that architectures in this situation can plug into. Well, this particular case isn't about IPIs at all. irq_work_queue() can work without IPIs and it works like that on other ARM platforms where SMP kernels are run on UP hardware (without IPI support). Something seems to be missing in kernel config or similar here, but I can't really say what it is right away.
On Mon, May 23, 2016 at 10:16 PM, Rafael J. Wysocki <rafael@kernel.org> wrote: > On Mon, May 23, 2016 at 8:28 PM, Russell King - ARM Linux > <linux@armlinux.org.uk> wrote: >> On Sat, May 21, 2016 at 02:56:20AM +0200, Rafael J. Wysocki wrote: >>> The root of the problem seems to be arch_irq_work_raise() and specifically >>> the __smp_cross_call function that appears to have problems. >> >> We've been through this before. The bottom line is that on ARM, there >> is major wide-spread understanding that we want to be able to run a >> kernel compiled for SMP on uniprocessor hardware. >> >> This is something that's been going for years, and has worked fine for >> years. >> >> Now, someone has introduced this irq work stuff. Great. But they >> haven't considered that people want to be able to run a SMP kernel >> on UP hardware which _may_ have no hardware present which is capable >> of raising IPIs. >> >> I don't know what the situation is with the platform concerned here, >> I don't know whether it uses the GIC (presumably, because it doesn't >> NULL-pointer ref on calling smp_cross_call(), it is using the GIC.) >> If so, the GIC isn't delivering the IPI because the IPI hardware is >> missing. >> >> Now, whether we could detect whether the GIC is IPI capable, I've >> no idea, I don't have such a platform I could run tests on. People >> don't give me hardware anymore now that arm-soc is split off from >> core ARM maintanence - it all goes into Linaro build farms. So I'm >> powerless to help here. >> >> My attitude towards this is that it's a core kernel problem: the >> core kernel is assuming that it can raise IPIs even on non-SMP >> capable hardware. Much non-SMP hardware doesn't have that ability. >> While folk can try to push it into arch code, my feeling is that it >> really needs to have a generic solution, even if it's some generic >> solution that architectures in this situation can plug into. > > Well, this particular case isn't about IPIs at all. > > irq_work_queue() can work without IPIs and it works like that on other > ARM platforms where SMP kernels are run on UP hardware (without IPI > support). > > Something seems to be missing in kernel config or similar here, but I > can't really say what it is right away. For one, if that platform is not capable of raising interrupts for IRQ works, I'm not sure why arch_irq_work_has_interrupt() returns true on it.
On Mon, May 23, 2016 at 5:23 PM, Rafael J. Wysocki <rafael@kernel.org> wrote: > For one, if that platform is not capable of raising interrupts for IRQ > works, I'm not sure why arch_irq_work_has_interrupt() returns true on > it. It returns true because the kernel was built with CONFIG_SMP=y. On ARM we have: static inline bool arch_irq_work_has_interrupt(void) { return is_smp(); } and then: static inline bool is_smp(void) { #ifndef CONFIG_SMP return false; #elif defined(CONFIG_SMP_ON_UP) extern unsigned int smp_on_up; return !!smp_on_up; #else return true; #endif } So if CONFIG_SMP=y, is_smp() returns 1. As I mentioned before, the reboot problem does not happen with CONFIG_SMP=n.
On Mon, May 23, 2016 at 10:59 PM, Fabio Estevam <festevam@gmail.com> wrote: > On Mon, May 23, 2016 at 5:23 PM, Rafael J. Wysocki <rafael@kernel.org> wrote: > >> For one, if that platform is not capable of raising interrupts for IRQ >> works, I'm not sure why arch_irq_work_has_interrupt() returns true on >> it. > > It returns true because the kernel was built with CONFIG_SMP=y. > > On ARM we have: > > static inline bool arch_irq_work_has_interrupt(void) > { > return is_smp(); > } > > and then: > > static inline bool is_smp(void) > { > #ifndef CONFIG_SMP > return false; > #elif defined(CONFIG_SMP_ON_UP) > extern unsigned int smp_on_up; > return !!smp_on_up; > #else > return true; > #endif > } > > So if CONFIG_SMP=y, is_smp() returns 1. > > As I mentioned before, the reboot problem does not happen with CONFIG_SMP=n. First, why don't the other ARM UP platforms have this problem when SMP kernels are run on them? Second, quite evidently, the platform says "I can raise interrupts for IRQ works", but then it doesn't do that. That doesn't seem particularly consistent to me ...
On Mon, May 23, 2016 at 05:59:37PM -0300, Fabio Estevam wrote: > On Mon, May 23, 2016 at 5:23 PM, Rafael J. Wysocki <rafael@kernel.org> wrote: > > For one, if that platform is not capable of raising interrupts for IRQ > > works, I'm not sure why arch_irq_work_has_interrupt() returns true on > > it. > > It returns true because the kernel was built with CONFIG_SMP=y. > > On ARM we have: > > static inline bool arch_irq_work_has_interrupt(void) > { > return is_smp(); > } > > and then: > > static inline bool is_smp(void) > { > #ifndef CONFIG_SMP > return false; > #elif defined(CONFIG_SMP_ON_UP) > extern unsigned int smp_on_up; > return !!smp_on_up; > #else > return true; > #endif > } > > So if CONFIG_SMP=y, is_smp() returns 1. > > As I mentioned before, the reboot problem does not happen with CONFIG_SMP=n. Read the code again. There's three states: 1. CONFIG_SMP=n - it always returns false. 2. CONFIG_SMP=y, CONFIG_SMP_ON_UP=n - it always returns true. Such a kernel can _only_ be run on SMP capable systems. 3. CONFIG_SMP=y, CONFIG_SMP_ON_UP=y - it returns depending on whether we detect that the CPU is SMP capable.
On Mon, May 23, 2016 at 11:03:01PM +0200, Rafael J. Wysocki wrote: > On Mon, May 23, 2016 at 10:59 PM, Fabio Estevam <festevam@gmail.com> wrote: > > On ARM we have: > > > > static inline bool arch_irq_work_has_interrupt(void) > > { > > return is_smp(); > > } > > > > and then: > > > > static inline bool is_smp(void) > > { > > #ifndef CONFIG_SMP > > return false; > > #elif defined(CONFIG_SMP_ON_UP) > > extern unsigned int smp_on_up; > > return !!smp_on_up; > > #else > > return true; > > #endif > > } > > > > So if CONFIG_SMP=y, is_smp() returns 1. > > > > As I mentioned before, the reboot problem does not happen with CONFIG_SMP=n. > > First, why don't the other ARM UP platforms have this problem when SMP > kernels are run on them? > > Second, quite evidently, the platform says "I can raise interrupts for > IRQ works", but then it doesn't do that. That doesn't seem > particularly consistent to me ... Maybe someone has implemented a SoC which has a CPU capable of SMP but the GIC isn't... I'm just guessing again.
On Mon, May 23, 2016 at 6:07 PM, Russell King - ARM Linux <linux@armlinux.org.uk> wrote: > Read the code again. There's three states: > > 1. CONFIG_SMP=n - it always returns false. > 2. CONFIG_SMP=y, CONFIG_SMP_ON_UP=n - it always returns true. Such a > kernel can _only_ be run on SMP capable systems. > 3. CONFIG_SMP=y, CONFIG_SMP_ON_UP=y - it returns depending on whether > we detect that the CPU is SMP capable. Yes, I meant that on imx6ul we have option 2 above with imx_v6_v7_defconfig. Also tested option 3 and is_smp() returns true as well. I will try to see why GIC on this SoC cannot work in SMP.
On Mon, May 23, 2016 at 06:38:31PM -0300, Fabio Estevam wrote: > On Mon, May 23, 2016 at 6:07 PM, Russell King - ARM Linux > <linux@armlinux.org.uk> wrote: > > > Read the code again. There's three states: > > > > 1. CONFIG_SMP=n - it always returns false. > > 2. CONFIG_SMP=y, CONFIG_SMP_ON_UP=n - it always returns true. Such a > > kernel can _only_ be run on SMP capable systems. > > 3. CONFIG_SMP=y, CONFIG_SMP_ON_UP=y - it returns depending on whether > > we detect that the CPU is SMP capable. > > Yes, I meant that on imx6ul we have option 2 above with imx_v6_v7_defconfig. I don't think so, unless you specifically turn off CONFIG_SMP_ON_UP. It defaults to 'y', so imx_v6_v7_defconfig should result in this being 'y'. > Also tested option 3 and is_smp() returns true as well. So that means the MPIDR register is telling the kernel... * The boot CPU has multiprocessor extensions. * The boot CPU is not part of a uniprocessor system. > I will try to see why GIC on this SoC cannot work in SMP.
On Mon, May 23, 2016 at 7:05 PM, Russell King - ARM Linux <linux@armlinux.org.uk> wrote: > I don't think so, unless you specifically turn off CONFIG_SMP_ON_UP. > It defaults to 'y', so imx_v6_v7_defconfig should result in this being > 'y'. Sorry, you are right: option 3 (CONFIG_SMP=y and CONFIG_SMP_ON_UP=y) is the default one with imx_v6_v7_defconfig.
Index: linux-pm/drivers/cpufreq/cpufreq_governor.c =================================================================== --- linux-pm.orig/drivers/cpufreq/cpufreq_governor.c +++ linux-pm/drivers/cpufreq/cpufreq_governor.c @@ -362,6 +362,7 @@ static struct policy_dbs_info *alloc_pol mutex_init(&policy_dbs->timer_mutex); atomic_set(&policy_dbs->work_count, 0); init_irq_work(&policy_dbs->irq_work, dbs_irq_work); + policy_dbs->irq_work.flags = IRQ_WORK_LAZY; INIT_WORK(&policy_dbs->work, dbs_work_handler); /* Set policy_dbs for all CPUs, online+offline */