Message ID | 1374817287-27952-1-git-send-email-vijay.kilari@gmail.com (mailing list archive) |
---|---|
State | New, archived |
[Adding Stephen Warren since he has been working in this area]

On Fri, Jul 26, 2013 at 06:41:27AM +0100, vijay.kilari@gmail.com wrote:
> From: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
>
> In case of normal kexec kernel load, all cpu's are offlined
> before calling machine_kexec() under kernel_kexec() function.
> But in case crash panic cpus are relaxed in
> machine_crash_nonpanic_core() SMP function but not offlined.
>
> When crash kernel is loaded with kexec and on panic trigger
> machine_kexec() checks for number of cpus online.
> If more than one cpu is online machine_kexec() fails to load
> with below error
>
> kexec: error: multiple CPUs still online
>
> In machine_crash_nonpanic_core() SMP function, offline CPU
> before cpu_relax
>
> Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
> ---
>  arch/arm/kernel/machine_kexec.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
> index 4fb074c..163b160 100644
> --- a/arch/arm/kernel/machine_kexec.c
> +++ b/arch/arm/kernel/machine_kexec.c
> @@ -73,6 +73,7 @@ void machine_crash_nonpanic_core(void *unused)
>  	crash_save_cpu(&regs, smp_processor_id());
>  	flush_cache_all();
>
> +	set_cpu_online(smp_processor_id(), false);
>  	atomic_dec(&waiting_for_crash_ipi);
>  	while (1)
>  		cpu_relax();

Ok, I guess this will work since the new kernel is loaded somewhere higher
in memory and the crashed kernel will stick around, so the non-crashing CPUs
can sit around spinning.

Will

On 07/25/2013 11:41 PM, vijay.kilari@gmail.com wrote:
> From: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
>
> In case of normal kexec kernel load, all cpu's are offlined
> before calling machine_kexec() under kernel_kexec() function.

I'm not sure that's true, unless perhaps you have CONFIG_KEXEC_JUMP enabled?

> But in case crash panic cpus are relaxed in
> machine_crash_nonpanic_core() SMP function but not offlined.
>
> When crash kernel is loaded with kexec and on panic trigger
> machine_kexec() checks for number of cpus online.
> If more than one cpu is online machine_kexec() fails to load
> with below error
>
> kexec: error: multiple CPUs still online
>
> In machine_crash_nonpanic_core() SMP function, offline CPU
> before cpu_relax

> diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c

> @@ -73,6 +73,7 @@ void machine_crash_nonpanic_core(void *unused)
>  	crash_save_cpu(&regs, smp_processor_id());
>  	flush_cache_all();
>
> +	set_cpu_online(smp_processor_id(), false);

I'm not familiar with that API, but it looks like it's just setting the
*current* CPU offline. That sounds problematic for two reasons:

1) Setting the current CPU offline sounds like a bad idea; after all,
code is still running on it. Presumably you want to offline all other CPUs.

2) On a dual-CPU system, I guess this will leave a single CPU marked
online, and hence satisfy the test in machine_kexec(). However, on a
quad-core system, won't this just reduce the online CPU count from 4 to
3 and hence the test in machine_kexec() will still fail?

Can't you call disable_nonboot_cpus() from machine_crash_nonpanic_core()
just like machine_shutdown() does?

On 07/26/2013 04:49 AM, Will Deacon wrote:
> [Adding Stephen Warren since he has been working in this area]
>
> On Fri, Jul 26, 2013 at 06:41:27AM +0100, vijay.kilari@gmail.com wrote:
>> From: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
>>
>> In case of normal kexec kernel load, all cpu's are offlined
>> before calling machine_kexec() under kernel_kexec() function.
>> But in case crash panic cpus are relaxed in
>> machine_crash_nonpanic_core() SMP function but not offlined.
>>
>> When crash kernel is loaded with kexec and on panic trigger
>> machine_kexec() checks for number of cpus online.
>> If more than one cpu is online machine_kexec() fails to load
>> with below error
>>
>> kexec: error: multiple CPUs still online
>>
>> In machine_crash_nonpanic_core() SMP function, offline CPU
>> before cpu_relax

>> diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c

>> @@ -73,6 +73,7 @@ void machine_crash_nonpanic_core(void *unused)
>>  	crash_save_cpu(&regs, smp_processor_id());
>>  	flush_cache_all();
>>
>> +	set_cpu_online(smp_processor_id(), false);
>>  	atomic_dec(&waiting_for_crash_ipi);
>>  	while (1)
>>  		cpu_relax();
>
> Ok, I guess this will work since the new kernel is loaded somewhere higher
> in memory and the crashed kernel will stick around, so the non-crashing CPUs
> can sit around spinning.

Does a kernel that's used as the crash kernel guarantee:

* Never to re-use the memory that was used by the previous kernel, so
that the spin loop code/data won't be corrupted, ever, no matter how
long the crash recovery kernel runs.

* Not use SMP, so there's never a need to re-activate the non-boot CPUs,
which might not work if they aren't truly disabled but rather just
running a spin loop?

On Fri, Jul 26, 2013 at 06:08:07PM +0100, Stephen Warren wrote:
> On 07/26/2013 04:49 AM, Will Deacon wrote:
> > [Adding Stephen Warren since he has been working in this area]
> >
> > On Fri, Jul 26, 2013 at 06:41:27AM +0100, vijay.kilari@gmail.com wrote:
> >> From: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
> >>
> >> In case of normal kexec kernel load, all cpu's are offlined
> >> before calling machine_kexec() under kernel_kexec() function.
> >> But in case crash panic cpus are relaxed in
> >> machine_crash_nonpanic_core() SMP function but not offlined.
> >>
> >> When crash kernel is loaded with kexec and on panic trigger
> >> machine_kexec() checks for number of cpus online.
> >> If more than one cpu is online machine_kexec() fails to load
> >> with below error
> >>
> >> kexec: error: multiple CPUs still online
> >>
> >> In machine_crash_nonpanic_core() SMP function, offline CPU
> >> before cpu_relax
> >
> >> diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
> >
> >> @@ -73,6 +73,7 @@ void machine_crash_nonpanic_core(void *unused)
> >>  	crash_save_cpu(&regs, smp_processor_id());
> >>  	flush_cache_all();
> >>
> >> +	set_cpu_online(smp_processor_id(), false);
> >>  	atomic_dec(&waiting_for_crash_ipi);
> >>  	while (1)
> >>  		cpu_relax();
> >
> > Ok, I guess this will work since the new kernel is loaded somewhere higher
> > in memory and the crashed kernel will stick around, so the non-crashing CPUs
> > can sit around spinning.
>
> Does a kernel that's used as the crash kernel guarantee:
>
> * Never to re-use the memory that was used by the previous kernel, so
> that the spin loop code/data won't be corrupted, ever, no matter how
> long the crash recovery kernel runs.
>
> * Not use SMP, so there's never a need to re-activate the non-boot CPUs,
> which might not work if they aren't truly disabled but rather just
> running a spin loop?

I *think* this is true, and x86 seems to have code to a similar effect (the
powerpc stuff lost me though).

I've never played with crash kernels on SMP though...

Will

On Fri, Jul 26, 2013 at 10:35 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
> On 07/25/2013 11:41 PM, vijay.kilari@gmail.com wrote:
>> From: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
>>
>> In case of normal kexec kernel load, all cpu's are offlined
>> before calling machine_kexec() under kernel_kexec() function.
>
> I'm not sure that's true, unless perhaps you have CONFIG_KEXEC_JUMP enabled?
>
>> But in case crash panic cpus are relaxed in
>> machine_crash_nonpanic_core() SMP function but not offlined.
>>
>> When crash kernel is loaded with kexec and on panic trigger
>> machine_kexec() checks for number of cpus online.
>> If more than one cpu is online machine_kexec() fails to load
>> with below error
>>
>> kexec: error: multiple CPUs still online
>>
>> In machine_crash_nonpanic_core() SMP function, offline CPU
>> before cpu_relax
>
>> diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
>
>> @@ -73,6 +73,7 @@ void machine_crash_nonpanic_core(void *unused)
>>  	crash_save_cpu(&regs, smp_processor_id());
>>  	flush_cache_all();
>>
>> +	set_cpu_online(smp_processor_id(), false);
>
> I'm not familiar with that API, but it looks like it's just setting the
> *current* CPU offline. That sounds problematic for two reasons:
>
> 1) Setting the current CPU offline sounds like a bad idea; after all,
> code is still running on it. Presumably you want to offline all other CPUs.
>

machine_crash_nonpanic_core() is an SMP call (smp_call_function).
Setting cpu offline is called for all other CPUs except the caller.

> 2) On a dual-CPU system, I guess this will leave a single CPU marked
> online, and hence satisfy the test in machine_kexec(). However, on a
> quad-core system, won't this just reduce the online CPU count from 4 to
> 3 and hence the test in machine_kexec() will still fail?
>

Setting the CPU offline is done from an SMP call function,
so it is called on all the CPUs in the system except the caller CPU.

> Can't you call disable_nonboot_cpus() from machine_crash_nonpanic_core()
> just like machine_shutdown() does?

I thought of using disable_nonboot_cpus(). However, the crash can happen on
any CPU, so we have to stop only the nonpanic CPUs.

The other mechanisms I considered for offlining CPUs are:

1) Calling __cpu_disable() to put the CPU completely offline.
However, platform_cpu_disable() does not allow CPU 0 to be disabled
(the crash can happen on any core).
2) Calling machine_halt(). This does not allow smp_send_stop() on bootable cpu

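
[Illustration added to the archive; not part of the original messages.]

The cross-call behaviour described in the reply above can be modelled in a few lines of userspace C. This is only a toy sketch, assuming a four-CPU system: every `toy_` name is hypothetical and stands in for, but does not reproduce, the kernel's smp_call_function(), the patched machine_crash_nonpanic_core(), and the online-CPU test in machine_kexec().

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4	/* assumption: a quad-core system */

/* Toy stand-in for the kernel's online-CPU mask. */
static bool toy_online[TOY_NR_CPUS] = { true, true, true, true };

/* Count the CPUs currently marked online. */
static unsigned int toy_num_online(void)
{
	unsigned int n = 0;

	for (unsigned int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (toy_online[cpu])
			n++;
	return n;
}

/* Models the patched handler: the CPU it runs on marks itself offline. */
static void toy_nonpanic_core(unsigned int cpu)
{
	assert(cpu < TOY_NR_CPUS);
	toy_online[cpu] = false;
}

/* Models a cross-call: run fn on every online CPU except the caller. */
static void toy_smp_call_function(void (*fn)(unsigned int), unsigned int caller)
{
	assert(caller < TOY_NR_CPUS);
	for (unsigned int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (cpu != caller && toy_online[cpu])
			fn(cpu);
}

/* Models the test from the commit message: fail if more than one CPU is online. */
static bool toy_kexec_check_passes(void)
{
	return toy_num_online() <= 1;
}
```

Calling toy_smp_call_function(toy_nonpanic_core, 2) models a panic on CPU 2: CPUs 0, 1 and 3 mark themselves offline, CPU 2 stays online, and the check passes regardless of the core count. It also shows why a policy keyed to the boot CPU does not fit here, since CPU 0 is one of the CPUs that has to stop.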
On Fri, Jul 26, 2013 at 10:38 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
> On 07/26/2013 04:49 AM, Will Deacon wrote:
>> [Adding Stephen Warren since he has been working in this area]
>>
>> On Fri, Jul 26, 2013 at 06:41:27AM +0100, vijay.kilari@gmail.com wrote:
>>> From: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
>>>
>>> In case of normal kexec kernel load, all cpu's are offlined
>>> before calling machine_kexec() under kernel_kexec() function.
>>> But in case crash panic cpus are relaxed in
>>> machine_crash_nonpanic_core() SMP function but not offlined.
>>>
>>> When crash kernel is loaded with kexec and on panic trigger
>>> machine_kexec() checks for number of cpus online.
>>> If more than one cpu is online machine_kexec() fails to load
>>> with below error
>>>
>>> kexec: error: multiple CPUs still online
>>>
>>> In machine_crash_nonpanic_core() SMP function, offline CPU
>>> before cpu_relax
>
>>> diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
>
>>> @@ -73,6 +73,7 @@ void machine_crash_nonpanic_core(void *unused)
>>>  	crash_save_cpu(&regs, smp_processor_id());
>>>  	flush_cache_all();
>>>
>>> +	set_cpu_online(smp_processor_id(), false);
>>>  	atomic_dec(&waiting_for_crash_ipi);
>>>  	while (1)
>>>  		cpu_relax();
>>
>> Ok, I guess this will work since the new kernel is loaded somewhere higher
>> in memory and the crashed kernel will stick around, so the non-crashing CPUs
>> can sit around spinning.
>
> Does a kernel that's used as the crash kernel guarantee:
>
> * Never to re-use the memory that was used by the previous kernel, so
> that the spin loop code/data won't be corrupted, ever, no matter how
> long the crash recovery kernel runs.
>
> * Not use SMP, so there's never a need to re-activate the non-boot CPUs,
> which might not work if they aren't truly disabled but rather just
> running a spin loop?

From cat /proc/iomem, normal kernel is executed from (0x80xxxxxx) with crash
kernel reserved 64M@0xa0000000

80000000-bfffffff : System RAM
  80008000-805aeddf : Kernel code
  805e2000-8063e427 : Kernel data
  a0000000-a3ffffff : Crash kernel

crash kernel is loaded to reserved memory location and is executed from there.
I could confirm this from /proc/iomem when crash kernel is running

a0000000-a3efffff : System RAM
  a0008000-a05aeddf : Kernel code
  a05e2000-a063e427 : Kernel data

On 07/30/2013 04:05 AM, Vijay Kilari wrote:
> On Fri, Jul 26, 2013 at 10:35 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
>> On 07/25/2013 11:41 PM, vijay.kilari@gmail.com wrote:
>>> From: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
>>>
>>> In case of normal kexec kernel load, all cpu's are offlined
>>> before calling machine_kexec() under kernel_kexec() function.
>>
>> I'm not sure that's true, unless perhaps you have CONFIG_KEXEC_JUMP enabled?
>>
>>> But in case crash panic cpus are relaxed in
>>> machine_crash_nonpanic_core() SMP function but not offlined.
>>>
>>> When crash kernel is loaded with kexec and on panic trigger
>>> machine_kexec() checks for number of cpus online.
>>> If more than one cpu is online machine_kexec() fails to load
>>> with below error
>>>
>>> kexec: error: multiple CPUs still online
>>>
>>> In machine_crash_nonpanic_core() SMP function, offline CPU
>>> before cpu_relax
>>
>>> diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
>>
>>> @@ -73,6 +73,7 @@ void machine_crash_nonpanic_core(void *unused)
>>>  	crash_save_cpu(&regs, smp_processor_id());
>>>  	flush_cache_all();
>>>
>>> +	set_cpu_online(smp_processor_id(), false);
>>
>> I'm not familiar with that API, but it looks like it's just setting the
>> *current* CPU offline. That sounds problematic for two reasons:
>>
>> 1) Setting the current CPU offline sounds like a bad idea; after all,
>> code is still running on it. Presumably you want to offline all other CPUs.
>>
> machine_crash_nonpanic_core() is an SMP call (smp_call_function).
> Setting cpu offline is called for all other CPUs except the caller.

Ah OK, that's what I was missing. This makes sense then.

On 07/30/2013 04:37 AM, Vijay Kilari wrote:
> On Fri, Jul 26, 2013 at 10:38 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
...
>> Does a kernel that's used as the crash kernel guarantee:
>>
>> * Never to re-use the memory that was used by the previous kernel, so
>> that the spin loop code/data won't be corrupted, ever, no matter how
>> long the crash recovery kernel runs.
>>
>> * Not use SMP, so there's never a need to re-activate the non-boot CPUs,
>> which might not work if they aren't truly disabled but rather just
>> running a spin loop?
>
> From cat /proc/iomem, normal kernel is executed from (0x80xxxxxx) with crash
> kernel reserved 64M@0xa0000000
>
> 80000000-bfffffff : System RAM
>   80008000-805aeddf : Kernel code
>   805e2000-8063e427 : Kernel data
>   a0000000-a3ffffff : Crash kernel
>
> crash kernel is loaded to reserved memory location and is executed from there.
> I could confirm this from /proc/iomem when crash kernel is running
>
> a0000000-a3efffff : System RAM
>   a0008000-a05aeddf : Kernel code
>   a05e2000-a063e427 : Kernel data

OK, but in the crash dump kernel, is 80008000..8063e427 reserved as
well, which would guarantee that the spin loop being executed by the
non-crash CPUs won't be corrupted?

On Tue, Jul 30, 2013 at 10:29 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
> On 07/30/2013 04:37 AM, Vijay Kilari wrote:
>> On Fri, Jul 26, 2013 at 10:38 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
> ...
>>> Does a kernel that's used as the crash kernel guarantee:
>>>
>>> * Never to re-use the memory that was used by the previous kernel, so
>>> that the spin loop code/data won't be corrupted, ever, no matter how
>>> long the crash recovery kernel runs.
>>>
>>> * Not use SMP, so there's never a need to re-activate the non-boot CPUs,
>>> which might not work if they aren't truly disabled but rather just
>>> running a spin loop?
>>
>> From cat /proc/iomem, normal kernel is executed from (0x80xxxxxx) with crash
>> kernel reserved 64M@0xa0000000
>>
>> 80000000-bfffffff : System RAM
>>   80008000-805aeddf : Kernel code
>>   805e2000-8063e427 : Kernel data
>>   a0000000-a3ffffff : Crash kernel
>>
>> crash kernel is loaded to reserved memory location and is executed from there.
>> I could confirm this from /proc/iomem when crash kernel is running
>>
>> a0000000-a3efffff : System RAM
>>   a0008000-a05aeddf : Kernel code
>>   a05e2000-a063e427 : Kernel data
>
> OK, but in the crash dump kernel, is 80008000..8063e427 reserved as
> well, which would guarantee that the spin loop being executed by the
> non-crash CPUs won't be corrupted?

The crash dump kernel runs from the reserved memory area (0xa0000000 - 0xa3efffff).
So it should not corrupt the memory area of the original kernel that was running
at 0x80000000, where the other CPUs are in the spin loop.

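
[Illustration added to the archive; not part of the original messages.]

The address ranges quoted from /proc/iomem can be checked for overlap mechanically. The sketch below is a hedged userspace illustration, not kernel code: `struct range`, `ranges_overlap()` and `range_size()` are hypothetical helpers, and the constants are copied from the output quoted in the thread. It only compares the static regions; it says nothing about dynamic allocations made by either kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive physical address range, as printed in /proc/iomem. */
struct range {
	uint32_t start;
	uint32_t end;
};

/* Two inclusive ranges overlap when each one starts before the other ends. */
static bool ranges_overlap(struct range a, struct range b)
{
	assert(a.start <= a.end && b.start <= b.end);
	return a.start <= b.end && b.start <= a.end;
}

/* Size in bytes of an inclusive range; widened so a full 4 GiB range fits. */
static uint64_t range_size(struct range r)
{
	assert(r.start <= r.end);
	return (uint64_t)r.end - r.start + 1;
}

/* Values copied from the /proc/iomem output of the original kernel. */
static const struct range old_kernel_code = { 0x80008000u, 0x805aeddfu };
static const struct range old_kernel_data = { 0x805e2000u, 0x8063e427u };
static const struct range crash_region    = { 0xa0000000u, 0xa3ffffffu };
```

With these values, ranges_overlap() is false for both old kernel regions against crash_region, and range_size(crash_region) is 0x4000000 bytes, which matches the 64M@0xa0000000 reservation.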
On 07/31/2013 05:37 AM, Vijay Kilari wrote:
> On Tue, Jul 30, 2013 at 10:29 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
>> On 07/30/2013 04:37 AM, Vijay Kilari wrote:
>>> On Fri, Jul 26, 2013 at 10:38 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
>> ...
>>>> Does a kernel that's used as the crash kernel guarantee:
>>>>
>>>> * Never to re-use the memory that was used by the previous kernel, so
>>>> that the spin loop code/data won't be corrupted, ever, no matter how
>>>> long the crash recovery kernel runs.
>>>>
>>>> * Not use SMP, so there's never a need to re-activate the non-boot CPUs,
>>>> which might not work if they aren't truly disabled but rather just
>>>> running a spin loop?
>>>
>>> From cat /proc/iomem, normal kernel is executed from (0x80xxxxxx) with crash
>>> kernel reserved 64M@0xa0000000
>>>
>>> 80000000-bfffffff : System RAM
>>>   80008000-805aeddf : Kernel code
>>>   805e2000-8063e427 : Kernel data
>>>   a0000000-a3ffffff : Crash kernel
>>>
>>> crash kernel is loaded to reserved memory location and is executed from there.
>>> I could confirm this from /proc/iomem when crash kernel is running
>>>
>>> a0000000-a3efffff : System RAM
>>>   a0008000-a05aeddf : Kernel code
>>>   a05e2000-a063e427 : Kernel data
>>
>> OK, but in the crash dump kernel, is 80008000..8063e427 reserved as
>> well, which would guarantee that the spin loop being executed by the
>> non-crash CPUs won't be corrupted?
>
> The crash dump kernel runs from the reserved memory area (0xa0000000 - 0xa3efffff).
> So it should not corrupt the memory area of the original kernel that was running
> at 0x80000000, where the other CPUs are in the spin loop.

What about dynamic allocations?

On Wed, Jul 31, 2013 at 10:44 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
> On 07/31/2013 05:37 AM, Vijay Kilari wrote:
>> On Tue, Jul 30, 2013 at 10:29 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
>>> On 07/30/2013 04:37 AM, Vijay Kilari wrote:
>>>> On Fri, Jul 26, 2013 at 10:38 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
>>> ...
>>>>> Does a kernel that's used as the crash kernel guarantee:
>>>>>
>>>>> * Never to re-use the memory that was used by the previous kernel, so
>>>>> that the spin loop code/data won't be corrupted, ever, no matter how
>>>>> long the crash recovery kernel runs.
>>>>>
>>>>> * Not use SMP, so there's never a need to re-activate the non-boot CPUs,
>>>>> which might not work if they aren't truly disabled but rather just
>>>>> running a spin loop?
>>>>
>>>> From cat /proc/iomem, normal kernel is executed from (0x80xxxxxx) with crash
>>>> kernel reserved 64M@0xa0000000
>>>>
>>>> 80000000-bfffffff : System RAM
>>>>   80008000-805aeddf : Kernel code
>>>>   805e2000-8063e427 : Kernel data
>>>>   a0000000-a3ffffff : Crash kernel
>>>>
>>>> crash kernel is loaded to reserved memory location and is executed from there.
>>>> I could confirm this from /proc/iomem when crash kernel is running
>>>>
>>>> a0000000-a3efffff : System RAM
>>>>   a0008000-a05aeddf : Kernel code
>>>>   a05e2000-a063e427 : Kernel data
>>>
>>> OK, but in the crash dump kernel, is 80008000..8063e427 reserved as
>>> well, which would guarantee that the spin loop being executed by the
>>> non-crash CPUs won't be corrupted?
>>
>> The crash dump kernel runs from the reserved memory area (0xa0000000 - 0xa3efffff).
>> So it should not corrupt the memory area of the original kernel that was running
>> at 0x80000000, where the other CPUs are in the spin loop.
>
> What about dynamic allocations?
>

IMHO, it is the kdump functionality to ensure that it won't corrupt
original kernel's dynamic allocations

On 08/01/2013 07:49 AM, Vijay Kilari wrote:
> On Wed, Jul 31, 2013 at 10:44 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
>> On 07/31/2013 05:37 AM, Vijay Kilari wrote:
>>> On Tue, Jul 30, 2013 at 10:29 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
>>>> On 07/30/2013 04:37 AM, Vijay Kilari wrote:
>>>>> On Fri, Jul 26, 2013 at 10:38 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
>>>> ...
>>>>>> Does a kernel that's used as the crash kernel guarantee:
>>>>>>
>>>>>> * Never to re-use the memory that was used by the previous kernel, so
>>>>>> that the spin loop code/data won't be corrupted, ever, no matter how
>>>>>> long the crash recovery kernel runs.
>>>>>>
>>>>>> * Not use SMP, so there's never a need to re-activate the non-boot CPUs,
>>>>>> which might not work if they aren't truly disabled but rather just
>>>>>> running a spin loop?
>>>>>
>>>>> From cat /proc/iomem, normal kernel is executed from (0x80xxxxxx) with crash
>>>>> kernel reserved 64M@0xa0000000
>>>>>
>>>>> 80000000-bfffffff : System RAM
>>>>>   80008000-805aeddf : Kernel code
>>>>>   805e2000-8063e427 : Kernel data
>>>>>   a0000000-a3ffffff : Crash kernel
>>>>>
>>>>> crash kernel is loaded to reserved memory location and is executed from there.
>>>>> I could confirm this from /proc/iomem when crash kernel is running
>>>>>
>>>>> a0000000-a3efffff : System RAM
>>>>>   a0008000-a05aeddf : Kernel code
>>>>>   a05e2000-a063e427 : Kernel data
>>>>
>>>> OK, but in the crash dump kernel, is 80008000..8063e427 reserved as
>>>> well, which would guarantee that the spin loop being executed by the
>>>> non-crash CPUs won't be corrupted?
>>>
>>> The crash dump kernel runs from the reserved memory area (0xa0000000 - 0xa3efffff).
>>> So it should not corrupt the memory area of the original kernel that was running
>>> at 0x80000000, where the other CPUs are in the spin loop.
>>
>> What about dynamic allocations?
>>
>
> IMHO, it is the kdump functionality to ensure that it won't corrupt
> original kernel's dynamic allocations

OK, if there are explicit measures to assure this already, then there's
no issue.

On Thu, Aug 1, 2013 at 9:55 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
> On 08/01/2013 07:49 AM, Vijay Kilari wrote:
>> On Wed, Jul 31, 2013 at 10:44 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
>>> On 07/31/2013 05:37 AM, Vijay Kilari wrote:
>>>> On Tue, Jul 30, 2013 at 10:29 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
>>>>> On 07/30/2013 04:37 AM, Vijay Kilari wrote:
>>>>>> On Fri, Jul 26, 2013 at 10:38 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
>>>>> ...
>>>>>>> Does a kernel that's used as the crash kernel guarantee:
>>>>>>>
>>>>>>> * Never to re-use the memory that was used by the previous kernel, so
>>>>>>> that the spin loop code/data won't be corrupted, ever, no matter how
>>>>>>> long the crash recovery kernel runs.
>>>>>>>
>>>>>>> * Not use SMP, so there's never a need to re-activate the non-boot CPUs,
>>>>>>> which might not work if they aren't truly disabled but rather just
>>>>>>> running a spin loop?
>>>>>>
>>>>>> From cat /proc/iomem, normal kernel is executed from (0x80xxxxxx) with crash
>>>>>> kernel reserved 64M@0xa0000000
>>>>>>
>>>>>> 80000000-bfffffff : System RAM
>>>>>>   80008000-805aeddf : Kernel code
>>>>>>   805e2000-8063e427 : Kernel data
>>>>>>   a0000000-a3ffffff : Crash kernel
>>>>>>
>>>>>> crash kernel is loaded to reserved memory location and is executed from there.
>>>>>> I could confirm this from /proc/iomem when crash kernel is running
>>>>>>
>>>>>> a0000000-a3efffff : System RAM
>>>>>>   a0008000-a05aeddf : Kernel code
>>>>>>   a05e2000-a063e427 : Kernel data
>>>>>
>>>>> OK, but in the crash dump kernel, is 80008000..8063e427 reserved as
>>>>> well, which would guarantee that the spin loop being executed by the
>>>>> non-crash CPUs won't be corrupted?
>>>>
>>>> The crash dump kernel runs from the reserved memory area (0xa0000000 - 0xa3efffff).
>>>> So it should not corrupt the memory area of the original kernel that was running
>>>> at 0x80000000, where the other CPUs are in the spin loop.
>>>
>>> What about dynamic allocations?
>>>
>>
>> IMHO, it is the kdump functionality to ensure that it won't corrupt
>> original kernel's dynamic allocations
>
> OK, if there are explicit measures to assure this already, then there's
> no issue.

Hi Will,

Can you please consider this patch?

Thanks & Regards
Vijay

On Mon, Aug 12, 2013 at 01:18:38PM +0100, Vijay Kilari wrote:
> On Thu, Aug 1, 2013 at 9:55 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
> > OK, if there are explicit measures to assure this already, then there's
> > no issue.
>
> Hi Will,
>
> Can you please consider this patch?

Assuming that Stephen and I are understanding things correctly, then this
patch seems fine. Can you put it into Russell's patch system please?

Will

diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index 4fb074c..163b160 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -73,6 +73,7 @@ void machine_crash_nonpanic_core(void *unused)
 	crash_save_cpu(&regs, smp_processor_id());
 	flush_cache_all();
 
+	set_cpu_online(smp_processor_id(), false);
 	atomic_dec(&waiting_for_crash_ipi);
 	while (1)
 		cpu_relax();