Message ID | 20190708211528.12392-1-pasha.tatashin@soleen.com (mailing list archive) |
---|---|
Series | allow to reserve memory for normal kexec kernel |
Pavel Tatashin <pasha.tatashin@soleen.com> writes:

> Currently, it is only allowed to reserve memory for crash kernel, because
> it is a requirement in order to be able to boot into crash kernel without
> touching memory of crashed kernel is to have memory reserved.
>
> The second benefit for having memory reserved for kexec kernel is
> that it does not require a relocation after segments are loaded into
> memory.
>
> If kexec functionality is used for a fast system update, with a minimal
> downtime, the relocation of kernel + initramfs might take a significant
> portion of reboot.
>
> In fact, on the machine that we are using, that has ARM64 processor
> it takes 0.35s to relocate during kexec, thus taking 52% of kernel reboot
> time:
>
>   kernel shutdown 0.03s
>   relocation      0.35s
>   kernel startup  0.29s
>
> Image: 13M and initramfs is 24M. If initramfs increases, the relocation
> time increases proportionally.

Something is very very wrong there.

Last I measured memory bandwidth seriously I could touch a Gigabyte per
second easily, and that was nearly 20 years ago. Did you manage to
disable caching or have some particularly slow code that does the
relocations?

There is a serious cost to reserving memory in that it is simply not
available at other times. For kexec on panic there is no other reliable
way to get memory that won't be DMA'd to.

We have options in this case and I would strongly encourage you to track
down why that copy in relocation is so very slow. I suspect a 4KiB page
size is large enough that it can swamp pointer following costs.

My back of the napkin math says even 20 years ago your copying costs
should be only 0.037s. The only machine I have ever tested on where
the copy costs were noticeable was my old 386.

Maybe I am out to lunch here but a claim that your memory only runs
at 100MiB/s (the speed of my spinning rust hard drive) is rather
incredible.

Eric

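The back-of-the-napkin figures above can be reproduced with a few lines of C. This is only a sketch: the function names are made up for illustration, the payload is taken as 13M + 24M = 37M from the cover letter, and MB and MiB are treated loosely, as they are in the thread.

```c
#include <assert.h>
#include <stdio.h>

/* Effective copy rate (MiB/s) when `mib` mebibytes move in `seconds`. */
static double effective_mib_per_s(double mib, double seconds)
{
    assert(seconds > 0.0);
    return mib / seconds;
}

/* Time (s) needed to copy `mib` mebibytes at `mib_per_s`. */
static double copy_seconds(double mib, double mib_per_s)
{
    assert(mib_per_s > 0.0);
    return mib / mib_per_s;
}

/* Fraction of the whole kexec reboot that is spent relocating. */
static double relocation_share(double shutdown, double relocation, double startup)
{
    return relocation / (shutdown + relocation + startup);
}

/* Hypothetical helper: prints the numbers discussed in the thread. */
void print_relocation_budget(void)
{
    const double payload_mib = 13.0 + 24.0; /* Image + initramfs */

    printf("observed rate:      %.1f MiB/s\n",
           effective_mib_per_s(payload_mib, 0.35));
    printf("expected at 1 GB/s: %.3f s\n",
           copy_seconds(payload_mib, 1000.0));
    printf("relocation share:   %.0f%%\n",
           100.0 * relocation_share(0.03, 0.35, 0.29));
}
```

Under these assumptions the observed 0.35s corresponds to roughly 100 MiB/s, about a tenth of the rate Eric expects, and relocation is about 52% of the 0.67s total.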
> Something is very very wrong there.
>
> Last I measured memory bandwidth seriously I could touch a Gigabyte per
> second easily, and that was nearly 20 years ago. Did you manage to
> disable caching or have some particularly slow code that does the
> relocations?
>
> There is a serious cost to reserving memory in that it is simply not
> available at other times. For kexec on panic there is no other reliable
> way to get memory that won't be DMA'd to.

Hi Eric,

Thank you for your comments. Indeed, but sometimes fast reboot is more
important than the cost of reserving 32M-64M of memory.

>
> We have options in this case and I would strongly encourage you to track
> down why that copy in relocation is so very slow. I suspect a 4KiB page
> size is large enough that it can swamp pointer following costs.
>
> My back of the napkin math says even 20 years ago your copying costs
> should be only 0.037s. The only machine I have ever tested on where
> the copy costs were noticeable was my old 386.
>
> Maybe I am out to lunch here but a claim that your memory only runs
> at 100MiB/s (the speed of my spinning rust hard drive) is rather
> incredible.

I agree, my measurement on this machine was 2,857MB/s. Perhaps when
MMU is disabled ARM64 also has caching disabled? The function that
loops through array of pages and relocates them to final destination
is this:

https://soleen.com/source/xref/linux/arch/arm64/kernel/relocate_kernel.S?r=d2912cb1#29

A comment before calling it:

    /*
     * cpu_soft_restart will shutdown the MMU, disable data caches, then
     * transfer control to the reboot_code_buffer which contains a copy of
     * the arm64_relocate_new_kernel routine. arm64_relocate_new_kernel
     * uses physical addressing to relocate the new image to its final
     * position and transfers control to the image entry point when the
     * relocation is complete.
     * In kexec case, kimage->start points to purgatory assuming that
     * kernel entry and dtb address are embedded in purgatory by
     * userspace (kexec-tools).
     * In kexec_file case, the kernel starts directly without purgatory.
     */

https://soleen.com/source/xref/linux/arch/arm64/kernel/machine_kexec.c?r=d2912cb1#206

So, as I understand it, at least data caches are disabled, and MMU is
disabled; perhaps this is why this function is so incredibly slow?

Perhaps there is a better way to fix this problem by keeping caches
enabled while still relocating? Any suggestions from Aarch64
developers?

Pasha

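For readers who do not want to read the assembly, the page-relocation loop mentioned above can be sketched in user-space C. This is an illustration only, not the kernel routine: the `sim_*` names and the 64-byte toy page size are assumptions made for the sketch, and the real code additionally performs cache maintenance and runs on physical addresses with the MMU off. The IND_* flag values follow include/linux/kexec.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_PAGE_SIZE 64u /* toy page size for the sketch only */
#define SIM_PAGE_MASK (~(uintptr_t)(SIM_PAGE_SIZE - 1))

/* Entry flags, kept in the low bits of a page-aligned address. */
#define IND_DESTINATION 0x1u
#define IND_INDIRECTION 0x2u
#define IND_DONE        0x4u
#define IND_SOURCE      0x8u

typedef uintptr_t sim_entry_t;

/* Walk the entry list starting at `head`; returns the number of pages copied. */
size_t sim_relocate(sim_entry_t head)
{
    const sim_entry_t *ptr = NULL; /* next entry to read */
    unsigned char *dest = NULL;    /* where the next source page goes */
    sim_entry_t entry = head;
    size_t copied = 0;

    while (!(entry & IND_DONE)) {
        void *addr = (void *)(entry & SIM_PAGE_MASK);

        if (entry & IND_SOURCE) {
            assert(dest != NULL);
            memcpy(dest, addr, SIM_PAGE_SIZE); /* copy one page */
            dest += SIM_PAGE_SIZE;
            copied++;
        } else if (entry & IND_INDIRECTION) {
            ptr = addr;                        /* switch to a new entry page */
        } else if (entry & IND_DESTINATION) {
            dest = addr;                       /* start of a new destination run */
        }
        assert(ptr != NULL);
        entry = *ptr++;
    }
    return copied;
}

/* Hypothetical demo: relocate a two-page image, return pages copied (0 on mismatch). */
size_t sim_relocate_demo(void)
{
    static _Alignas(SIM_PAGE_SIZE) unsigned char src[2 * SIM_PAGE_SIZE];
    static _Alignas(SIM_PAGE_SIZE) unsigned char dst[2 * SIM_PAGE_SIZE];
    static _Alignas(SIM_PAGE_SIZE) sim_entry_t list[4];
    size_t copied;

    memset(src, 0xAB, SIM_PAGE_SIZE);
    memset(src + SIM_PAGE_SIZE, 0xCD, SIM_PAGE_SIZE);

    list[0] = (sim_entry_t)dst | IND_DESTINATION;
    list[1] = (sim_entry_t)src | IND_SOURCE;
    list[2] = (sim_entry_t)(src + SIM_PAGE_SIZE) | IND_SOURCE;
    list[3] = IND_DONE;

    copied = sim_relocate((sim_entry_t)list | IND_INDIRECTION);
    return memcmp(dst, src, sizeof(src)) == 0 ? copied : 0;
}
```

The point of the sketch is that the loop is a page-by-page copy driven by a flagged pointer list, so its run time is dominated by raw memory throughput; with data caches off, every load and store in the copy goes to memory uncached.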
Hi Pavel, Eric,

(Subject-Nit: 'arm64:' is needed to match the style for arm64's arch code.
Without it the maintainer is likely to skip the patches as being for core
code.)

On 09/07/2019 01:09, Pavel Tatashin wrote:
>> Something is very very wrong there.
>>
>> Last I measured memory bandwidth seriously I could touch a Gigabyte per
>> second easily, and that was nearly 20 years ago. Did you manage to
>> disable caching or have some particularly slow code that does the
>> relocations?
>>
>> There is a serious cost to reserving memory in that it is simply not
>> available at other times. For kexec on panic there is no other reliable
>> way to get memory that won't be DMA'd to.

> Indeed, but sometimes fast reboot is more important than the cost of
> reserving 32M-64M of memory.

>> We have options in this case and I would strongly encourage you to track
>> down why that copy in relocation is so very slow. I suspect a 4KiB page
>> size is large enough that it can swamp pointer following costs.
>>
>> My back of the napkin math says even 20 years ago your copying costs
>> should be only 0.037s. The only machine I have ever tested on where
>> the copy costs were noticeable was my old 386.

>> Maybe I am out to lunch here but a claim that your memory only runs
>> at 100MiB/s (the speed of my spinning rust hard drive) is rather
>> incredible.

> I agree, my measurement on this machine was 2,857MB/s. Perhaps when
> MMU is disabled ARM64 also has caching disabled? The function that
> loops through array of pages and relocates them to final destination
> is this:

> A comment before calling it:
>
>     /*
>      * cpu_soft_restart will shutdown the MMU, disable data caches, then
>      * transfer control to the reboot_code_buffer which contains a copy of
>      * the arm64_relocate_new_kernel routine. arm64_relocate_new_kernel
>      * uses physical addressing to relocate the new image to its final
>      * position and transfers control to the image entry point when the
>      * relocation is complete.
>      * In kexec case, kimage->start points to purgatory assuming that
>      * kernel entry and dtb address are embedded in purgatory by
>      * userspace (kexec-tools).
>      * In kexec_file case, the kernel starts directly without purgatory.
>      */

> So, as I understand at least data caches are disabled, and MMU is
> disabled, perhaps this is why this function is so incredibly slow?

Yup, spot on.

Kexec typically wants to place the new kernel over the top of the old one, so
it's guaranteed to overwrite the live swapper_pg_dir. There is also nothing to
prevent the other parts of the page-tables being overwritten as we relocate
the kernel.

The way the kexec series chose to make this safe was the simplest: turn the
MMU off. We need to enter purgatory with the MMU off anyway.

(It's worth checking your kexec-tools purgatory isn't spending a decade
generating a SHA256 of the kernel while the MMU is off. This is pointless as
we don't suspect the previous kernel of corrupting memory, and we can't
debug/report the problem if we detect a different SHA256. Newer kexec-tools
have some commandline option to turn this thing off.)

> Perhaps, there is a better way to fix this problem by keeping caches
> enabled while still relocating? Any suggestions from Aarch64
> developers?

Turning the MMU off is the simplest. The alternative is a lot more
complicated:

(To get the benefit of the caches, we need the MMU enabled to tell the
hardware what the cache-ability attributes of each page of memory are.)

We'd need to copy the page tables to build a new set out of memory we know
won't get overwritten. Switching to this 'safe set' is tricky, as it also
maps the code we're executing. To do that we'd need to use TTBR0 to hold
another 'safe mapping' of the code we're running, while we change our view
of the linear-map.

Hibernate does exactly this, so it's possible to re-use some of that logic.

From memory, I think the reason that didn't get done is kexec doesn't provide
an allocator, and needs the MMU off at some point anyway.

Thanks,

James

Hi Pavel,

On Tue, Jul 9, 2019 at 2:46 AM Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
>
> Currently, it is only allowed to reserve memory for crash kernel, because
> it is a requirement in order to be able to boot into crash kernel without
> touching memory of crashed kernel is to have memory reserved.
>
> The second benefit for having memory reserved for kexec kernel is
> that it does not require a relocation after segments are loaded into
> memory.
>
> If kexec functionality is used for a fast system update, with a minimal
> downtime, the relocation of kernel + initramfs might take a significant
> portion of reboot.
>
> In fact, on the machine that we are using, that has ARM64 processor
> it takes 0.35s to relocate during kexec, thus taking 52% of kernel reboot
> time:
>
>   kernel shutdown 0.03s
>   relocation      0.35s
>   kernel startup  0.29s
>
> Image: 13M and initramfs is 24M. If initramfs increases, the relocation
> time increases proportionally.
>
> While, it is possible to add 'kexeckernel=' parameters support to other
> architectures by modifying reserve_crashkernel(), in this series this is
> done for arm64 only.
>
> Pavel Tatashin (5):
>   kexec: quiet down kexec reboot
>   kexec: add resource for normal kexec region
>   kexec: export common crashkernel/kexeckernel parser
>   kexec: use reserved memory for normal kexec reboot
>   arm64, kexec: reserve kexeckernel region
>
>  .../admin-guide/kernel-parameters.txt |  7 ++
>  arch/arm64/kernel/setup.c             |  5 ++
>  arch/arm64/mm/init.c                  | 83 ++++++++++++-------
>  include/linux/crash_core.h            |  6 ++
>  include/linux/ioport.h                |  1 +
>  include/linux/kexec.h                 |  6 +-
>  kernel/crash_core.c                   | 27 +++---
>  kernel/kexec_core.c                   | 50 +++++++----
>  8 files changed, 127 insertions(+), 58 deletions(-)
>
> --
> 2.22.0

This seems like an issue with time spent while doing sha256
verification while in purgatory.

Can you please try the following two patches which enable D-cache in
purgatory before SHA verification and disable it before switching to
kernel:

http://lists.infradead.org/pipermail/kexec/2017-May/018839.html
http://lists.infradead.org/pipermail/kexec/2017-May/018840.html

Note that these were not accepted upstream but are included in several
distros in some form or the other :)

Thanks,
Bhupesh

On Tue, Jul 9, 2019 at 6:36 AM Bhupesh Sharma <bhsharma@redhat.com> wrote:
>
> Hi Pavel,
>
> On Tue, Jul 9, 2019 at 2:46 AM Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
> >
> > [...]
>
> This seems like an issue with time spent while doing sha256
> verification while in purgatory.
>
> Can you please try the following two patches which enable D-cache in
> purgatory before SHA verification and disable it before switching to
> kernel:
>
> http://lists.infradead.org/pipermail/kexec/2017-May/018839.html
> http://lists.infradead.org/pipermail/kexec/2017-May/018840.html

Hi Bhupesh,

The verification was taking 2.31s. This is why it is disabled via
kexec's '-i' flag. Therefore 0.35s is only the relocation part where
time is spent, and with my patches the time is completely gone.
Actually, I am glad you showed these patches to me because I might
pull them and enable verification for our needs.

>
> Note that these were not accepted upstream but are included in several
> distros in some form or the other :)

Enabling MMU and D-Cache for relocation would essentially require the
same changes in kernel. Could you please share exactly why these were
not accepted upstream into kexec-tools?

Thank you,
Pasha

Hi Pavel,

On 09/07/2019 11:55, Pavel Tatashin wrote:
> On Tue, Jul 9, 2019 at 6:36 AM Bhupesh Sharma <bhsharma@redhat.com> wrote:
>> On Tue, Jul 9, 2019 at 2:46 AM Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
>>> [...]
>>
>> This seems like an issue with time spent while doing sha256
>> verification while in purgatory.
>>
>> Can you please try the following two patches which enable D-cache in
>> purgatory before SHA verification and disable it before switching to
>> kernel:
>>
>> http://lists.infradead.org/pipermail/kexec/2017-May/018839.html
>> http://lists.infradead.org/pipermail/kexec/2017-May/018840.html
>
> Hi Bhupesh,
>
> The verification was taking 2.31s. This is why it is disabled via
> kexec's '-i' flag. Therefore 0.35s is only the relocation part where
> time is spent, and with my patches the time is completely gone.
> Actually, I am glad you showed these patches to me because I might
> pull them and enable verification for our needs.
>
>>
>> Note that these were not accepted upstream but are included in several
>> distros in some form or the other :)
>
> Enabling MMU and D-Cache for relocation would essentially require the
> same changes in kernel. Could you please share exactly why these were
> not accepted upstream into kexec-tools?

Because '--no-checks' is a much simpler alternative.

More of the discussion:
https://lore.kernel.org/linux-arm-kernel/5599813d-f83c-d154-287a-c131c48292ca@arm.com/

While you can make purgatory a fully-fledged operating system, it doesn't
really need to do anything on arm64. Errata-workarounds alone are a reason
not to start down this path.

Thanks,

James

> > Enabling MMU and D-Cache for relocation would essentially require the
> > same changes in kernel. Could you please share exactly why these were
> > not accepted upstream into kexec-tools?
>
> Because '--no-checks' is a much simpler alternative.
>
> More of the discussion:
> https://lore.kernel.org/linux-arm-kernel/5599813d-f83c-d154-287a-c131c48292ca@arm.com/
>
> While you can make purgatory a fully-fledged operating system, it doesn't really need to
> do anything on arm64. Errata-workarounds alone are a reason not to start down this path.

Thank you James. I will summarize the information gathered from
yesterday's and today's discussion and add it to the cover letter together
with the ARM64 tag. I think the patch series makes sense for ARM64 only,
unless there are other platforms that disable caching/MMU during
relocation.

Thank you,
Pasha

Hi Pasha,

On 09/07/2019 14:07, Pavel Tatashin wrote:
>>> Enabling MMU and D-Cache for relocation would essentially require the
>>> same changes in kernel. Could you please share exactly why these were
>>> not accepted upstream into kexec-tools?
>>
>> Because '--no-checks' is a much simpler alternative.
>>
>> More of the discussion:
>> https://lore.kernel.org/linux-arm-kernel/5599813d-f83c-d154-287a-c131c48292ca@arm.com/
>>
>> While you can make purgatory a fully-fledged operating system, it doesn't really need to
>> do anything on arm64. Errata-workarounds alone are a reason not to start down this path.
>
> Thank you James. I will summarize the information gathered from
> yesterday's and today's discussion and add it to the cover letter together
> with the ARM64 tag. I think the patch series makes sense for ARM64 only,
> unless there are other platforms that disable caching/MMU during
> relocation.

I'd prefer not to reserve additional memory for regular kexec just to avoid
the relocation. If the kernel's relocation work is so painful we can
investigate doing it while the MMU is enabled. If you can compare
regular-kexec with kexec_file_load() you eliminate the purgatory part of the
work.

Thanks,

James

On Wed, Jul 10, 2019 at 11:19 AM James Morse <james.morse@arm.com> wrote:
>
> Hi Pasha,
>
> On 09/07/2019 14:07, Pavel Tatashin wrote:
> >>> Enabling MMU and D-Cache for relocation would essentially require the
> >>> same changes in kernel. Could you please share exactly why these were
> >>> not accepted upstream into kexec-tools?
> >>
> >> Because '--no-checks' is a much simpler alternative.
> >>
> >> More of the discussion:
> >> https://lore.kernel.org/linux-arm-kernel/5599813d-f83c-d154-287a-c131c48292ca@arm.com/
> >>
> >> While you can make purgatory a fully-fledged operating system, it doesn't really need to
> >> do anything on arm64. Errata-workarounds alone are a reason not to start down this path.
> >
> > Thank you James. I will summarize the information gathered from
> > yesterday's and today's discussion and add it to the cover letter together
> > with the ARM64 tag. I think the patch series makes sense for ARM64 only,
> > unless there are other platforms that disable caching/MMU during
> > relocation.
>
> I'd prefer not to reserve additional memory for regular kexec just to avoid the relocation.
> If the kernel's relocation work is so painful we can investigate doing it while the MMU is
> enabled. If you can compare regular-kexec with kexec_file_load() you eliminate the
> purgatory part of the work.

Relocation time is exactly the same for regular-kexec and
kexec_file_load(). So, the relocation is indeed painful for our case.
I am working on adding MMU enabled kernel relocation.

Pasha

Hi,

On 7/10/19 4:56 PM, Pavel Tatashin wrote:
> On Wed, Jul 10, 2019 at 11:19 AM James Morse <james.morse@arm.com> wrote:
>>
>> Hi Pasha,
>>
>> On 09/07/2019 14:07, Pavel Tatashin wrote:
>>>>> Enabling MMU and D-Cache for relocation would essentially require the
>>>>> same changes in kernel. Could you please share exactly why these were
>>>>> not accepted upstream into kexec-tools?
>>>>
>>>> Because '--no-checks' is a much simpler alternative.
>>>>
>>>> More of the discussion:
>>>> https://lore.kernel.org/linux-arm-kernel/5599813d-f83c-d154-287a-c131c48292ca@arm.com/
>>>>
>>>> While you can make purgatory a fully-fledged operating system, it doesn't really need to
>>>> do anything on arm64. Errata-workarounds alone are a reason not to start down this path.
>>>
>>> Thank you James. I will summarize the information gathered from
>>> yesterday's and today's discussion and add it to the cover letter together
>>> with the ARM64 tag. I think the patch series makes sense for ARM64 only,
>>> unless there are other platforms that disable caching/MMU during
>>> relocation.
>>
>> I'd prefer not to reserve additional memory for regular kexec just to avoid the relocation.
>> If the kernel's relocation work is so painful we can investigate doing it while the MMU is
>> enabled. If you can compare regular-kexec with kexec_file_load() you eliminate the
>> purgatory part of the work.
>
> Relocation time is exactly the same for regular-kexec and
> kexec_file_load(). So, the relocation is indeed painful for our case.
> I am working on adding MMU enabled kernel relocation.

Out of curiosity, does enabling only I-cache make a difference? IIRC, it
doesn't require setting MMU, in contrast to D-cache.

Cheers
Vladimir

>
> Pasha
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>

On Thu, Jul 11, 2019 at 4:12 AM Vladimir Murzin <vladimir.murzin@arm.com> wrote:
>
> Hi,
>
> On 7/10/19 4:56 PM, Pavel Tatashin wrote:
> > On Wed, Jul 10, 2019 at 11:19 AM James Morse <james.morse@arm.com> wrote:
> >>
> >> Hi Pasha,
> >>
> >> On 09/07/2019 14:07, Pavel Tatashin wrote:
> >>>>> Enabling MMU and D-Cache for relocation would essentially require the
> >>>>> same changes in kernel. Could you please share exactly why these were
> >>>>> not accepted upstream into kexec-tools?
> >>>>
> >>>> Because '--no-checks' is a much simpler alternative.
> >>>>
> >>>> More of the discussion:
> >>>> https://lore.kernel.org/linux-arm-kernel/5599813d-f83c-d154-287a-c131c48292ca@arm.com/
> >>>>
> >>>> While you can make purgatory a fully-fledged operating system, it doesn't really need to
> >>>> do anything on arm64. Errata-workarounds alone are a reason not to start down this path.
> >>>
> >>> Thank you James. I will summarize the information gathered from
> >>> yesterday's and today's discussion and add it to the cover letter together
> >>> with the ARM64 tag. I think the patch series makes sense for ARM64 only,
> >>> unless there are other platforms that disable caching/MMU during
> >>> relocation.
> >>
> >> I'd prefer not to reserve additional memory for regular kexec just to avoid the relocation.
> >> If the kernel's relocation work is so painful we can investigate doing it while the MMU is
> >> enabled. If you can compare regular-kexec with kexec_file_load() you eliminate the
> >> purgatory part of the work.
> >
> > Relocation time is exactly the same for regular-kexec and
> > kexec_file_load(). So, the relocation is indeed painful for our case.
> > I am working on adding MMU enabled kernel relocation.
>
> Out of curiosity, does enabling only I-cache make a difference? IIRC, it doesn't
> require setting MMU, in contrast to D-cache.

Resend:

Thank you for the suggestion. I have actually experimented with enabling
caches without MMU. Did not see a difference.

Thank you,
Pasha