mbox series

[0/4] arm/efi: fix memblock reallocation crash due to persistent reservations

Message ID 20181106113732.16351-1-ard.biesheuvel@linaro.org (mailing list archive)
Headers show
Series arm/efi: fix memblock reallocation crash due to persistent reservations | expand

Message

Ard Biesheuvel Nov. 6, 2018, 11:37 a.m. UTC
This series addresses the kexec/kdump crash on arm64 system with many CPUs
that was reported by Bhupesh.

Patches #1 and #2 fix the actual crash. Patches #3 and #4 optimize the
EFI persistent memreserve infrastructure so that fewer memblock reservations
are required.

Ard Biesheuvel (4):
  arm64: memblock: don't permit memblock resizing until linear mapping
    is up
  efi/arm: defer persistent reservations until after paging_init()
  efi: permit multiple entries in persistent memreserve data structure
  efi: reduce the amount of memblock reservations for persistent
    allocations

 arch/arm/kernel/setup.c                 |  1 +
 arch/arm64/kernel/setup.c               |  1 +
 arch/arm64/mm/init.c                    |  2 -
 arch/arm64/mm/mmu.c                     |  2 +
 drivers/firmware/efi/efi.c              | 59 ++++++++++++++------
 drivers/firmware/efi/libstub/arm-stub.c |  2 +-
 include/linux/efi.h                     | 23 +++++++-
 7 files changed, 68 insertions(+), 22 deletions(-)

Comments

Marc Zyngier Nov. 6, 2018, 6:27 p.m. UTC | #1
Hi Ard,

On 06/11/18 11:37, Ard Biesheuvel wrote:
> This series addresses the kexec/kdump crash on arm64 system with many CPUs
> that was reported by Bhupesh.
> 
> Patches #1 and #2 fix the actual crash. Patches #3 and #4 optimize the
> EFI persistent memreserve infrastructure so that fewer memblock reservations
> are required.
> 
> Ard Biesheuvel (4):
>   arm64: memblock: don't permit memblock resizing until linear mapping
>     is up
>   efi/arm: defer persistent reservations until after paging_init()
>   efi: permit multiple entries in persistent memreserve data structure
>   efi: reduce the amount of memblock reservations for persistent
>     allocations
> 
>  arch/arm/kernel/setup.c                 |  1 +
>  arch/arm64/kernel/setup.c               |  1 +
>  arch/arm64/mm/init.c                    |  2 -
>  arch/arm64/mm/mmu.c                     |  2 +
>  drivers/firmware/efi/efi.c              | 59 ++++++++++++++------
>  drivers/firmware/efi/libstub/arm-stub.c |  2 +-
>  include/linux/efi.h                     | 23 +++++++-
>  7 files changed, 68 insertions(+), 22 deletions(-)

I've just given these patches a go a a TX2 box (one of the 224 CPU
ones...), and kexec worked just fine (v4.20-rc1 vanilla didn't manage to
kexec on this box).

There seem to be some additional userspace patches that are required for
the ACPI tables not to be corrupted in the secondary kernel, but that's
an orthogonal issue.

Feel free to add

Tested-by: Marc Zyngier <marc.zyngier@arm.com>

Thanks,

	M.
Ard Biesheuvel Nov. 6, 2018, 7:01 p.m. UTC | #2
On 6 November 2018 at 19:27, Marc Zyngier <marc.zyngier@arm.com> wrote:
> Hi Ard,
>
> On 06/11/18 11:37, Ard Biesheuvel wrote:
>> This series addresses the kexec/kdump crash on arm64 system with many CPUs
>> that was reported by Bhupesh.
>>
>> Patches #1 and #2 fix the actual crash. Patches #3 and #4 optimize the
>> EFI persistent memreserve infrastructure so that fewer memblock reservations
>> are required.
>>
>> Ard Biesheuvel (4):
>>   arm64: memblock: don't permit memblock resizing until linear mapping
>>     is up
>>   efi/arm: defer persistent reservations until after paging_init()
>>   efi: permit multiple entries in persistent memreserve data structure
>>   efi: reduce the amount of memblock reservations for persistent
>>     allocations
>>
>>  arch/arm/kernel/setup.c                 |  1 +
>>  arch/arm64/kernel/setup.c               |  1 +
>>  arch/arm64/mm/init.c                    |  2 -
>>  arch/arm64/mm/mmu.c                     |  2 +
>>  drivers/firmware/efi/efi.c              | 59 ++++++++++++++------
>>  drivers/firmware/efi/libstub/arm-stub.c |  2 +-
>>  include/linux/efi.h                     | 23 +++++++-
>>  7 files changed, 68 insertions(+), 22 deletions(-)
>
> I've just given these patches a go a a TX2 box (one of the 224 CPU
> ones...), and kexec worked just fine (v4.20-rc1 vanilla didn't manage to
> kexec on this box).
>
> There seem to be some additional userspace patches that are required for
> the ACPI tables not to be corrupted in the secondary kernel, but that's
> an orthogonal issue.
>
> Feel free to add
>
> Tested-by: Marc Zyngier <marc.zyngier@arm.com>
>

Thanks Marc.

Any thoughts on whether patches #3 and #4 should be included as fixes?
I'm leaning towards yes.
Marc Zyngier Nov. 6, 2018, 7:40 p.m. UTC | #3
On Tue, 06 Nov 2018 19:01:51 +0000,
Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> 
> On 6 November 2018 at 19:27, Marc Zyngier <marc.zyngier@arm.com> wrote:
> > Hi Ard,
> >
> > On 06/11/18 11:37, Ard Biesheuvel wrote:
> >> This series addresses the kexec/kdump crash on arm64 system with many CPUs
> >> that was reported by Bhupesh.
> >>
> >> Patches #1 and #2 fix the actual crash. Patches #3 and #4 optimize the
> >> EFI persistent memreserve infrastructure so that fewer memblock reservations
> >> are required.
> >>
> >> Ard Biesheuvel (4):
> >>   arm64: memblock: don't permit memblock resizing until linear mapping
> >>     is up
> >>   efi/arm: defer persistent reservations until after paging_init()
> >>   efi: permit multiple entries in persistent memreserve data structure
> >>   efi: reduce the amount of memblock reservations for persistent
> >>     allocations
> >>
> >>  arch/arm/kernel/setup.c                 |  1 +
> >>  arch/arm64/kernel/setup.c               |  1 +
> >>  arch/arm64/mm/init.c                    |  2 -
> >>  arch/arm64/mm/mmu.c                     |  2 +
> >>  drivers/firmware/efi/efi.c              | 59 ++++++++++++++------
> >>  drivers/firmware/efi/libstub/arm-stub.c |  2 +-
> >>  include/linux/efi.h                     | 23 +++++++-
> >>  7 files changed, 68 insertions(+), 22 deletions(-)
> >
> > I've just given these patches a go a a TX2 box (one of the 224 CPU
> > ones...), and kexec worked just fine (v4.20-rc1 vanilla didn't manage to
> > kexec on this box).
> >
> > There seem to be some additional userspace patches that are required for
> > the ACPI tables not to be corrupted in the secondary kernel, but that's
> > an orthogonal issue.
> >
> > Feel free to add
> >
> > Tested-by: Marc Zyngier <marc.zyngier@arm.com>
> >
> 
> Thanks Marc.
> 
> Any thoughts on whether patches #3 and #4 should be included as fixes?
> I'm leaning towards yes.

They certainly massively reduce the number of list entries, and
probably help reducing the overhead. I'd be inclined to merge them as
well.

Thanks,

	M.
Will Deacon Nov. 6, 2018, 9:34 p.m. UTC | #4
Hi Ard,

On Tue, Nov 06, 2018 at 12:37:28PM +0100, Ard Biesheuvel wrote:
> This series addresses the kexec/kdump crash on arm64 system with many CPUs
> that was reported by Bhupesh.
> 
> Patches #1 and #2 fix the actual crash. Patches #3 and #4 optimize the
> EFI persistent memreserve infrastructure so that fewer memblock reservations
> are required.

I acked the arm64 part and patches 3 and 4 look good afaict, so:

Acked-by: Will Deacon <will.deacon@arm.com>

for those as well.

The only thing I was wondering is whether or not exhausting the page-sized
array in the first list entry is rare enough that we could just realloc the
thing and copy instead of chaining together new pages. That said, without
seeing the code it's hard to tell whether you save much complexity with such
a scheme so I'll leave it up to you.

Will
Ard Biesheuvel Nov. 6, 2018, 9:39 p.m. UTC | #5
On 6 November 2018 at 22:34, Will Deacon <will.deacon@arm.com> wrote:
> Hi Ard,
>
> On Tue, Nov 06, 2018 at 12:37:28PM +0100, Ard Biesheuvel wrote:
>> This series addresses the kexec/kdump crash on arm64 system with many CPUs
>> that was reported by Bhupesh.
>>
>> Patches #1 and #2 fix the actual crash. Patches #3 and #4 optimize the
>> EFI persistent memreserve infrastructure so that fewer memblock reservations
>> are required.
>
> I acked the arm64 part and patches 3 and 4 look good afaict, so:
>
> Acked-by: Will Deacon <will.deacon@arm.com>
>
> for those as well.
>
> The only thing I was wondering is whether or not exhausting the page-sized
> array in the first list entry is rare enough that we could just realloc the
> thing and copy instead of chaining together new pages. That said, without
> seeing the code it's hard to tell whether you save much complexity with such
> a scheme so I'll leave it up to you.
>

Well, the problem is that the page-sized array may have been allocated
by a previous kernel, and so it may be memblock_reserve()d already, in
which case reallocating does not actually free up the memory in a
useful way.

Also, in the unlikely event that we race and allocate two new pages at
the same time, the current scheme will not break (and we will
ultimately fill up all the slots in both pages before allocating even
more pages). This will become a lot trickier if we do reallocation.

So if the current approach looks correct to you, then I'd prefer to
stick with it.

Do you agree with applying #3 and #4 as fixes?
Will Deacon Nov. 6, 2018, 9:46 p.m. UTC | #6
On Tue, Nov 06, 2018 at 10:39:47PM +0100, Ard Biesheuvel wrote:
> On 6 November 2018 at 22:34, Will Deacon <will.deacon@arm.com> wrote:
> > On Tue, Nov 06, 2018 at 12:37:28PM +0100, Ard Biesheuvel wrote:
> >> This series addresses the kexec/kdump crash on arm64 system with many CPUs
> >> that was reported by Bhupesh.
> >>
> >> Patches #1 and #2 fix the actual crash. Patches #3 and #4 optimize the
> >> EFI persistent memreserve infrastructure so that fewer memblock reservations
> >> are required.
> >
> > I acked the arm64 part and patches 3 and 4 look good afaict, so:
> >
> > Acked-by: Will Deacon <will.deacon@arm.com>
> >
> > for those as well.
> >
> > The only thing I was wondering is whether or not exhausting the page-sized
> > array in the first list entry is rare enough that we could just realloc the
> > thing and copy instead of chaining together new pages. That said, without
> > seeing the code it's hard to tell whether you save much complexity with such
> > a scheme so I'll leave it up to you.
> >
> 
> Well, the problem is that the page-sized array may have been allocated
> by a previous kernel, and so it may be memblock_reserve()d already, in
> which case reallocating does not actually free up the memory in a
> useful way.
> 
> Also, in the unlikely event that we race and allocate two new pages at
> the same time, the current scheme will not break (and we will
> ultimately fill up all the slots in both pages before allocating even
> more pages). This will become a lot trickier if we do reallocation.

Ah crap, yeah, the concurrency angle makes that really awful. Thanks.

> So if the current approach looks correct to you, then I'd prefer to
> stick with it.
> 
> Do you agree with applying #3 and #4 as fixes?

If Patch 1 alone is sufficient to solve the issue for Bhupesh, then I don't
think we have a need to treat 3 and 4 as fixes (and therefore we can avoid
having to backport them to stable kernels).

Will