[V3,0/4] arm64/mm: Enable memory hot remove

Message ID 1557824407-19092-1-git-send-email-anshuman.khandual@arm.com

Message

Anshuman Khandual May 14, 2019, 9 a.m. UTC
This series enables memory hot remove on arm64 after fixing a memblock
removal ordering problem in generic __remove_memory() and kernel page
table race conditions on arm64. This is based on the following arm64
working tree.

git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
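
For reference, a minimal, hypothetical sketch of the ordering problem fixed by
the first patch: the arch teardown has to run before the memblock entries for
the range are released. The wrapper name remove_memory_range() is made up for
illustration; only arch_remove_memory(), memblock_free() and memblock_remove()
are real interfaces here.

static void remove_memory_range(int nid, u64 start, u64 size)
{
	/*
	 * The arch code may still need to consult memblock metadata while
	 * it tears down the mappings and struct pages for this range.
	 */
	arch_remove_memory(nid, start, size, NULL);

	/* Only afterwards drop the memblock bookkeeping for the range. */
	memblock_free(start, size);
	memblock_remove(start, size);
}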

David pointed out that the following patch is already in next/master
(58b11e136dcc14358) and will conflict with the last patch here. The conflict
will be fixed once this series has been reviewed and agreed upon.

Author: David Hildenbrand <david@redhat.com>
Date:   Wed Apr 10 11:02:27 2019 +1000

    mm/memory_hotplug: make __remove_pages() and arch_remove_memory() never fail

    All callers of arch_remove_memory() ignore errors.  And we should really
    try to remove any errors from the memory removal path.  No more errors are
    reported from __remove_pages().  BUG() in s390x code in case
    arch_remove_memory() is triggered.  We may implement that properly later.
    WARN in case powerpc code failed to remove the section mapping, which is
    better than ignoring the error completely right now.
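
After that change the removal path has, roughly, the shape below, so there is
no error left for callers to propagate. These prototypes are assumed from the
commit message above, not copied from the tree:

void arch_remove_memory(int nid, u64 start, u64 size,
			struct vmem_altmap *altmap);
void __remove_pages(struct zone *zone, unsigned long start_pfn,
		    unsigned long nr_pages, struct vmem_altmap *altmap);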

Testing:

Tested memory hot remove on arm64 with the 4K, 16K and 64K page size
configurations across all possible CONFIG_ARM64_VA_BITS and
CONFIG_PGTABLE_LEVELS combinations. Only build tested on non-arm64 platforms.

Changes in V3:
 
- Implemented most of the suggestions from Mark Rutland for remove_pagetable()
- Fixed applicable PGTABLE_LEVEL wrappers around pgtable page freeing functions
- Replaced 'direct' with 'sparse_vmap' in remove_pagetable() with inverted polarity
- Changed pointer names ('p' at end) and removed tmp from iterations
- Performed intermediate TLB invalidation while clearing pgtable entries
- Dropped flush_tlb_kernel_range() in remove_pagetable()
- Added flush_tlb_kernel_range() in remove_pte_table() instead (see the
  sketch after this list)
- Renamed page freeing functions for pgtable page and mapped pages
- Used page range size instead of order while freeing mapped or pgtable pages
- Removed all PageReserved() handling while freeing mapped or pgtable pages
- Replaced XXX_index() with XXX_offset() while walking the kernel page table
- Used READ_ONCE() while fetching individual pgtable entries
- Took init_mm.page_table_lock for the whole operation instead of just while changing an entry
- Dropped previously added [pmd|pud]_index() which are not required anymore

- Added a new patch to protect against a kernel page table race condition during ptdump
- Added a new patch from Mark Rutland to prevent huge-vmap with ptdump
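
To make the TLB, locking and READ_ONCE() items above concrete, here is a
minimal sketch of what a remove_pte_table() level walk along those lines
could look like. free_hotplug_page_range() and the exact placement of the
flush are assumptions for illustration, not the literal patch:

static void remove_pte_table(pmd_t *pmdp, unsigned long addr,
			     unsigned long end, bool sparse_vmap)
{
	pte_t *ptep, pte;

	/* init_mm.page_table_lock is assumed to be held for the whole walk. */
	for (; addr < end; addr += PAGE_SIZE) {
		ptep = pte_offset_kernel(pmdp, addr);
		pte = READ_ONCE(*ptep);
		if (pte_none(pte))
			continue;

		WARN_ON(!pte_present(pte));
		pte_clear(&init_mm, addr, ptep);

		/* Invalidate before the backing page can be reused. */
		flush_tlb_kernel_range(addr, addr + PAGE_SIZE);

		/* Mapped pages are freed only on the vmemmap (sparse_vmap) path. */
		if (sparse_vmap)
			free_hotplug_page_range(pte_page(pte), PAGE_SIZE);
	}
}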

Changes in V2: (https://lkml.org/lkml/2019/4/14/5)

- Added all received review and ack tags
- Split the series from ZONE_DEVICE enablement for better review
- Moved memblock re-order patch to the front as per Robin Murphy
- Updated commit message on memblock re-order patch per Michal Hocko
- Dropped [pmd|pud]_large() definitions
- Used existing [pmd|pud]_sect() instead of earlier [pmd|pud]_large()
- Removed __meminit and __ref tags as per Oscar Salvador
- Dropped unnecessary 'ret' init in arch_add_memory() per Robin Murphy
- Skipped calling into pgtable_page_dtor() for linear mapping page table
  pages and updated all relevant functions

Changes in V1: (https://lkml.org/lkml/2019/4/3/28)

Anshuman Khandual (3):
  mm/hotplug: Reorder arch_remove_memory() call in __remove_memory()
  arm64/mm: Hold memory hotplug lock while walking for kernel page table dump
  arm64/mm: Enable memory hot remove

Mark Rutland (1):
  arm64/mm: Inhibit huge-vmap with ptdump

 arch/arm64/Kconfig             |   3 +
 arch/arm64/mm/mmu.c            | 215 ++++++++++++++++++++++++++++++++++++++++-
 arch/arm64/mm/ptdump_debugfs.c |   3 +
 mm/memory_hotplug.c            |   3 +-
 4 files changed, 217 insertions(+), 7 deletions(-)
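
As an illustration of patch 2, the ptdump_debugfs.c change amounts to taking
the memory hotplug lock around the kernel page table walk so it cannot race
with a concurrent hot remove; a hedged sketch along those lines, not the
actual diff:

static int ptdump_show(struct seq_file *m, void *v)
{
	struct ptdump_info *info = m->private;

	get_online_mems();
	ptdump_walk_pgd(m, info);
	put_online_mems();
	return 0;
}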

Comments

David Hildenbrand May 14, 2019, 9:10 a.m. UTC | #1
On 14.05.19 11:00, Anshuman Khandual wrote:
> This series enables memory hot remove on arm64 after fixing a memblock
> removal ordering problem in generic __remove_memory() and kernel page
> table race conditions on arm64. This is based on the following arm64
> working tree.
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
> 
> David had pointed out that the following patch is already in next/master
> (58b11e136dcc14358) and will conflict with the last patch here. Will fix
> the conflict once this series gets reviewed and agreed upon.

I should read the cover letter first, so ignore my comments :)
