Message ID | 1551071039-20192-2-git-send-email-anshuman.khandual@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | arm64/mm: Enable accounting for page table pages | expand |
Hi Anshuman, On Mon, Feb 25, 2019 at 10:33:54AM +0530, Anshuman Khandual wrote: > ARM64 standard pgtable functions are going to use pgtable_page_[ctor|dtor] > constructs. Certain stage-2 page table allocations are multi order which > cannot be allocated through a generic pgtable function as it does not exist > right now. This prevents all pgtable allocations including multi order ones > in stage-2 from moving into new ARM64 (pgtable_page_[ctor|dtor]) pgtable > functions. Hence remove all generic pgtable allocation function dependency > from stage-2 page tables till there is one multi-order allocation function > available. I'm a bit confused by this. Which allocations are multi-order? Why does that prevent other allcoations from using the regular routines? Thanks, Mark. > Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> > --- > arch/arm/include/asm/stage2_pgtable.h | 4 ++-- > arch/arm64/include/asm/stage2_pgtable.h | 4 ++-- > 2 files changed, 4 insertions(+), 4 deletions(-) > > diff --git a/arch/arm/include/asm/stage2_pgtable.h b/arch/arm/include/asm/stage2_pgtable.h > index de2089501b8b..417a3be00718 100644 > --- a/arch/arm/include/asm/stage2_pgtable.h > +++ b/arch/arm/include/asm/stage2_pgtable.h > @@ -32,14 +32,14 @@ > #define stage2_pgd_present(kvm, pgd) pgd_present(pgd) > #define stage2_pgd_populate(kvm, pgd, pud) pgd_populate(NULL, pgd, pud) > #define stage2_pud_offset(kvm, pgd, address) pud_offset(pgd, address) > -#define stage2_pud_free(kvm, pud) pud_free(NULL, pud) > +#define stage2_pud_free(kvm, pud) free_page((unsigned long)pud) > > #define stage2_pud_none(kvm, pud) pud_none(pud) > #define stage2_pud_clear(kvm, pud) pud_clear(pud) > #define stage2_pud_present(kvm, pud) pud_present(pud) > #define stage2_pud_populate(kvm, pud, pmd) pud_populate(NULL, pud, pmd) > #define stage2_pmd_offset(kvm, pud, address) pmd_offset(pud, address) > -#define stage2_pmd_free(kvm, pmd) pmd_free(NULL, pmd) > +#define stage2_pmd_free(kvm, pmd) free_page((unsigned long)pmd) > > #define stage2_pud_huge(kvm, pud) pud_huge(pud) > > diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h > index 5412fa40825e..915809e4ac32 100644 > --- a/arch/arm64/include/asm/stage2_pgtable.h > +++ b/arch/arm64/include/asm/stage2_pgtable.h > @@ -119,7 +119,7 @@ static inline pud_t *stage2_pud_offset(struct kvm *kvm, > static inline void stage2_pud_free(struct kvm *kvm, pud_t *pud) > { > if (kvm_stage2_has_pud(kvm)) > - pud_free(NULL, pud); > + free_page((unsigned long)pud); > } > > static inline bool stage2_pud_table_empty(struct kvm *kvm, pud_t *pudp) > @@ -192,7 +192,7 @@ static inline pmd_t *stage2_pmd_offset(struct kvm *kvm, > static inline void stage2_pmd_free(struct kvm *kvm, pmd_t *pmd) > { > if (kvm_stage2_has_pmd(kvm)) > - pmd_free(NULL, pmd); > + free_page((unsigned long)pmd); > } > > static inline bool stage2_pud_huge(struct kvm *kvm, pud_t pud) > -- > 2.20.1 >
On 02/25/2019 04:30 PM, Mark Rutland wrote: > Hi Anshuman, > > On Mon, Feb 25, 2019 at 10:33:54AM +0530, Anshuman Khandual wrote: >> ARM64 standard pgtable functions are going to use pgtable_page_[ctor|dtor] >> constructs. Certain stage-2 page table allocations are multi order which >> cannot be allocated through a generic pgtable function as it does not exist >> right now. This prevents all pgtable allocations including multi order ones >> in stage-2 from moving into new ARM64 (pgtable_page_[ctor|dtor]) pgtable >> functions. Hence remove all generic pgtable allocation function dependency >> from stage-2 page tables till there is one multi-order allocation function >> available. > > I'm a bit confused by this. Which allocations are multi-order? > Stage-2 PGD. kvm_alloc_stage2_pgd -> alloc_pages_exact kvm_free_stage2_pgd -> free_pages_exact > Why does that prevent other allcoations from using the regular routines? At present both stage-2 PGD (kvm_alloc_stage2_pgd -> alloc_pages_exact), PUD|PMD (mmu_memory_cache_alloc) allocates directly from buddy allocator but then while freeing back stage-2 PGD directly calls buddy allocator via free_pages_exact but PUD|PMD get freed with stage2_[pud|pmd]_free which calls pud|pmd_free instead of calling free_pages() directly. All of these worked fine because pud|pmd_free called free_pages() directly with out going through pgtable_page_dtor(). But now we are changing pud|pmd_free to use pgtable_page_dtor() both for user and host kernel page tables. This will break stage2 page table (bad page state errors) because the new free path which would call pgtable_page_dtor() where as alloc patch never called pgtable_page_ctor(). To fix this situation either we move all stage-2 page table to use pte_alloc_one() and pte_free() which goes through pgtable_page_ctor/dtor cycle or just keep the virtualization page tables out of it (stage2/hyp) and remove the only dependency which breaks because of these changes. This series went with the second option.
On Mon, Feb 25, 2019 at 07:50:27PM +0530, Anshuman Khandual wrote: > > > On 02/25/2019 04:30 PM, Mark Rutland wrote: > > Hi Anshuman, > > > > On Mon, Feb 25, 2019 at 10:33:54AM +0530, Anshuman Khandual wrote: > >> ARM64 standard pgtable functions are going to use pgtable_page_[ctor|dtor] > >> constructs. Certain stage-2 page table allocations are multi order which > >> cannot be allocated through a generic pgtable function as it does not exist > >> right now. This prevents all pgtable allocations including multi order ones > >> in stage-2 from moving into new ARM64 (pgtable_page_[ctor|dtor]) pgtable > >> functions. Hence remove all generic pgtable allocation function dependency > >> from stage-2 page tables till there is one multi-order allocation function > >> available. > > > > I'm a bit confused by this. Which allocations are multi-order? > > > > Stage-2 PGD. > > kvm_alloc_stage2_pgd -> alloc_pages_exact > kvm_free_stage2_pgd -> free_pages_exact > > > Why does that prevent other allcoations from using the regular routines? > > At present both stage-2 PGD (kvm_alloc_stage2_pgd -> alloc_pages_exact), PUD|PMD > (mmu_memory_cache_alloc) allocates directly from buddy allocator but then while > freeing back stage-2 PGD directly calls buddy allocator via free_pages_exact but > PUD|PMD get freed with stage2_[pud|pmd]_free which calls pud|pmd_free instead > of calling free_pages() directly. If we allocate/free the stage-2 PGD with {alloc,free}_pages_exact(), then the PGD level is balanced today. I don't see what that has to do with the other levels of table. > All of these worked fine because pud|pmd_free called free_pages() directly with > out going through pgtable_page_dtor(). But now we are changing pud|pmd_free to > use pgtable_page_dtor() both for user and host kernel page tables. This will > break stage2 page table (bad page state errors) because the new free path which > would call pgtable_page_dtor() where as alloc patch never called pgtable_page_ctor(). I'm lost as to how that relates to the alloc/free of the PGD. AFAICT, that's unrelated to the problem at hand. What subtlety am I missing? > To fix this situation either we move all stage-2 page table to use pte_alloc_one() > and pte_free() which goes through pgtable_page_ctor/dtor cycle or just keep the > virtualization page tables out of it (stage2/hyp) and remove the only dependency > which breaks because of these changes. This series went with the second option. It sounds to me like this is just a mismatch between the alloc/free paths for the PUD and PMD levels of table. IIUC, we allocate those with {pmd,pud}_alloc_one(), and free them with stage2_{pud,pmd}_free(), which call {pud,pmd}_free(). I would naively expect p?d_alloc_one() to pair with p?d_free(), so that being a problem is surprising to me. Thanks, Mark.
On 02/25/2019 09:19 PM, Mark Rutland wrote: > On Mon, Feb 25, 2019 at 07:50:27PM +0530, Anshuman Khandual wrote: >> >> >> On 02/25/2019 04:30 PM, Mark Rutland wrote: >>> Hi Anshuman, >>> >>> On Mon, Feb 25, 2019 at 10:33:54AM +0530, Anshuman Khandual wrote: >>>> ARM64 standard pgtable functions are going to use pgtable_page_[ctor|dtor] >>>> constructs. Certain stage-2 page table allocations are multi order which >>>> cannot be allocated through a generic pgtable function as it does not exist >>>> right now. This prevents all pgtable allocations including multi order ones >>>> in stage-2 from moving into new ARM64 (pgtable_page_[ctor|dtor]) pgtable >>>> functions. Hence remove all generic pgtable allocation function dependency >>>> from stage-2 page tables till there is one multi-order allocation function >>>> available. >>> >>> I'm a bit confused by this. Which allocations are multi-order? >>> >> >> Stage-2 PGD. >> >> kvm_alloc_stage2_pgd -> alloc_pages_exact >> kvm_free_stage2_pgd -> free_pages_exact >> >>> Why does that prevent other allcoations from using the regular routines? >> >> At present both stage-2 PGD (kvm_alloc_stage2_pgd -> alloc_pages_exact), PUD|PMD >> (mmu_memory_cache_alloc) allocates directly from buddy allocator but then while >> freeing back stage-2 PGD directly calls buddy allocator via free_pages_exact but >> PUD|PMD get freed with stage2_[pud|pmd]_free which calls pud|pmd_free instead >> of calling free_pages() directly. > > If we allocate/free the stage-2 PGD with {alloc,free}_pages_exact(), > then the PGD level is balanced today. > > I don't see what that has to do with the other levels of table. The idea is either all page table levels pages go through ctor/dtor constructs or none of them do. Not only this makes sense logically but also prevents getting into bad page state errors when freeing without calling dtor. For other non-KVM use cases like kernel linear or vmmap it helps while tearing down the page table for memory hot-remove cases. > >> All of these worked fine because pud|pmd_free called free_pages() directly with >> out going through pgtable_page_dtor(). But now we are changing pud|pmd_free to >> use pgtable_page_dtor() both for user and host kernel page tables. This will >> break stage2 page table (bad page state errors) because the new free path which >> would call pgtable_page_dtor() where as alloc patch never called pgtable_page_ctor(). > > I'm lost as to how that relates to the alloc/free of the PGD. AFAICT, > that's unrelated to the problem at hand. As mentioned above either all page table use ctor/dtor based allocation/free or none of them do. > > What subtlety am I missing? pmd|pud_free() will call dtor() after the series hence they cannot be used by stage2_pmd|pud_free because stage2_pud|pmd get allocated from pre-allocated mmu_memory_cache_alloc which builds with __get_free_page(PGALLOC_GFP). Is not that right ? > >> To fix this situation either we move all stage-2 page table to use pte_alloc_one() >> and pte_free() which goes through pgtable_page_ctor/dtor cycle or just keep the >> virtualization page tables out of it (stage2/hyp) and remove the only dependency >> which breaks because of these changes. This series went with the second option. > > It sounds to me like this is just a mismatch between the alloc/free > paths for the PUD and PMD levels of table. > > IIUC, we allocate those with {pmd,pud}_alloc_one(), and free them with > stage2_{pud,pmd}_free(), which call {pud,pmd}_free(). AFAICS they get allocated from mmu_memory_cache_alloc instead. pud|pmd_alloc_one() get used only for the _hyp_ mappings not for the stage2 table. kvm_handle_guest_abort user_mem_abort stage2_set_pud_huge -> stage2_get_pud -> mmu_memory_cache_alloc stage2_set_pmd_huge -> stage2_get_pmd -> mmu_memory_cache_alloc stage2_set_pte -> mmu_memory_cache_alloc Am I missing something here. > > I would naively expect p?d_alloc_one() to pair with p?d_free(), so that > being a problem is surprising to me. They work absolutely fine. But thats not the case here. stage2_pud|pmd_free() calls pud|pmd_free() for pages not allocated with pud|pmd_alloc_one().
On 02/26/2019 09:45 AM, Anshuman Khandual wrote: > > > On 02/25/2019 09:19 PM, Mark Rutland wrote: >> On Mon, Feb 25, 2019 at 07:50:27PM +0530, Anshuman Khandual wrote: >>> >>> >>> On 02/25/2019 04:30 PM, Mark Rutland wrote: >>>> Hi Anshuman, >>>> >>>> On Mon, Feb 25, 2019 at 10:33:54AM +0530, Anshuman Khandual wrote: >>>>> ARM64 standard pgtable functions are going to use pgtable_page_[ctor|dtor] >>>>> constructs. Certain stage-2 page table allocations are multi order which >>>>> cannot be allocated through a generic pgtable function as it does not exist >>>>> right now. This prevents all pgtable allocations including multi order ones >>>>> in stage-2 from moving into new ARM64 (pgtable_page_[ctor|dtor]) pgtable >>>>> functions. Hence remove all generic pgtable allocation function dependency >>>>> from stage-2 page tables till there is one multi-order allocation function >>>>> available. >>>> >>>> I'm a bit confused by this. Which allocations are multi-order? >>>> >>> >>> Stage-2 PGD. >>> >>> kvm_alloc_stage2_pgd -> alloc_pages_exact >>> kvm_free_stage2_pgd -> free_pages_exact >>> >>>> Why does that prevent other allcoations from using the regular routines? >>> >>> At present both stage-2 PGD (kvm_alloc_stage2_pgd -> alloc_pages_exact), PUD|PMD >>> (mmu_memory_cache_alloc) allocates directly from buddy allocator but then while >>> freeing back stage-2 PGD directly calls buddy allocator via free_pages_exact but >>> PUD|PMD get freed with stage2_[pud|pmd]_free which calls pud|pmd_free instead >>> of calling free_pages() directly. >> >> If we allocate/free the stage-2 PGD with {alloc,free}_pages_exact(), >> then the PGD level is balanced today. >> >> I don't see what that has to do with the other levels of table. > > The idea is either all page table levels pages go through ctor/dtor constructs > or none of them do. Not only this makes sense logically but also prevents getting > into bad page state errors when freeing without calling dtor. For other non-KVM > use cases like kernel linear or vmmap it helps while tearing down the page table > for memory hot-remove cases. > >> >>> All of these worked fine because pud|pmd_free called free_pages() directly with >>> out going through pgtable_page_dtor(). But now we are changing pud|pmd_free to >>> use pgtable_page_dtor() both for user and host kernel page tables. This will >>> break stage2 page table (bad page state errors) because the new free path which >>> would call pgtable_page_dtor() where as alloc patch never called pgtable_page_ctor(). >> >> I'm lost as to how that relates to the alloc/free of the PGD. AFAICT, >> that's unrelated to the problem at hand. > > As mentioned above either all page table use ctor/dtor based allocation/free or > none of them do. > >> >> What subtlety am I missing? > > pmd|pud_free() will call dtor() after the series hence they cannot be used by > stage2_pmd|pud_free because stage2_pud|pmd get allocated from pre-allocated > mmu_memory_cache_alloc which builds with __get_free_page(PGALLOC_GFP). Is not > that right ? > >> >>> To fix this situation either we move all stage-2 page table to use pte_alloc_one() >>> and pte_free() which goes through pgtable_page_ctor/dtor cycle or just keep the >>> virtualization page tables out of it (stage2/hyp) and remove the only dependency >>> which breaks because of these changes. This series went with the second option. >> >> It sounds to me like this is just a mismatch between the alloc/free >> paths for the PUD and PMD levels of table. >> >> IIUC, we allocate those with {pmd,pud}_alloc_one(), and free them with >> stage2_{pud,pmd}_free(), which call {pud,pmd}_free(). > > AFAICS they get allocated from mmu_memory_cache_alloc instead. pud|pmd_alloc_one() > get used only for the _hyp_ mappings not for the stage2 table. > > kvm_handle_guest_abort > user_mem_abort > stage2_set_pud_huge -> stage2_get_pud -> mmu_memory_cache_alloc > stage2_set_pmd_huge -> stage2_get_pmd -> mmu_memory_cache_alloc > stage2_set_pte -> mmu_memory_cache_alloc > > Am I missing something here. This needs to be added into this patch here. Got skipped because my guest VM test setup always created PMD level mappings. No impact on arm platform as it's pte_free_kernel() always called free_page which remains unchanged. diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 30251e288629..61ea76d786b9 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -191,7 +191,7 @@ static void clear_stage2_pmd_entry(struct kvm *kvm, pmd_t *pmd, phys_addr_t addr VM_BUG_ON(pmd_thp_or_huge(*pmd)); pmd_clear(pmd); kvm_tlb_flush_vmid_ipa(kvm, addr); - pte_free_kernel(NULL, pte_table); + __free_page(virt_to_page(pte_table)); put_page(virt_to_page(pmd)); }
diff --git a/arch/arm/include/asm/stage2_pgtable.h b/arch/arm/include/asm/stage2_pgtable.h index de2089501b8b..417a3be00718 100644 --- a/arch/arm/include/asm/stage2_pgtable.h +++ b/arch/arm/include/asm/stage2_pgtable.h @@ -32,14 +32,14 @@ #define stage2_pgd_present(kvm, pgd) pgd_present(pgd) #define stage2_pgd_populate(kvm, pgd, pud) pgd_populate(NULL, pgd, pud) #define stage2_pud_offset(kvm, pgd, address) pud_offset(pgd, address) -#define stage2_pud_free(kvm, pud) pud_free(NULL, pud) +#define stage2_pud_free(kvm, pud) free_page((unsigned long)pud) #define stage2_pud_none(kvm, pud) pud_none(pud) #define stage2_pud_clear(kvm, pud) pud_clear(pud) #define stage2_pud_present(kvm, pud) pud_present(pud) #define stage2_pud_populate(kvm, pud, pmd) pud_populate(NULL, pud, pmd) #define stage2_pmd_offset(kvm, pud, address) pmd_offset(pud, address) -#define stage2_pmd_free(kvm, pmd) pmd_free(NULL, pmd) +#define stage2_pmd_free(kvm, pmd) free_page((unsigned long)pmd) #define stage2_pud_huge(kvm, pud) pud_huge(pud) diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h index 5412fa40825e..915809e4ac32 100644 --- a/arch/arm64/include/asm/stage2_pgtable.h +++ b/arch/arm64/include/asm/stage2_pgtable.h @@ -119,7 +119,7 @@ static inline pud_t *stage2_pud_offset(struct kvm *kvm, static inline void stage2_pud_free(struct kvm *kvm, pud_t *pud) { if (kvm_stage2_has_pud(kvm)) - pud_free(NULL, pud); + free_page((unsigned long)pud); } static inline bool stage2_pud_table_empty(struct kvm *kvm, pud_t *pudp) @@ -192,7 +192,7 @@ static inline pmd_t *stage2_pmd_offset(struct kvm *kvm, static inline void stage2_pmd_free(struct kvm *kvm, pmd_t *pmd) { if (kvm_stage2_has_pmd(kvm)) - pmd_free(NULL, pmd); + free_page((unsigned long)pmd); } static inline bool stage2_pud_huge(struct kvm *kvm, pud_t pud)
ARM64 standard pgtable functions are going to use pgtable_page_[ctor|dtor] constructs. Certain stage-2 page table allocations are multi order which cannot be allocated through a generic pgtable function as it does not exist right now. This prevents all pgtable allocations including multi order ones in stage-2 from moving into new ARM64 (pgtable_page_[ctor|dtor]) pgtable functions. Hence remove all generic pgtable allocation function dependency from stage-2 page tables till there is one multi-order allocation function available. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> --- arch/arm/include/asm/stage2_pgtable.h | 4 ++-- arch/arm64/include/asm/stage2_pgtable.h | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)