[V2,1/6] KVM: ARM: Remove pgtable standard functions from stage-2 page tables
diff mbox series

Message ID 1551071039-20192-2-git-send-email-anshuman.khandual@arm.com
State New
Headers show
Series
  • arm64/mm: Enable accounting for page table pages
Related show

Commit Message

Anshuman Khandual Feb. 25, 2019, 5:03 a.m. UTC
ARM64 standard pgtable functions are going to use pgtable_page_[ctor|dtor]
constructs. Certain stage-2 page table allocations are multi order which
cannot be allocated through a generic pgtable function as it does not exist
right now. This prevents all pgtable allocations including multi order ones
in stage-2 from moving into new ARM64 (pgtable_page_[ctor|dtor]) pgtable
functions. Hence remove all generic pgtable allocation function dependency
from stage-2 page tables till there is one multi-order allocation function
available.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm/include/asm/stage2_pgtable.h   | 4 ++--
 arch/arm64/include/asm/stage2_pgtable.h | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

Comments

Mark Rutland Feb. 25, 2019, 11 a.m. UTC | #1
Hi Anshuman,

On Mon, Feb 25, 2019 at 10:33:54AM +0530, Anshuman Khandual wrote:
> ARM64 standard pgtable functions are going to use pgtable_page_[ctor|dtor]
> constructs. Certain stage-2 page table allocations are multi order which
> cannot be allocated through a generic pgtable function as it does not exist
> right now. This prevents all pgtable allocations including multi order ones
> in stage-2 from moving into new ARM64 (pgtable_page_[ctor|dtor]) pgtable
> functions. Hence remove all generic pgtable allocation function dependency
> from stage-2 page tables till there is one multi-order allocation function
> available.

I'm a bit confused by this. Which allocations are multi-order?

Why does that prevent other allcoations from using the regular routines?

Thanks,
Mark.

> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>  arch/arm/include/asm/stage2_pgtable.h   | 4 ++--
>  arch/arm64/include/asm/stage2_pgtable.h | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm/include/asm/stage2_pgtable.h b/arch/arm/include/asm/stage2_pgtable.h
> index de2089501b8b..417a3be00718 100644
> --- a/arch/arm/include/asm/stage2_pgtable.h
> +++ b/arch/arm/include/asm/stage2_pgtable.h
> @@ -32,14 +32,14 @@
>  #define stage2_pgd_present(kvm, pgd)		pgd_present(pgd)
>  #define stage2_pgd_populate(kvm, pgd, pud)	pgd_populate(NULL, pgd, pud)
>  #define stage2_pud_offset(kvm, pgd, address)	pud_offset(pgd, address)
> -#define stage2_pud_free(kvm, pud)		pud_free(NULL, pud)
> +#define stage2_pud_free(kvm, pud)		free_page((unsigned long)pud)
>  
>  #define stage2_pud_none(kvm, pud)		pud_none(pud)
>  #define stage2_pud_clear(kvm, pud)		pud_clear(pud)
>  #define stage2_pud_present(kvm, pud)		pud_present(pud)
>  #define stage2_pud_populate(kvm, pud, pmd)	pud_populate(NULL, pud, pmd)
>  #define stage2_pmd_offset(kvm, pud, address)	pmd_offset(pud, address)
> -#define stage2_pmd_free(kvm, pmd)		pmd_free(NULL, pmd)
> +#define stage2_pmd_free(kvm, pmd)		free_page((unsigned long)pmd)
>  
>  #define stage2_pud_huge(kvm, pud)		pud_huge(pud)
>  
> diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h
> index 5412fa40825e..915809e4ac32 100644
> --- a/arch/arm64/include/asm/stage2_pgtable.h
> +++ b/arch/arm64/include/asm/stage2_pgtable.h
> @@ -119,7 +119,7 @@ static inline pud_t *stage2_pud_offset(struct kvm *kvm,
>  static inline void stage2_pud_free(struct kvm *kvm, pud_t *pud)
>  {
>  	if (kvm_stage2_has_pud(kvm))
> -		pud_free(NULL, pud);
> +		free_page((unsigned long)pud);
>  }
>  
>  static inline bool stage2_pud_table_empty(struct kvm *kvm, pud_t *pudp)
> @@ -192,7 +192,7 @@ static inline pmd_t *stage2_pmd_offset(struct kvm *kvm,
>  static inline void stage2_pmd_free(struct kvm *kvm, pmd_t *pmd)
>  {
>  	if (kvm_stage2_has_pmd(kvm))
> -		pmd_free(NULL, pmd);
> +		free_page((unsigned long)pmd);
>  }
>  
>  static inline bool stage2_pud_huge(struct kvm *kvm, pud_t pud)
> -- 
> 2.20.1
>
Anshuman Khandual Feb. 25, 2019, 2:20 p.m. UTC | #2
On 02/25/2019 04:30 PM, Mark Rutland wrote:
> Hi Anshuman,
> 
> On Mon, Feb 25, 2019 at 10:33:54AM +0530, Anshuman Khandual wrote:
>> ARM64 standard pgtable functions are going to use pgtable_page_[ctor|dtor]
>> constructs. Certain stage-2 page table allocations are multi order which
>> cannot be allocated through a generic pgtable function as it does not exist
>> right now. This prevents all pgtable allocations including multi order ones
>> in stage-2 from moving into new ARM64 (pgtable_page_[ctor|dtor]) pgtable
>> functions. Hence remove all generic pgtable allocation function dependency
>> from stage-2 page tables till there is one multi-order allocation function
>> available.
> 
> I'm a bit confused by this. Which allocations are multi-order?
> 

Stage-2 PGD.

kvm_alloc_stage2_pgd -> alloc_pages_exact
kvm_free_stage2_pgd -> free_pages_exact
 
> Why does that prevent other allcoations from using the regular routines?

At present both stage-2 PGD (kvm_alloc_stage2_pgd -> alloc_pages_exact), PUD|PMD
(mmu_memory_cache_alloc) allocates directly from buddy allocator but then while
freeing back stage-2 PGD directly calls buddy allocator via free_pages_exact but
PUD|PMD get freed with stage2_[pud|pmd]_free which calls pud|pmd_free instead
of calling free_pages() directly.

All of these worked fine because pud|pmd_free called free_pages() directly with
out going through pgtable_page_dtor(). But now we are changing pud|pmd_free to
use pgtable_page_dtor() both for user and host kernel page tables. This will
break stage2 page table (bad page state errors) because the new free path which
would call pgtable_page_dtor() where as alloc patch never called pgtable_page_ctor().

To fix this situation either we move all stage-2 page table to use pte_alloc_one()
and pte_free() which goes through pgtable_page_ctor/dtor cycle or just keep the
virtualization page tables out of it (stage2/hyp) and remove the only dependency
which breaks because of these changes. This series went with the second option.
Mark Rutland Feb. 25, 2019, 3:49 p.m. UTC | #3
On Mon, Feb 25, 2019 at 07:50:27PM +0530, Anshuman Khandual wrote:
> 
> 
> On 02/25/2019 04:30 PM, Mark Rutland wrote:
> > Hi Anshuman,
> > 
> > On Mon, Feb 25, 2019 at 10:33:54AM +0530, Anshuman Khandual wrote:
> >> ARM64 standard pgtable functions are going to use pgtable_page_[ctor|dtor]
> >> constructs. Certain stage-2 page table allocations are multi order which
> >> cannot be allocated through a generic pgtable function as it does not exist
> >> right now. This prevents all pgtable allocations including multi order ones
> >> in stage-2 from moving into new ARM64 (pgtable_page_[ctor|dtor]) pgtable
> >> functions. Hence remove all generic pgtable allocation function dependency
> >> from stage-2 page tables till there is one multi-order allocation function
> >> available.
> > 
> > I'm a bit confused by this. Which allocations are multi-order?
> > 
> 
> Stage-2 PGD.
> 
> kvm_alloc_stage2_pgd -> alloc_pages_exact
> kvm_free_stage2_pgd -> free_pages_exact
>  
> > Why does that prevent other allcoations from using the regular routines?
> 
> At present both stage-2 PGD (kvm_alloc_stage2_pgd -> alloc_pages_exact), PUD|PMD
> (mmu_memory_cache_alloc) allocates directly from buddy allocator but then while
> freeing back stage-2 PGD directly calls buddy allocator via free_pages_exact but
> PUD|PMD get freed with stage2_[pud|pmd]_free which calls pud|pmd_free instead
> of calling free_pages() directly.

If we allocate/free the stage-2 PGD with {alloc,free}_pages_exact(),
then the PGD level is balanced today.

I don't see what that has to do with the other levels of table.

> All of these worked fine because pud|pmd_free called free_pages() directly with
> out going through pgtable_page_dtor(). But now we are changing pud|pmd_free to
> use pgtable_page_dtor() both for user and host kernel page tables. This will
> break stage2 page table (bad page state errors) because the new free path which
> would call pgtable_page_dtor() where as alloc patch never called pgtable_page_ctor().

I'm lost as to how that relates to the alloc/free of the PGD. AFAICT,
that's unrelated to the problem at hand.

What subtlety am I missing?

> To fix this situation either we move all stage-2 page table to use pte_alloc_one()
> and pte_free() which goes through pgtable_page_ctor/dtor cycle or just keep the
> virtualization page tables out of it (stage2/hyp) and remove the only dependency
> which breaks because of these changes. This series went with the second option. 

It sounds to me like this is just a mismatch between the alloc/free
paths for the PUD and PMD levels of table.

IIUC, we allocate those with {pmd,pud}_alloc_one(), and free them with
stage2_{pud,pmd}_free(), which call {pud,pmd}_free().

I would naively expect p?d_alloc_one() to pair with p?d_free(), so that
being a problem is surprising to me.

Thanks,
Mark.
Anshuman Khandual Feb. 26, 2019, 4:15 a.m. UTC | #4
On 02/25/2019 09:19 PM, Mark Rutland wrote:
> On Mon, Feb 25, 2019 at 07:50:27PM +0530, Anshuman Khandual wrote:
>>
>>
>> On 02/25/2019 04:30 PM, Mark Rutland wrote:
>>> Hi Anshuman,
>>>
>>> On Mon, Feb 25, 2019 at 10:33:54AM +0530, Anshuman Khandual wrote:
>>>> ARM64 standard pgtable functions are going to use pgtable_page_[ctor|dtor]
>>>> constructs. Certain stage-2 page table allocations are multi order which
>>>> cannot be allocated through a generic pgtable function as it does not exist
>>>> right now. This prevents all pgtable allocations including multi order ones
>>>> in stage-2 from moving into new ARM64 (pgtable_page_[ctor|dtor]) pgtable
>>>> functions. Hence remove all generic pgtable allocation function dependency
>>>> from stage-2 page tables till there is one multi-order allocation function
>>>> available.
>>>
>>> I'm a bit confused by this. Which allocations are multi-order?
>>>
>>
>> Stage-2 PGD.
>>
>> kvm_alloc_stage2_pgd -> alloc_pages_exact
>> kvm_free_stage2_pgd -> free_pages_exact
>>  
>>> Why does that prevent other allcoations from using the regular routines?
>>
>> At present both stage-2 PGD (kvm_alloc_stage2_pgd -> alloc_pages_exact), PUD|PMD
>> (mmu_memory_cache_alloc) allocates directly from buddy allocator but then while
>> freeing back stage-2 PGD directly calls buddy allocator via free_pages_exact but
>> PUD|PMD get freed with stage2_[pud|pmd]_free which calls pud|pmd_free instead
>> of calling free_pages() directly.
> 
> If we allocate/free the stage-2 PGD with {alloc,free}_pages_exact(),
> then the PGD level is balanced today.
> 
> I don't see what that has to do with the other levels of table.

The idea is either all page table levels pages go through ctor/dtor constructs
or none of them do. Not only this makes sense logically but also prevents getting
into bad page state errors when freeing without calling dtor. For other non-KVM
use cases like kernel linear or vmmap it helps while tearing down the page table
for memory hot-remove cases.

> 
>> All of these worked fine because pud|pmd_free called free_pages() directly with
>> out going through pgtable_page_dtor(). But now we are changing pud|pmd_free to
>> use pgtable_page_dtor() both for user and host kernel page tables. This will
>> break stage2 page table (bad page state errors) because the new free path which
>> would call pgtable_page_dtor() where as alloc patch never called pgtable_page_ctor().
> 
> I'm lost as to how that relates to the alloc/free of the PGD. AFAICT,
> that's unrelated to the problem at hand.

As mentioned above either all page table use ctor/dtor based allocation/free or
none of them do.

> 
> What subtlety am I missing?

pmd|pud_free() will call dtor() after the series hence they cannot be used by
stage2_pmd|pud_free because stage2_pud|pmd get allocated from pre-allocated
mmu_memory_cache_alloc which builds with __get_free_page(PGALLOC_GFP). Is not
that right ? 

> 
>> To fix this situation either we move all stage-2 page table to use pte_alloc_one()
>> and pte_free() which goes through pgtable_page_ctor/dtor cycle or just keep the
>> virtualization page tables out of it (stage2/hyp) and remove the only dependency
>> which breaks because of these changes. This series went with the second option. 
> 
> It sounds to me like this is just a mismatch between the alloc/free
> paths for the PUD and PMD levels of table.
> 
> IIUC, we allocate those with {pmd,pud}_alloc_one(), and free them with
> stage2_{pud,pmd}_free(), which call {pud,pmd}_free().

AFAICS they get allocated from mmu_memory_cache_alloc instead. pud|pmd_alloc_one()
get used only for the _hyp_ mappings not for the stage2 table.

kvm_handle_guest_abort
	user_mem_abort
		stage2_set_pud_huge -> stage2_get_pud -> mmu_memory_cache_alloc
		stage2_set_pmd_huge -> stage2_get_pmd -> mmu_memory_cache_alloc
		stage2_set_pte -> mmu_memory_cache_alloc

Am I missing something here.

> 
> I would naively expect p?d_alloc_one() to pair with p?d_free(), so that
> being a problem is surprising to me.

They work absolutely fine. But thats not the case here. stage2_pud|pmd_free()
calls pud|pmd_free() for pages not allocated with pud|pmd_alloc_one().
Anshuman Khandual Feb. 26, 2019, 8:51 a.m. UTC | #5
On 02/26/2019 09:45 AM, Anshuman Khandual wrote:
> 
> 
> On 02/25/2019 09:19 PM, Mark Rutland wrote:
>> On Mon, Feb 25, 2019 at 07:50:27PM +0530, Anshuman Khandual wrote:
>>>
>>>
>>> On 02/25/2019 04:30 PM, Mark Rutland wrote:
>>>> Hi Anshuman,
>>>>
>>>> On Mon, Feb 25, 2019 at 10:33:54AM +0530, Anshuman Khandual wrote:
>>>>> ARM64 standard pgtable functions are going to use pgtable_page_[ctor|dtor]
>>>>> constructs. Certain stage-2 page table allocations are multi order which
>>>>> cannot be allocated through a generic pgtable function as it does not exist
>>>>> right now. This prevents all pgtable allocations including multi order ones
>>>>> in stage-2 from moving into new ARM64 (pgtable_page_[ctor|dtor]) pgtable
>>>>> functions. Hence remove all generic pgtable allocation function dependency
>>>>> from stage-2 page tables till there is one multi-order allocation function
>>>>> available.
>>>>
>>>> I'm a bit confused by this. Which allocations are multi-order?
>>>>
>>>
>>> Stage-2 PGD.
>>>
>>> kvm_alloc_stage2_pgd -> alloc_pages_exact
>>> kvm_free_stage2_pgd -> free_pages_exact
>>>  
>>>> Why does that prevent other allcoations from using the regular routines?
>>>
>>> At present both stage-2 PGD (kvm_alloc_stage2_pgd -> alloc_pages_exact), PUD|PMD
>>> (mmu_memory_cache_alloc) allocates directly from buddy allocator but then while
>>> freeing back stage-2 PGD directly calls buddy allocator via free_pages_exact but
>>> PUD|PMD get freed with stage2_[pud|pmd]_free which calls pud|pmd_free instead
>>> of calling free_pages() directly.
>>
>> If we allocate/free the stage-2 PGD with {alloc,free}_pages_exact(),
>> then the PGD level is balanced today.
>>
>> I don't see what that has to do with the other levels of table.
> 
> The idea is either all page table levels pages go through ctor/dtor constructs
> or none of them do. Not only this makes sense logically but also prevents getting
> into bad page state errors when freeing without calling dtor. For other non-KVM
> use cases like kernel linear or vmmap it helps while tearing down the page table
> for memory hot-remove cases.
> 
>>
>>> All of these worked fine because pud|pmd_free called free_pages() directly with
>>> out going through pgtable_page_dtor(). But now we are changing pud|pmd_free to
>>> use pgtable_page_dtor() both for user and host kernel page tables. This will
>>> break stage2 page table (bad page state errors) because the new free path which
>>> would call pgtable_page_dtor() where as alloc patch never called pgtable_page_ctor().
>>
>> I'm lost as to how that relates to the alloc/free of the PGD. AFAICT,
>> that's unrelated to the problem at hand.
> 
> As mentioned above either all page table use ctor/dtor based allocation/free or
> none of them do.
> 
>>
>> What subtlety am I missing?
> 
> pmd|pud_free() will call dtor() after the series hence they cannot be used by
> stage2_pmd|pud_free because stage2_pud|pmd get allocated from pre-allocated
> mmu_memory_cache_alloc which builds with __get_free_page(PGALLOC_GFP). Is not
> that right ? 
> 
>>
>>> To fix this situation either we move all stage-2 page table to use pte_alloc_one()
>>> and pte_free() which goes through pgtable_page_ctor/dtor cycle or just keep the
>>> virtualization page tables out of it (stage2/hyp) and remove the only dependency
>>> which breaks because of these changes. This series went with the second option. 
>>
>> It sounds to me like this is just a mismatch between the alloc/free
>> paths for the PUD and PMD levels of table.
>>
>> IIUC, we allocate those with {pmd,pud}_alloc_one(), and free them with
>> stage2_{pud,pmd}_free(), which call {pud,pmd}_free().
> 
> AFAICS they get allocated from mmu_memory_cache_alloc instead. pud|pmd_alloc_one()
> get used only for the _hyp_ mappings not for the stage2 table.
> 
> kvm_handle_guest_abort
> 	user_mem_abort
> 		stage2_set_pud_huge -> stage2_get_pud -> mmu_memory_cache_alloc
> 		stage2_set_pmd_huge -> stage2_get_pmd -> mmu_memory_cache_alloc
> 		stage2_set_pte -> mmu_memory_cache_alloc
> 
> Am I missing something here.

This needs to be added into this patch here. Got skipped because my guest VM test setup
always created PMD level mappings. No impact on arm platform as it's pte_free_kernel()
always called free_page which remains unchanged.

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 30251e288629..61ea76d786b9 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -191,7 +191,7 @@ static void clear_stage2_pmd_entry(struct kvm *kvm, pmd_t *pmd, phys_addr_t addr
        VM_BUG_ON(pmd_thp_or_huge(*pmd));
        pmd_clear(pmd);
        kvm_tlb_flush_vmid_ipa(kvm, addr);
-       pte_free_kernel(NULL, pte_table);
+       __free_page(virt_to_page(pte_table));
        put_page(virt_to_page(pmd));
 }

Patch
diff mbox series

diff --git a/arch/arm/include/asm/stage2_pgtable.h b/arch/arm/include/asm/stage2_pgtable.h
index de2089501b8b..417a3be00718 100644
--- a/arch/arm/include/asm/stage2_pgtable.h
+++ b/arch/arm/include/asm/stage2_pgtable.h
@@ -32,14 +32,14 @@ 
 #define stage2_pgd_present(kvm, pgd)		pgd_present(pgd)
 #define stage2_pgd_populate(kvm, pgd, pud)	pgd_populate(NULL, pgd, pud)
 #define stage2_pud_offset(kvm, pgd, address)	pud_offset(pgd, address)
-#define stage2_pud_free(kvm, pud)		pud_free(NULL, pud)
+#define stage2_pud_free(kvm, pud)		free_page((unsigned long)pud)
 
 #define stage2_pud_none(kvm, pud)		pud_none(pud)
 #define stage2_pud_clear(kvm, pud)		pud_clear(pud)
 #define stage2_pud_present(kvm, pud)		pud_present(pud)
 #define stage2_pud_populate(kvm, pud, pmd)	pud_populate(NULL, pud, pmd)
 #define stage2_pmd_offset(kvm, pud, address)	pmd_offset(pud, address)
-#define stage2_pmd_free(kvm, pmd)		pmd_free(NULL, pmd)
+#define stage2_pmd_free(kvm, pmd)		free_page((unsigned long)pmd)
 
 #define stage2_pud_huge(kvm, pud)		pud_huge(pud)
 
diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h
index 5412fa40825e..915809e4ac32 100644
--- a/arch/arm64/include/asm/stage2_pgtable.h
+++ b/arch/arm64/include/asm/stage2_pgtable.h
@@ -119,7 +119,7 @@  static inline pud_t *stage2_pud_offset(struct kvm *kvm,
 static inline void stage2_pud_free(struct kvm *kvm, pud_t *pud)
 {
 	if (kvm_stage2_has_pud(kvm))
-		pud_free(NULL, pud);
+		free_page((unsigned long)pud);
 }
 
 static inline bool stage2_pud_table_empty(struct kvm *kvm, pud_t *pudp)
@@ -192,7 +192,7 @@  static inline pmd_t *stage2_pmd_offset(struct kvm *kvm,
 static inline void stage2_pmd_free(struct kvm *kvm, pmd_t *pmd)
 {
 	if (kvm_stage2_has_pmd(kvm))
-		pmd_free(NULL, pmd);
+		free_page((unsigned long)pmd);
 }
 
 static inline bool stage2_pud_huge(struct kvm *kvm, pud_t pud)