[v2] KVM: arm64: Try PMD block mappings if PUD mappings are not supported

Message ID 20200910133351.118191-1-alexandru.elisei@arm.com (mailing list archive)
State New, archived
Series [v2] KVM: arm64: Try PMD block mappings if PUD mappings are not supported

Commit Message

Alexandru Elisei Sept. 10, 2020, 1:33 p.m. UTC
When userspace uses hugetlbfs for the VM memory, user_mem_abort() tries to
use the same block size to map the faulting IPA in stage 2. If stage 2
cannot use the same block mapping because the block size doesn't fit in the
memslot or the memslot is not properly aligned, user_mem_abort() will fall
back to a page mapping, regardless of the block size. We can do better for
PUD backed hugetlbfs by checking if a PMD block mapping is supported before
deciding to use a page.

vma_pagesize is an unsigned long, use 1UL instead of 1ULL when assigning
its value.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
Tested on a rockpro64 with 4K pages and hugetlbfs hugepagesz=1G (PUD sized
block mappings).  First test, guest RAM starts at 0x8100 0000
(memslot->base_gfn not aligned to 1GB); second test, guest RAM starts at
0x8000 0000, but is only 512 MB.  In both cases using PUD mappings is not
possible because either the memslot base address is not aligned, or the
mapping would extend beyond the memslot.

Without the changes, user_mem_abort() uses 4K pages to map the guest IPA.
With the patches, user_mem_abort() uses PMD block mappings (2MB) to map the
guest RAM, which means less TLB pressure and fewer stage 2 aborts.
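
As a side note on the two test configurations above, the alignment/size
arithmetic can be checked with the rough userspace sketch below. block_fits()
is illustrative only and is not the kernel's
fault_supports_stage2_huge_mapping() (which also has to account for the
userspace/IPA offset within the memslot).

/*
 * Rough sketch of the alignment reasoning above; illustrative only, not the
 * kernel helper.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SZ_2M	(1UL << 21)
#define SZ_512M	(1UL << 29)
#define SZ_1G	(1UL << 30)

/* A block fits if the base is aligned to it and the slot can hold it. */
static bool block_fits(uint64_t base, uint64_t slot_size, uint64_t map_size)
{
	return !(base & (map_size - 1)) && slot_size >= map_size;
}

int main(void)
{
	/* Test 1: 1GB slot at 0x81000000 - 2MB-aligned, not 1GB-aligned. */
	printf("0x81000000, 1GB:   PUD=%d PMD=%d\n",
	       block_fits(0x81000000, SZ_1G, SZ_1G),
	       block_fits(0x81000000, SZ_1G, SZ_2M));

	/* Test 2: 512MB slot at 0x80000000 - aligned, but too small for 1GB. */
	printf("0x80000000, 512MB: PUD=%d PMD=%d\n",
	       block_fits(0x80000000, SZ_512M, SZ_1G),
	       block_fits(0x80000000, SZ_512M, SZ_2M));

	return 0;
}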

Changes since v1 [1]:
- Rebased on top of Will's stage 2 page table handling rewrite, version 4
  of the series [2]. His series is missing the patch "KVM: arm64: Update
  page shift if stage 2 block mapping not supported" and there might be a
  conflict (it's straightforward to fix).

[1] https://www.spinics.net/lists/arm-kernel/msg834015.html
[2] https://www.spinics.net/lists/arm-kernel/msg835806.html

 arch/arm64/kvm/mmu.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

Comments

Punit Agrawal Sept. 11, 2020, 8:34 a.m. UTC | #1
Hi Alexandru,

Alexandru Elisei <alexandru.elisei@arm.com> writes:

> When userspace uses hugetlbfs for the VM memory, user_mem_abort() tries to
> use the same block size to map the faulting IPA in stage 2. If stage 2
> cannot use the same block mapping because the block size doesn't fit in the
> memslot or the memslot is not properly aligned, user_mem_abort() will fall
> back to a page mapping, regardless of the block size. We can do better for
> PUD backed hugetlbfs by checking if a PMD block mapping is supported before
> deciding to use a page.

I think this was discussed in the past.

I have a vague recollection of there being a problem if the user and
stage 2 mappings go out of sync - can't recall the exact details.

Putting it out there in case anybody else on the thread can recall the
details of the previous discussion (offlist).

Though things may have changed and if it passes testing - then maybe I
am mis-remembering. I'll take a closer look at the patch and shout out
if I notice anything.

Thanks,
Punit

>
> vma_pagesize is an unsigned long, use 1UL instead of 1ULL when assigning
> its value.
>
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
> Tested on a rockpro64 with 4K pages and hugetlbfs hugepagesz=1G (PUD sized
> block mappings).  First test, guest RAM starts at 0x8100 0000
> (memslot->base_gfn not aligned to 1GB); second test, guest RAM starts at
> 0x8000 0000, but is only 512 MB.  In both cases using PUD mappings is not
> possible because either the memslot base address is not aligned, or the
> mapping would extend beyond the memslot.
>
> Without the changes, user_mem_abort() uses 4K pages to map the guest IPA.
> With the patches, user_mem_abort() uses PMD block mappings (2MB) to map the
> guest RAM, which means less TLB pressure and fewer stage 2 aborts.
>
> Changes since v1 [1]:
> - Rebased on top of Will's stage 2 page table handling rewrite, version 4
>   of the series [2]. His series is missing the patch "KVM: arm64: Update
>   page shift if stage 2 block mapping not supported" and there might be a
>   conflict (it's straightforward to fix).
>
> [1] https://www.spinics.net/lists/arm-kernel/msg834015.html
> [2] https://www.spinics.net/lists/arm-kernel/msg835806.html
>
>  arch/arm64/kvm/mmu.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 1041be1fafe4..39c539d4d4cb 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -776,16 +776,25 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	else
>  		vma_shift = PAGE_SHIFT;
>  
> -	vma_pagesize = 1ULL << vma_shift;
>  	if (logging_active ||
> -	    (vma->vm_flags & VM_PFNMAP) ||
> -	    !fault_supports_stage2_huge_mapping(memslot, hva, vma_pagesize)) {
> +	    (vma->vm_flags & VM_PFNMAP)) {
>  		force_pte = true;
> -		vma_pagesize = PAGE_SIZE;
> +		vma_shift = PAGE_SHIFT;
> +	}
> +
> +	if (vma_shift == PUD_SHIFT &&
> +	    !fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
> +	       vma_shift = PMD_SHIFT;
> +
> +	if (vma_shift == PMD_SHIFT &&
> +	    !fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) {
> +		force_pte = true;
> +		vma_shift = PAGE_SHIFT;
>  	}
>  
> +	vma_pagesize = 1UL << vma_shift;
>  	if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
> -		fault_ipa &= huge_page_mask(hstate_vma(vma));
> +		fault_ipa &= ~(vma_pagesize - 1);
>  
>  	gfn = fault_ipa >> PAGE_SHIFT;
>  	mmap_read_unlock(current->mm);
Alexandru Elisei Sept. 11, 2020, 4:26 p.m. UTC | #2
Hi Punit,

Thank you for having a look!

On 9/11/20 9:34 AM, Punit Agrawal wrote:
> Hi Alexandru,
>
> Alexandru Elisei <alexandru.elisei@arm.com> writes:
>
>> When userspace uses hugetlbfs for the VM memory, user_mem_abort() tries to
>> use the same block size to map the faulting IPA in stage 2. If stage 2
>> cannot use the same block mapping because the block size doesn't fit in the
>> memslot or the memslot is not properly aligned, user_mem_abort() will fall
>> back to a page mapping, regardless of the block size. We can do better for
>> PUD backed hugetlbfs by checking if a PMD block mapping is supported before
>> deciding to use a page.
> I think this was discussed in the past.
>
> I have a vague recollection of there being a problem if the user and
> stage 2 mappings go out of sync - can't recall the exact details.

I'm not sure what you mean by the two tables going out of sync. I'm looking at
Documentation/vm/unevictable-lru.rst and this is what it says regarding hugetlbfs:

"VMAs mapping hugetlbfs page are already effectively pinned into memory.  We
neither need nor want to mlock() these pages.  However, to preserve the prior
behavior of mlock() - before the unevictable/mlock changes - mlock_fixup() will
call make_pages_present() in the hugetlbfs VMA range to allocate the huge pages
and populate the ptes."

Please correct me if I'm wrong, but my interpretation is that once a hugetlbfs
page has been mapped in a process' address space, the only way to unmap it is via
munmap. If that's the case, the KVM mmu notifier should take care of unmapping
from stage 2 the entire memory range addressed by the hugetlbfs pages, right?

>
> Putting it out there in case anybody else on the thread can recall the
> details of the previous discussion (offlist).
>
> Though things may have changed and if it passes testing - then maybe I
> am mis-remembering. I'll take a closer look at the patch and shout out
> if I notice anything.

The test I ran was to boot a VM and run ltp (with printk's sprinkled in the host
kernel to see what page size and where it gets mapped/unmapped at stage 2). Do you
mind recommending other tests that I might run?

Thanks,
Alex
Punit Agrawal Sept. 15, 2020, 8:23 a.m. UTC | #3
Hi Alex,

Alexandru Elisei <alexandru.elisei@arm.com> writes:

> Hi Punit,
>
> Thank you for having a look!
>
> On 9/11/20 9:34 AM, Punit Agrawal wrote:
>> Hi Alexandru,
>>
>> Alexandru Elisei <alexandru.elisei@arm.com> writes:
>>
>>> When userspace uses hugetlbfs for the VM memory, user_mem_abort() tries to
>>> use the same block size to map the faulting IPA in stage 2. If stage 2
>>> cannot use the same block mapping because the block size doesn't fit in the
>>> memslot or the memslot is not properly aligned, user_mem_abort() will fall
>>> back to a page mapping, regardless of the block size. We can do better for
>>> PUD backed hugetlbfs by checking if a PMD block mapping is supported before
>>> deciding to use a page.
>> I think this was discussed in the past.
>>
>> I have a vague recollection of there being a problem if the user and
>> stage 2 mappings go out of sync - can't recall the exact details.
>
> I'm not sure what you mean by the two tables going out of sync. I'm looking at
> Documentation/vm/unevictable-lru.rst and this is what it says regarding hugetlbfs:
>
> "VMAs mapping hugetlbfs page are already effectively pinned into memory.  We
> neither need nor want to mlock() these pages.  However, to preserve the prior
> behavior of mlock() - before the unevictable/mlock changes - mlock_fixup() will
> call make_pages_present() in the hugetlbfs VMA range to allocate the huge pages
> and populate the ptes."
>
> Please correct me if I'm wrong, but my interpretation is that once a hugetlbfs
> page has been mapped in a process' address space, the only way to unmap it is via
> munmap. If that's the case, the KVM mmu notifier should take care of unmapping
> from stage 2 the entire memory range addressed by the hugetlbfs pages,
> right?

You're right - I managed to confuse myself. Thinking about it with a bit
more context, I don't see a problem with what the patch is doing.

Apologies for the noise.

>>
>> Putting it out there in case anybody else on the thread can recall the
>> details of the previous discussion (offlist).
>>
>> Though things may have changed and if it passes testing - then maybe I
>> am mis-remembering. I'll take a closer look at the patch and shout out
>> if I notice anything.
>
> The test I ran was to boot a VM and run ltp (with printk's sprinkled in the host
> kernel to see what page size and where it gets mapped/unmapped at stage 2). Do you
> mind recommending other tests that I might run?

You may want to put the changes through VM save / restore and / or live
migration. It should help catch any issues with transitioning from
hugepages to regular pages.

Hope that helps.

Thanks,
Punit

[...]
Marc Zyngier Sept. 18, 2020, 3:56 p.m. UTC | #4
On Thu, 10 Sep 2020 14:33:51 +0100, Alexandru Elisei wrote:
> When userspace uses hugetlbfs for the VM memory, user_mem_abort() tries to
> use the same block size to map the faulting IPA in stage 2. If stage 2
> cannot use the same block mapping because the block size doesn't fit in the
> memslot or the memslot is not properly aligned, user_mem_abort() will fall
> back to a page mapping, regardless of the block size. We can do better for
> PUD backed hugetlbfs by checking if a PMD block mapping is supported before
> deciding to use a page.
> 
> [...]

Applied to next, thanks!

[1/1] KVM: arm64: Try PMD block mappings if PUD mappings are not supported
      commit: 523b3999e5f620cb5ccce6a7ca2780a4cab579a2

Cheers,

	M.

Patch

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 1041be1fafe4..39c539d4d4cb 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -776,16 +776,25 @@  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	else
 		vma_shift = PAGE_SHIFT;
 
-	vma_pagesize = 1ULL << vma_shift;
 	if (logging_active ||
-	    (vma->vm_flags & VM_PFNMAP) ||
-	    !fault_supports_stage2_huge_mapping(memslot, hva, vma_pagesize)) {
+	    (vma->vm_flags & VM_PFNMAP)) {
 		force_pte = true;
-		vma_pagesize = PAGE_SIZE;
+		vma_shift = PAGE_SHIFT;
+	}
+
+	if (vma_shift == PUD_SHIFT &&
+	    !fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
+	       vma_shift = PMD_SHIFT;
+
+	if (vma_shift == PMD_SHIFT &&
+	    !fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) {
+		force_pte = true;
+		vma_shift = PAGE_SHIFT;
 	}
 
+	vma_pagesize = 1UL << vma_shift;
 	if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
-		fault_ipa &= huge_page_mask(hstate_vma(vma));
+		fault_ipa &= ~(vma_pagesize - 1);
 
 	gfn = fault_ipa >> PAGE_SHIFT;
 	mmap_read_unlock(current->mm);
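
For quick reference, the downgrade order introduced above can be seen in
isolation in the following simplified sketch. The stage-2 support check is
stubbed out and the shift values assume 4K pages, so this is an illustration
of the cascade rather than the code in user_mem_abort().

/*
 * Standalone sketch of the PUD -> PMD -> page fallback cascade; the stage-2
 * support check is a stub, not fault_supports_stage2_huge_mapping().
 */
#include <stdbool.h>
#include <stdio.h>

#define PAGE_SHIFT	12
#define PMD_SHIFT	21
#define PUD_SHIFT	30

/* Stand-in check: pretend only PMD-sized (and smaller) blocks fit. */
static bool stage2_supports(unsigned long size)
{
	return size <= (1UL << PMD_SHIFT);
}

int main(void)
{
	unsigned long vma_shift = PUD_SHIFT;	/* hugetlbfs gave us a 1GB page */
	bool force_pte = false;

	/* Downgrade PUD -> PMD instead of going straight to PAGE_SIZE. */
	if (vma_shift == PUD_SHIFT && !stage2_supports(1UL << PUD_SHIFT))
		vma_shift = PMD_SHIFT;

	/* Only fall back to single pages if PMD blocks don't fit either. */
	if (vma_shift == PMD_SHIFT && !stage2_supports(1UL << PMD_SHIFT)) {
		force_pte = true;
		vma_shift = PAGE_SHIFT;
	}

	printf("stage 2 mapping size: %lu bytes, force_pte=%d\n",
	       1UL << vma_shift, force_pte);
	return 0;
}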