Message ID: 1554203176-3958-1-git-send-email-suzuki.poulose@arm.com (mailing list archive)
State: New, archived
Series: kvm: arm: Skip stage2 huge mappings for unaligned ipa backed by THP
On Tue, Apr 02, 2019 at 12:06:16PM +0100, Suzuki K Poulose wrote:
> With commit a80868f398554842b14, we no longer ensure that the
> THP page is properly aligned in the guest IPA. Skip the stage2
> huge mapping for unaligned IPA backed by transparent hugepages.
>
> Fixes: a80868f398554842b14 ("KVM: arm/arm64: Enforce PTE mappings at stage2 when needed")
> Reported-by: Eric Auger <eric.auger@redhat.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Christoffer Dall <christoffer.dall@arm.com>
> Cc: Zenghui Yu <yuzenghui@huawei.com>
> Cc: Zheng Xiang <zhengxiang9@huawei.com>
> Tested-by: Eric Auger <eric.auger@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  virt/kvm/arm/mmu.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 27c9583..4a22f5b 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1412,7 +1412,9 @@ static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, phys_addr_t *ipap)
>  	 * page accordingly.
>  	 */
>  	mask = PTRS_PER_PMD - 1;
> -	VM_BUG_ON((gfn & mask) != (pfn & mask));
> +	/* Skip memslots with unaligned IPA and user address */
> +	if ((gfn & mask) != (pfn & mask))
> +		return false;
>  	if (pfn & mask) {
>  		*ipap &= PMD_MASK;
>  		kvm_release_pfn_clean(pfn);
> --
> 2.7.4

I was able to reproduce this issue on v5.1-rc3 on a SoftIron Overdrive
1000 (AMD Seattle (Rev.B1)) with defconfig+ARM64_64K_PAGES with:

  qemu-system-aarch64 -cpu host -machine type=virt,accel=kvm -nographic \
    -smp 4 -m 4096 -kernel /boot/vmlinuz-4.9.0-7-arm64 \
    --append "console=ttyAMA0 default_hugepagesz=2M hugepages=256"

The 'default_hugepagesz=2M hugepages=256' had no effect on the
reproducibility; however, the guest only intermittently failed to boot.
Applying the above patch fixed this and the guest now boots every time.

Tested-by: Andrew Murray <andrew.murray@arm.com>

Thanks,

Andrew Murray
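For readers following the thread, the condition the patch relaxes can be
demonstrated with a small standalone sketch. The constants and addresses
below are illustrative stand-ins (assuming 4K pages, where PTRS_PER_PMD
is 512), not values taken from this thread:

#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-in for the kernel's 4K-page arm64 value. */
#define PTRS_PER_PMD	512ULL	/* pages covered by one PMD block */

/*
 * A THP can only be block-mapped at stage 2 if the guest IPA (gfn) and
 * the host address (pfn) share the same offset within a PMD-sized
 * region. The patch turns a violation of this from a VM_BUG_ON() into
 * a graceful fallback to PTE mappings.
 */
static int thp_offsets_match(uint64_t gfn, uint64_t pfn)
{
	uint64_t mask = PTRS_PER_PMD - 1;

	return (gfn & mask) == (pfn & mask);
}

int main(void)
{
	/* Both frame numbers are 512-page aligned: offsets match. */
	printf("aligned pair:   %d\n", thp_offsets_match(0x400, 0x8400));
	/* Offsets within the 2M region differ: must use PTE mappings. */
	printf("unaligned pair: %d\n", thp_offsets_match(0x401, 0x8400));
	return 0;
}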
On 2019/4/2 19:06, Suzuki K Poulose wrote:
> With commit a80868f398554842b14, we no longer ensure that the
> THP page is properly aligned in the guest IPA. Skip the stage2
> huge mapping for unaligned IPA backed by transparent hugepages.
>
> Fixes: a80868f398554842b14 ("KVM: arm/arm64: Enforce PTE mappings at stage2 when needed")
> Reported-by: Eric Auger <eric.auger@redhat.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Christoffer Dall <christoffer.dall@arm.com>
> Cc: Zenghui Yu <yuzenghui@huawei.com>
> Cc: Zheng Xiang <zhengxiang9@huawei.com>
> Tested-by: Eric Auger <eric.auger@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Hi Suzuki,

Why not make use of fault_supports_stage2_huge_mapping()? Let it do
some checks for us.

fault_supports_stage2_huge_mapping() was intended to do a *two-step*
check to tell us whether we can create stage2 huge block mappings, and
this check covers both hugetlbfs and THP. With commit
a80868f398554842b14, we pass PAGE_SIZE as "map_size" for normal size
pages (which turned out to be almost meaningless), and unfortunately
the THP check no longer works.

So we want to rework the *THP* check process. Your patch fixes the
first checking-step, but the second is still missed, am I wrong?

Can you please give a look at the below diff?

thanks,

zenghui

> ---
>  virt/kvm/arm/mmu.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 27c9583..4a22f5b 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1412,7 +1412,9 @@ static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, phys_addr_t *ipap)
>  	 * page accordingly.
>  	 */
>  	mask = PTRS_PER_PMD - 1;
> -	VM_BUG_ON((gfn & mask) != (pfn & mask));

Somehow, I'd prefer keeping the VM_BUG_ON() here, let it report some
potential issues in the future (of course I hope none :) )

> +	/* Skip memslots with unaligned IPA and user address */
> +	if ((gfn & mask) != (pfn & mask))
> +		return false;
>  	if (pfn & mask) {
>  		*ipap &= PMD_MASK;
>  		kvm_release_pfn_clean(pfn);

---8>---

Rework fault_supports_stage2_huge_mapping(), let it check THP again.

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
---
 virt/kvm/arm/mmu.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 27c9583..5e1b258 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1632,6 +1632,15 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
 	uaddr_end = uaddr_start + size;
 
 	/*
+	 * If the memslot is _not_ backed by hugetlbfs, then check if it
+	 * can be backed by transparent hugepages.
+	 *
+	 * Currently only PMD_SIZE THPs are supported, revisit it later.
+	 */
+	if (map_size == PAGE_SIZE)
+		map_size = PMD_SIZE;
+
+	/*
 	 * Pages belonging to memslots that don't have the same alignment
 	 * within a PMD/PUD for userspace and IPA cannot be mapped with stage-2
 	 * PMD/PUD entries, because we'll end up mapping the wrong pages.
@@ -1643,7 +1652,7 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
 	 *    |abcde|fgh  Stage-1 block  |    Stage-1 block tv|xyz|
 	 *    +-----+--------------------+--------------------+---+
 	 *
-	 *    memslot->base_gfn << PAGE_SIZE:
+	 *    memslot->base_gfn << PAGE_SHIFT:
 	 *      +---+--------------------+--------------------+-----+
 	 *      |abc|def  Stage-2 block  |    Stage-2 block   |tvxyz|
 	 *      +---+--------------------+--------------------+-----+
--
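To make the "two-step" argument concrete, here is a simplified userspace
restatement of the check being discussed. The struct layout and helper
name are paraphrased for illustration; the kernel function operates on
kvm_memory_slot and takes different arguments:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT	12			/* assuming 4K pages */
#define PMD_SIZE	(1ULL << 21)		/* 2M block */

struct memslot {
	uint64_t base_gfn;		/* guest IPA base, in pages */
	uint64_t npages;		/* slot length, in pages */
	uint64_t userspace_addr;	/* HVA backing the slot */
};

static bool supports_huge_mapping(const struct memslot *slot,
				  uint64_t hva, uint64_t map_size)
{
	uint64_t gpa_start = slot->base_gfn << PAGE_SHIFT;
	uint64_t uaddr_start = slot->userspace_addr;
	uint64_t uaddr_end = uaddr_start + (slot->npages << PAGE_SHIFT);

	/* Step 1: GPA and HVA must share the same offset within a block. */
	if ((gpa_start & (map_size - 1)) != (uaddr_start & (map_size - 1)))
		return false;

	/*
	 * Step 2: the block around the faulting HVA must lie entirely
	 * inside the memslot, i.e. faults in a partial block at the
	 * beginning or end of the slot cannot be block-mapped.
	 */
	return (hva & ~(map_size - 1)) >= uaddr_start &&
	       (hva & ~(map_size - 1)) + map_size <= uaddr_end;
}

int main(void)
{
	/* Hypothetical slot: 8M long, starting 1M into a 2M block. */
	struct memslot slot = { 0x40100, 0x800, 0x40100000ULL };

	/* Fault in the first, partial 2M block: no huge mapping. */
	printf("%d\n", supports_huge_mapping(&slot, 0x40100000ULL, PMD_SIZE));
	/* Fault in a fully covered interior block: huge mapping OK. */
	printf("%d\n", supports_huge_mapping(&slot, 0x40300000ULL, PMD_SIZE));
	return 0;
}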
On 04/08/2019 04:50 AM, Zenghui Yu wrote:
> On 2019/4/2 19:06, Suzuki K Poulose wrote:
>> With commit a80868f398554842b14, we no longer ensure that the
>> THP page is properly aligned in the guest IPA. Skip the stage2
>> huge mapping for unaligned IPA backed by transparent hugepages.
>>
>> [...]
>
> Hi Suzuki,
>
> Why not make use of fault_supports_stage2_huge_mapping()? Let it do
> some checks for us.
>
> fault_supports_stage2_huge_mapping() was intended to do a *two-step*
> check to tell us whether we can create stage2 huge block mappings, and
> this check covers both hugetlbfs and THP. With commit
> a80868f398554842b14, we pass PAGE_SIZE as "map_size" for normal size
> pages (which turned out to be almost meaningless), and unfortunately
> the THP check no longer works.

That's correct.

> So we want to rework the *THP* check process. Your patch fixes the
> first checking-step, but the second is still missed, am I wrong?

It fixes the step explicitly for the THP by making sure that the GPA and
the HVA are aligned to the map size.

> Can you please give a look at the below diff?
>
> [...]
>
>>  	mask = PTRS_PER_PMD - 1;
>> -	VM_BUG_ON((gfn & mask) != (pfn & mask));
>
> Somehow, I'd prefer keeping the VM_BUG_ON() here, let it report some
> potential issues in the future (of course I hope none :) )

I don't think this calls for a VM_BUG_ON(). It is simply a case where
the GPA is not aligned to the HVA, but for a normal VMA that could be
made THP.

We had this VM_BUG_ON(), which would have never hit because we would
have set force_pte if they were not aligned.

>> +	/* Skip memslots with unaligned IPA and user address */
>> +	if ((gfn & mask) != (pfn & mask))
>> +		return false;
>>  	if (pfn & mask) {
>>  		*ipap &= PMD_MASK;
>>  		kvm_release_pfn_clean(pfn);
>
> ---8>---
>
> Rework fault_supports_stage2_huge_mapping(), let it check THP again.
>
> Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
> ---
> [...]
>
> @@ -1632,6 +1632,15 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
>  	uaddr_end = uaddr_start + size;
>
>  	/*
> +	 * If the memslot is _not_ backed by hugetlbfs, then check if it
> +	 * can be backed by transparent hugepages.
> +	 *
> +	 * Currently only PMD_SIZE THPs are supported, revisit it later.
> +	 */
> +	if (map_size == PAGE_SIZE)
> +		map_size = PMD_SIZE;
> +

This looks hackish. What if we support PUD_SIZE huge pages in the
future?

>  	/*
>  	 * Pages belonging to memslots that don't have the same alignment
>  	 * within a PMD/PUD for userspace and IPA cannot be mapped with stage-2
>  	 * PMD/PUD entries, because we'll end up mapping the wrong pages.
> @@ -1643,7 +1652,7 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
>  	 *    |abcde|fgh  Stage-1 block  |    Stage-1 block tv|xyz|
>  	 *    +-----+--------------------+--------------------+---+
>  	 *
> -	 *    memslot->base_gfn << PAGE_SIZE:
> +	 *    memslot->base_gfn << PAGE_SHIFT:

That could be fixed.

>  	 *      +---+--------------------+--------------------+-----+
>  	 *      |abc|def  Stage-2 block  |    Stage-2 block   |tvxyz|
>  	 *      +---+--------------------+--------------------+-----+

But personally I don't think this is the right way to fix it. And as
mentioned, the VM_BUG_ON() is unnecessary and could easily be triggered
by the user by using an unaligned GPA/HVA. All we need to do is use page
mappings in such cases, which we do with my patch.

Cheers
Suzuki
Hi Suzuki,

Thanks for the reply.

On 2019/4/8 18:35, Suzuki K Poulose wrote:
> On 04/08/2019 04:50 AM, Zenghui Yu wrote:
>> [...]
>>
>> So we want to rework the *THP* check process. Your patch fixes the
>> first checking-step, but the second is still missed, am I wrong?
>
> It fixes the step explicitly for the THP by making sure that the GPA and
> the HVA are aligned to the map size.

Yes, I understand how your patch fixed the issue. But what I'm really
concerned about here is the *second* checking-step in
fault_supports_stage2_huge_mapping().

We have to check whether we are mapping a non-block-aligned or
non-block-sized memslot; if so, we can not create block mappings for the
beginning and end of this memslot. This is what the second part of
fault_supports_stage2_huge_mapping() had done.

I haven't seen this checking-step in your patch, did I miss something?

>> Somehow, I'd prefer keeping the VM_BUG_ON() here, let it report some
>> potential issues in the future (of course I hope none :) )
>
> I don't think this calls for a VM_BUG_ON(). It is simply a case where
> the GPA is not aligned to the HVA, but for a normal VMA that could be
> made THP.
>
> We had this VM_BUG_ON(), which would have never hit because we would
> have set force_pte if they were not aligned.

Yes, I agree.

>> [...]
>> +	 * Currently only PMD_SIZE THPs are supported, revisit it later.
>> +	 */
>> +	if (map_size == PAGE_SIZE)
>> +		map_size = PMD_SIZE;
>> +
>
> This looks hackish. What if we support PUD_SIZE huge pages in the
> future?

Yes, this might make the code a little difficult to understand. But by
doing so, we follow the same logic as before commit a80868f398554842b14;
that is, we do the two-step checking for normal size pages in
fault_supports_stage2_huge_mapping(), to decide if we can create THP
mappings for these pages.

As for PUD_SIZE THPs, to be honest, I have no idea now :(

thanks,

zenghui
Hi Zenghui,

On 04/08/2019 04:11 PM, Zenghui Yu wrote:
> Hi Suzuki,
>
> Thanks for the reply.
>
> [...]
>
> Yes, I understand how your patch fixed the issue. But what I'm really
> concerned about here is the *second* checking-step in
> fault_supports_stage2_huge_mapping().
>
> We have to check whether we are mapping a non-block-aligned or
> non-block-sized memslot; if so, we can not create block mappings for the
> beginning and end of this memslot. This is what the second part of
> fault_supports_stage2_huge_mapping() had done.
>
> I haven't seen this checking-step in your patch, did I miss something?

I see.

> [...]
>
> Yes, this might make the code a little difficult to understand. But by
> doing so, we follow the same logic as before commit a80868f398554842b14;
> that is, we do the two-step checking for normal size pages in
> fault_supports_stage2_huge_mapping(), to decide if we can create THP
> mappings for these pages.
>
> As for PUD_SIZE THPs, to be honest, I have no idea now :(

How about the following diff?

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 97b5417..98e5cec 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1791,7 +1791,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		 * currently supported. This code will need to be
 		 * updated to support other THP sizes.
 		 */
-		if (transparent_hugepage_adjust(&pfn, &fault_ipa))
+		if (fault_supports_stage2_huge_mappings(memslot, hva, PMD_SIZE) &&
+		    transparent_hugepage_adjust(&pfn, &fault_ipa))
 			vma_pagesize = PMD_SIZE;
 	}
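As a concrete illustration of what this gate buys, the sketch below
walks a hypothetical memslot (the addresses are invented, and the
boundary test mirrors step 2 of the simplified restatement given
earlier): faults landing in a partial 2M block at either edge of the
slot stay at PTE granularity, while interior faults remain PMD
candidates.

#include <stdint.h>
#include <stdio.h>

#define PMD_SIZE (1ULL << 21)	/* 2M block, assuming 4K pages */

int main(void)
{
	/* Hypothetical slot: starts 1M into a 2M block, 8M long. */
	uint64_t uaddr_start = 0x40100000ULL;
	uint64_t uaddr_end = uaddr_start + (8ULL << 20);
	uint64_t hva;

	for (hva = uaddr_start; hva < uaddr_end; hva += (1ULL << 20)) {
		uint64_t block = hva & ~(PMD_SIZE - 1);
		int huge_ok = block >= uaddr_start &&
			      block + PMD_SIZE <= uaddr_end;

		printf("hva 0x%llx -> %s\n", (unsigned long long)hva,
		       huge_ok ? "PMD candidate" : "force PTE");
	}
	return 0;
}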
On 2019/4/9 2:40, Suzuki K Poulose wrote:
> Hi Zenghui,
>
> [...]
>
> How about the following diff?
>
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 97b5417..98e5cec 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1791,7 +1791,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		 * currently supported. This code will need to be
>  		 * updated to support other THP sizes.
>  		 */
> -		if (transparent_hugepage_adjust(&pfn, &fault_ipa))
> +		if (fault_supports_stage2_huge_mappings(memslot, hva, PMD_SIZE) &&
> +		    transparent_hugepage_adjust(&pfn, &fault_ipa))
>  			vma_pagesize = PMD_SIZE;
>  	}

I think this is good enough for the issue.

(One minor concern: With this change, it seems that we no longer need
"force_pte" and can just use "logging_active" instead. But this is not
much related to what we're fixing.)

thanks.
Hi Zenghui,

On 04/09/2019 09:05 AM, Zenghui Yu wrote:
> On 2019/4/9 2:40, Suzuki K Poulose wrote:
>> [...]
>>
>> How about the following diff?
>>
>> [...]
>> -		if (transparent_hugepage_adjust(&pfn, &fault_ipa))
>> +		if (fault_supports_stage2_huge_mappings(memslot, hva, PMD_SIZE) &&
>> +		    transparent_hugepage_adjust(&pfn, &fault_ipa))
>>  			vma_pagesize = PMD_SIZE;
>>  	}
>
> I think this is good enough for the issue.
>
> (One minor concern: With this change, it seems that we no longer need
> "force_pte" and can just use "logging_active" instead. But this is not
> much related to what we're fixing.)

I would still leave the force_pte there to avoid checking for a THP case
in a situation where we are forced to a PTE-level mapping on a
hugepage-backed VMA. It would serve to avoid another check.

Cheers
Suzuki
On 2019/4/9 22:59, Suzuki K Poulose wrote:
> Hi Zenghui,
>
> [...]
>
>> (One minor concern: With this change, it seems that we no longer need
>> "force_pte" and can just use "logging_active" instead. But this is not
>> much related to what we're fixing.)
>
> I would still leave the force_pte there to avoid checking for a THP case
> in a situation where we are forced to a PTE-level mapping on a
> hugepage-backed VMA. It would serve to avoid another check.

Hi Suzuki,

Yes, I agree, thanks.

zenghui
On 10/04/2019 03:20, Zenghui Yu wrote:
> On 2019/4/9 22:59, Suzuki K Poulose wrote:
>> [...]
>>
>> I would still leave the force_pte there to avoid checking for a THP case
>> in a situation where we are forced to a PTE-level mapping on a
>> hugepage-backed VMA. It would serve to avoid another check.
>
> Hi Suzuki,
>
> Yes, I agree, thanks.

Cool. I have a patch to fix this properly, and two other patches to
clean up and unify the way we handle THP-backed hugepages. Will send
them out after a bit of testing, later today.

Cheers
Suzuki
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 27c9583..4a22f5b 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1412,7 +1412,9 @@ static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, phys_addr_t *ipap)
 	 * page accordingly.
 	 */
 	mask = PTRS_PER_PMD - 1;
-	VM_BUG_ON((gfn & mask) != (pfn & mask));
+	/* Skip memslots with unaligned IPA and user address */
+	if ((gfn & mask) != (pfn & mask))
+		return false;
 	if (pfn & mask) {
 		*ipap &= PMD_MASK;
 		kvm_release_pfn_clean(pfn);