Message ID | 20220321002638.379672-4-mizhang@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | Verify dirty logging works properly with page stats |
On Sun, Mar 20, 2022 at 5:26 PM Mingwei Zhang <mizhang@google.com> wrote:
>
> Add extra check to specify the case of nx hugepage and allow KVM to
> reconstruct large mapping after dirty logging is disabled. Existing code
> works only for nx hugepage but the condition is too general in that does
> not consider other usage case (such as dirty logging). Moreover, existing
> code assumes that a present PMD or PUD indicates that there exist 'smaller
> SPTEs' under the paging structure. This assumption may no be true if
> consider the zapping leafs only behavior in MMU.
>
> Missing the check causes KVM incorrectly regards the faulting page as a NX
> huge page and refuse to map it at desired level. And this leads to back
> performance in shadow mmu and potentiall TDP mmu.
>
> Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
> Cc: stable@vger.kernel.org
>
> Reviewed-by: Ben Gardon <bgardon@google.com>
> Signed-off-by: Mingwei Zhang <mizhang@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 5628d0ba637e..4d358c273f6c 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2919,6 +2919,16 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
>              cur_level == fault->goal_level &&
>              is_shadow_present_pte(spte) &&
>              !is_large_pte(spte)) {
> +               struct kvm_mmu_page *sp;
> +               u64 page_mask;
> +               /*
> +                * When nx hugepage flag is not set, there is no reason to
> +                * go down to another level. This helps demand paging to
> +                * generate large mappings.
> +                */

This comment is relevant to Google's internal demand paging scheme,
but isn't really relevant to UFFD demand paging.
Still, as demonstrated by the next commit, this is important for dirty
logging, so I'd suggest updating this comment to refer to that instead.

>
> +               sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
> +               if (!sp->lpage_disallowed)
> +                       return;
>                 /*
>                  * A small SPTE exists for this pfn, but FNAME(fetch)
>                  * and __direct_map would like to create a large PTE
> @@ -2926,8 +2936,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
>                  * patching back for them into pfn the next 9 bits of
>                  * the address.
>                  */
> -               u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> -                               KVM_PAGES_PER_HPAGE(cur_level - 1);
> +               page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> +                           KVM_PAGES_PER_HPAGE(cur_level - 1);
>                 fault->pfn |= fault->gfn & page_mask;
>                 fault->goal_level--;
>         }
> --
> 2.35.1.894.gb6a874cedc-goog
>
On Sun, Mar 20, 2022 at 5:26 PM Mingwei Zhang <mizhang@google.com> wrote:
>
> Add extra check to specify the case of nx hugepage and allow KVM to
> reconstruct large mapping after dirty logging is disabled. Existing code
> works only for nx hugepage but the condition is too general in that does
> not consider other usage case (such as dirty logging).

KVM calls kvm_mmu_zap_collapsible_sptes() when dirty logging is
disabled. Why is that not sufficient?

> Moreover, existing
> code assumes that a present PMD or PUD indicates that there exist 'smaller
> SPTEs' under the paging structure. This assumption may no be true if
> consider the zapping leafs only behavior in MMU.

Good point. Although, that code just got reverted. Maybe say something like:

This assumption may not be true in the future if KVM gains support
for zapping only leaf SPTEs.

>
> Missing the check causes KVM incorrectly regards the faulting page as a NX
> huge page and refuse to map it at desired level. And this leads to back
> performance in shadow mmu and potentiall TDP mmu.

s/potentiall/potentially/

>
> Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
> Cc: stable@vger.kernel.org
>
> Reviewed-by: Ben Gardon <bgardon@google.com>
> Signed-off-by: Mingwei Zhang <mizhang@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 5628d0ba637e..4d358c273f6c 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2919,6 +2919,16 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
>              cur_level == fault->goal_level &&
>              is_shadow_present_pte(spte) &&
>              !is_large_pte(spte)) {
> +               struct kvm_mmu_page *sp;
> +               u64 page_mask;
> +               /*
> +                * When nx hugepage flag is not set, there is no reason to
> +                * go down to another level. This helps demand paging to
> +                * generate large mappings.
> +                */
> +               sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
> +               if (!sp->lpage_disallowed)
> +                       return;
>                 /*
>                  * A small SPTE exists for this pfn, but FNAME(fetch)
>                  * and __direct_map would like to create a large PTE
> @@ -2926,8 +2936,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
>                  * patching back for them into pfn the next 9 bits of
>                  * the address.
>                  */
> -               u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> -                               KVM_PAGES_PER_HPAGE(cur_level - 1);
> +               page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> +                           KVM_PAGES_PER_HPAGE(cur_level - 1);
>                 fault->pfn |= fault->gfn & page_mask;
>                 fault->goal_level--;
>         }
> --
> 2.35.1.894.gb6a874cedc-goog
>
On Mon, Mar 21, 2022 at 3:00 PM David Matlack <dmatlack@google.com> wrote:
>
> On Sun, Mar 20, 2022 at 5:26 PM Mingwei Zhang <mizhang@google.com> wrote:
> >
> > Add extra check to specify the case of nx hugepage and allow KVM to
> > reconstruct large mapping after dirty logging is disabled. Existing code
> > works only for nx hugepage but the condition is too general in that does
> > not consider other usage case (such as dirty logging).
>
> KVM calls kvm_mmu_zap_collapsible_sptes() when dirty logging is
> disabled. Why is that not sufficient?

Ahh I see, kvm_mmu_zap_collapsible_sptes() only zaps the leaf SPTEs.
Could you add a blurb about this in the commit message for future
reference?

>
> > Moreover, existing
> > code assumes that a present PMD or PUD indicates that there exist 'smaller
> > SPTEs' under the paging structure. This assumption may no be true if
> > consider the zapping leafs only behavior in MMU.
>
> Good point. Although, that code just got reverted. Maybe say something like:
>
> This assumption may not be true in the future if KVM gains support
> for zapping only leaf SPTEs.

Nevermind, support for zapping leaf SPTEs already exists for zapping
collapsible SPTEs.

>
> >
> > Missing the check causes KVM incorrectly regards the faulting page as a NX
> > huge page and refuse to map it at desired level. And this leads to back
> > performance in shadow mmu and potentiall TDP mmu.
>
> s/potentiall/potentially/
>
> > Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
> > Cc: stable@vger.kernel.org
> >
> > Reviewed-by: Ben Gardon <bgardon@google.com>
> > Signed-off-by: Mingwei Zhang <mizhang@google.com>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 14 ++++++++++++--
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 5628d0ba637e..4d358c273f6c 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -2919,6 +2919,16 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
> >              cur_level == fault->goal_level &&
> >              is_shadow_present_pte(spte) &&
> >              !is_large_pte(spte)) {
> > +               struct kvm_mmu_page *sp;
> > +               u64 page_mask;
> > +               /*
> > +                * When nx hugepage flag is not set, there is no reason to
> > +                * go down to another level. This helps demand paging to
> > +                * generate large mappings.
> > +                */
> > +               sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
> > +               if (!sp->lpage_disallowed)
> > +                       return;
> >                 /*
> >                  * A small SPTE exists for this pfn, but FNAME(fetch)
> >                  * and __direct_map would like to create a large PTE
> > @@ -2926,8 +2936,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
> >                  * patching back for them into pfn the next 9 bits of
> >                  * the address.
> >                  */
> > -               u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> > -                               KVM_PAGES_PER_HPAGE(cur_level - 1);
> > +               page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> > +                           KVM_PAGES_PER_HPAGE(cur_level - 1);
> >                 fault->pfn |= fault->gfn & page_mask;
> >                 fault->goal_level--;
> >         }
> > --
> > 2.35.1.894.gb6a874cedc-goog
> >
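To make the point above concrete: kvm_mmu_zap_collapsible_sptes() zaps only leaf SPTEs, so after dirty logging is disabled a shadow-present, non-leaf SPTE can remain even though nothing below it forces a 4K mapping. The small userspace C program below models the decision the patch adds; fake_shadow_page and must_demote_mapping are simplified illustrative stand-ins, not KVM types, and it assumes lpage_disallowed is set only for NX-huge-page splits, as in the patch.

    #include <stdbool.h>
    #include <stdio.h>

    /* Simplified stand-in for struct kvm_mmu_page: only the flag we care about. */
    struct fake_shadow_page {
            bool lpage_disallowed;  /* set when the page exists because of NX huge page splitting */
    };

    /*
     * Model of the adjusted check: a non-leaf, shadow-present SPTE forces the
     * fault down a level only if the shadow page under it was created for the
     * NX mitigation. A leftover from dirty logging (lpage_disallowed clear)
     * does not, so the huge mapping can be recreated.
     */
    static bool must_demote_mapping(bool spte_present, bool spte_is_large,
                                    const struct fake_shadow_page *sp)
    {
            if (!spte_present || spte_is_large)
                    return false;           /* nothing blocking a huge mapping here */
            return sp->lpage_disallowed;    /* descend only for the NX huge page case */
    }

    int main(void)
    {
            struct fake_shadow_page nx_split = { .lpage_disallowed = true };
            struct fake_shadow_page dirty_log_leftover = { .lpage_disallowed = false };

            /* NX-split page table: the fault must still be mapped at 4K. */
            printf("nx case: demote=%d\n",
                   must_demote_mapping(true, false, &nx_split));

            /* Leftover after dirty logging: safe to map at the large level. */
            printf("dirty logging case: demote=%d\n",
                   must_demote_mapping(true, false, &dirty_log_leftover));
            return 0;
    }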
On Mon, Mar 21, 2022, Ben Gardon wrote:
> On Sun, Mar 20, 2022 at 5:26 PM Mingwei Zhang <mizhang@google.com> wrote:
> >
> > Add extra check to specify the case of nx hugepage and allow KVM to
> > reconstruct large mapping after dirty logging is disabled. Existing code
> > works only for nx hugepage but the condition is too general in that does
> > not consider other usage case (such as dirty logging). Moreover, existing
> > code assumes that a present PMD or PUD indicates that there exist 'smaller
> > SPTEs' under the paging structure. This assumption may no be true if
> > consider the zapping leafs only behavior in MMU.
> >
> > Missing the check causes KVM incorrectly regards the faulting page as a NX
> > huge page and refuse to map it at desired level. And this leads to back
> > performance in shadow mmu and potentiall TDP mmu.
> >
> > Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
> > Cc: stable@vger.kernel.org
> >
> > Reviewed-by: Ben Gardon <bgardon@google.com>
> > Signed-off-by: Mingwei Zhang <mizhang@google.com>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 14 ++++++++++++--
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 5628d0ba637e..4d358c273f6c 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -2919,6 +2919,16 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
> >              cur_level == fault->goal_level &&
> >              is_shadow_present_pte(spte) &&
> >              !is_large_pte(spte)) {
> > +               struct kvm_mmu_page *sp;
> > +               u64 page_mask;
> > +               /*
> > +                * When nx hugepage flag is not set, there is no reason to
> > +                * go down to another level. This helps demand paging to
> > +                * generate large mappings.
> > +                */
>
> This comment is relevant to Google's internal demand paging scheme,
> but isn't really relevant to UFFD demand paging.
> Still, as demonstrated by the next commit, this is important for dirty
> logging, so I'd suggest updating this comment to refer to that instead.
>

Ah, leaking my true motivation :-) Definitely will update the comment.

> > +               sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
> > +               if (!sp->lpage_disallowed)
> > +                       return;
> >                 /*
> >                  * A small SPTE exists for this pfn, but FNAME(fetch)
> >                  * and __direct_map would like to create a large PTE
> > @@ -2926,8 +2936,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
> >                  * patching back for them into pfn the next 9 bits of
> >                  * the address.
> >                  */
> > -               u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> > -                               KVM_PAGES_PER_HPAGE(cur_level - 1);
> > +               page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> > +                           KVM_PAGES_PER_HPAGE(cur_level - 1);
> >                 fault->pfn |= fault->gfn & page_mask;
> >                 fault->goal_level--;
> >         }
> > --
> > 2.35.1.894.gb6a874cedc-goog
> >
On Mon, Mar 21, 2022, David Matlack wrote:
> On Mon, Mar 21, 2022 at 3:00 PM David Matlack <dmatlack@google.com> wrote:
> >
> > On Sun, Mar 20, 2022 at 5:26 PM Mingwei Zhang <mizhang@google.com> wrote:
> > >
> > > Add extra check to specify the case of nx hugepage and allow KVM to
> > > reconstruct large mapping after dirty logging is disabled. Existing code
> > > works only for nx hugepage but the condition is too general in that does
> > > not consider other usage case (such as dirty logging).
> >
> > KVM calls kvm_mmu_zap_collapsible_sptes() when dirty logging is
> > disabled. Why is that not sufficient?
>
> Ahh I see, kvm_mmu_zap_collapsible_sptes() only zaps the leaf SPTEs.
> Could you add a blurb about this in the commit message for future
> reference?
>

will do.

> >
> > > Moreover, existing
> > > code assumes that a present PMD or PUD indicates that there exist 'smaller
> > > SPTEs' under the paging structure. This assumption may no be true if
> > > consider the zapping leafs only behavior in MMU.
> >
> > Good point. Although, that code just got reverted. Maybe say something like:
> >
> > This assumption may not be true in the future if KVM gains support
> > for zapping only leaf SPTEs.
>
> Nevermind, support for zapping leaf SPTEs already exists for zapping
> collapsible SPTEs.
>
> >
> > >
> > > Missing the check causes KVM incorrectly regards the faulting page as a NX
> > > huge page and refuse to map it at desired level. And this leads to back
> > > performance in shadow mmu and potentiall TDP mmu.
> >
> > s/potentiall/potentially/

Thanks for that.

> > >
> > > Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
> > > Cc: stable@vger.kernel.org
> > >
> > > Reviewed-by: Ben Gardon <bgardon@google.com>
> > > Signed-off-by: Mingwei Zhang <mizhang@google.com>
> > > ---
> > >  arch/x86/kvm/mmu/mmu.c | 14 ++++++++++++--
> > >  1 file changed, 12 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index 5628d0ba637e..4d358c273f6c 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -2919,6 +2919,16 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
> > >              cur_level == fault->goal_level &&
> > >              is_shadow_present_pte(spte) &&
> > >              !is_large_pte(spte)) {
> > > +               struct kvm_mmu_page *sp;
> > > +               u64 page_mask;
> > > +               /*
> > > +                * When nx hugepage flag is not set, there is no reason to
> > > +                * go down to another level. This helps demand paging to
> > > +                * generate large mappings.
> > > +                */
> > > +               sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
> > > +               if (!sp->lpage_disallowed)
> > > +                       return;
> > >                 /*
> > >                  * A small SPTE exists for this pfn, but FNAME(fetch)
> > >                  * and __direct_map would like to create a large PTE
> > > @@ -2926,8 +2936,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
> > >                  * patching back for them into pfn the next 9 bits of
> > >                  * the address.
> > >                  */
> > > -               u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> > > -                               KVM_PAGES_PER_HPAGE(cur_level - 1);
> > > +               page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> > > +                           KVM_PAGES_PER_HPAGE(cur_level - 1);
> > >                 fault->pfn |= fault->gfn & page_mask;
> > >                 fault->goal_level--;
> > >         }
> > > --
> > > 2.35.1.894.gb6a874cedc-goog
> > >
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5628d0ba637e..4d358c273f6c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2919,6 +2919,16 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
             cur_level == fault->goal_level &&
             is_shadow_present_pte(spte) &&
             !is_large_pte(spte)) {
+               struct kvm_mmu_page *sp;
+               u64 page_mask;
+               /*
+                * When nx hugepage flag is not set, there is no reason to
+                * go down to another level. This helps demand paging to
+                * generate large mappings.
+                */
+               sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
+               if (!sp->lpage_disallowed)
+                       return;
                /*
                 * A small SPTE exists for this pfn, but FNAME(fetch)
                 * and __direct_map would like to create a large PTE
@@ -2926,8 +2936,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
                 * patching back for them into pfn the next 9 bits of
                 * the address.
                 */
-               u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
-                               KVM_PAGES_PER_HPAGE(cur_level - 1);
+               page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
+                           KVM_PAGES_PER_HPAGE(cur_level - 1);
                fault->pfn |= fault->gfn & page_mask;
                fault->goal_level--;
        }
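A note on the page_mask arithmetic kept by this hunk: on x86, KVM_PAGES_PER_HPAGE(level) is the number of 4KiB pages spanned by a huge page at that level (1 at 4K, 512 at 2M, 512*512 at 1G), so the subtraction isolates the nine GFN bits owned by cur_level, which are folded into fault->pfn before dropping one level. The standalone C snippet below reproduces that arithmetic with a local pages_per_hpage() helper; it is only an illustrative model of the x86-64 definition, not kernel code.

    #include <stdio.h>
    #include <stdint.h>

    /*
     * Userspace mirror of the x86 KVM notion of pages-per-huge-page:
     * level 1 = 4K, level 2 = 2M, level 3 = 1G; each level adds 9 GFN bits.
     */
    static uint64_t pages_per_hpage(int level)
    {
            return 1ULL << ((level - 1) * 9);
    }

    int main(void)
    {
            for (int cur_level = 2; cur_level <= 3; cur_level++) {
                    /* Same subtraction as the patch: the GFN bits owned by cur_level. */
                    uint64_t page_mask = pages_per_hpage(cur_level) -
                                         pages_per_hpage(cur_level - 1);
                    printf("cur_level %d: page_mask = 0x%llx\n",
                           cur_level, (unsigned long long)page_mask);
            }

            /*
             * Example: a fault at gfn 0x12345 that must drop from the 2M level
             * to 4K folds gfn & 0x1ff (= 0x145) back into the pfn, mirroring
             * fault->pfn |= fault->gfn & page_mask.
             */
            uint64_t gfn = 0x12345, pfn_base = 0xabc00;
            printf("pfn after adjust: 0x%llx\n",
                   (unsigned long long)(pfn_base | (gfn & 0x1ff)));
            return 0;
    }

Running it prints 0x1ff for the 2M level and 0x3fe00 for the 1G level, i.e. exactly the "next 9 bits of the address" the comment in the hunk refers to.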