Message ID | 20220723012325.1715714-1-seanjc@google.com (mailing list archive) |
---|---|
Headers | show |
Series | KVM: x86: Apply NX mitigation more precisely | expand |
On Sat, Jul 23, 2022, Sean Christopherson wrote: > Patch 6 from Mingwei is the end goal of the series. KVM incorrectly > assumes that the NX huge page mitigation is the only scenario where KVM > will create a non-leaf page instead of a huge page. Precisely track > (via kvm_mmu_page) if a non-huge page is being forced and use that info > to avoid unnecessarily forcing smaller page sizes in > disallowed_hugepage_adjust(). > > v2: Rebase, tweak a changelog accordingly. hmm, I applied this patch set (v2) on top of kvm/queue (HEAD: 1a4d88a361af) and it seems kvm-unit-tests/vmx failed on both ept=1 and ept=0. And it did not work on our internel kernel either (kernel crashed). Maybe there is still minor issues? > > v1: https://lore.kernel.org/all/20220409003847.819686-1-seanjc@google.com > > Mingwei Zhang (1): > KVM: x86/mmu: explicitly check nx_hugepage in > disallowed_hugepage_adjust() > > Sean Christopherson (5): > KVM: x86/mmu: Tag disallowed NX huge pages even if they're not tracked > KVM: x86/mmu: Properly account NX huge page workaround for nonpaging > MMUs > KVM: x86/mmu: Set disallowed_nx_huge_page in TDP MMU before setting > SPTE > KVM: x86/mmu: Track the number of TDP MMU pages, but not the actual > pages > KVM: x86/mmu: Add helper to convert SPTE value to its shadow page > > arch/x86/include/asm/kvm_host.h | 17 ++--- > arch/x86/kvm/mmu/mmu.c | 107 ++++++++++++++++++++++---------- > arch/x86/kvm/mmu/mmu_internal.h | 41 +++++++----- > arch/x86/kvm/mmu/paging_tmpl.h | 6 +- > arch/x86/kvm/mmu/spte.c | 11 ++++ > arch/x86/kvm/mmu/spte.h | 17 +++++ > arch/x86/kvm/mmu/tdp_mmu.c | 49 +++++++++------ > arch/x86/kvm/mmu/tdp_mmu.h | 2 + > 8 files changed, 167 insertions(+), 83 deletions(-) > > > base-commit: 1a4d88a361af4f2e91861d632c6a1fe87a9665c2 > -- > 2.37.1.359.gd136c6c3e2-goog >
On Tue, Jul 26, 2022, Mingwei Zhang wrote: > On Sat, Jul 23, 2022, Sean Christopherson wrote: > > Patch 6 from Mingwei is the end goal of the series. KVM incorrectly > > assumes that the NX huge page mitigation is the only scenario where KVM > > will create a non-leaf page instead of a huge page. Precisely track > > (via kvm_mmu_page) if a non-huge page is being forced and use that info > > to avoid unnecessarily forcing smaller page sizes in > > disallowed_hugepage_adjust(). > > > > v2: Rebase, tweak a changelog accordingly. > > hmm, I applied this patch set (v2) on top of kvm/queue (HEAD: > 1a4d88a361af) and it seems kvm-unit-tests/vmx failed on both ept=1 and > ept=0. And it did not work on our internel kernel either (kernel > crashed). > > Maybe there is still minor issues? Heh, or not so minor issues. I'll see what I broke. I have a bad feeling that it's the EPT tests; IIRC I only ran VMX on a platform with MAXPHYADDR < 40.
On Tue, Jul 26, 2022, Sean Christopherson wrote: > On Tue, Jul 26, 2022, Mingwei Zhang wrote: > > On Sat, Jul 23, 2022, Sean Christopherson wrote: > > > Patch 6 from Mingwei is the end goal of the series. KVM incorrectly > > > assumes that the NX huge page mitigation is the only scenario where KVM > > > will create a non-leaf page instead of a huge page. Precisely track > > > (via kvm_mmu_page) if a non-huge page is being forced and use that info > > > to avoid unnecessarily forcing smaller page sizes in > > > disallowed_hugepage_adjust(). > > > > > > v2: Rebase, tweak a changelog accordingly. > > > > hmm, I applied this patch set (v2) on top of kvm/queue (HEAD: > > 1a4d88a361af) and it seems kvm-unit-tests/vmx failed on both ept=1 and > > ept=0. And it did not work on our internel kernel either (kernel > > crashed). > > > > Maybe there is still minor issues? > > Heh, or not so minor issues. I'll see what I broke. I have a bad feeling that > it's the EPT tests; IIRC I only ran VMX on a platform with MAXPHYADDR < 40. Hrm, not seeing failures (beyond the VMX_VMCS_ENUM.MAX_INDEX failure because I'm running an older QEMU). I'll follow up off-list to figure out what's going on.
On 7/23/22 03:23, Sean Christopherson wrote: > Patch 6 from Mingwei is the end goal of the series. KVM incorrectly > assumes that the NX huge page mitigation is the only scenario where KVM > will create a non-leaf page instead of a huge page. Precisely track > (via kvm_mmu_page) if a non-huge page is being forced and use that info > to avoid unnecessarily forcing smaller page sizes in > disallowed_hugepage_adjust(). > > v2: Rebase, tweak a changelog accordingly. > > v1:https://lore.kernel.org/all/20220409003847.819686-1-seanjc@google.com > > Mingwei Zhang (1): > KVM: x86/mmu: explicitly check nx_hugepage in > disallowed_hugepage_adjust() > > Sean Christopherson (5): > KVM: x86/mmu: Tag disallowed NX huge pages even if they're not tracked > KVM: x86/mmu: Properly account NX huge page workaround for nonpaging > MMUs > KVM: x86/mmu: Set disallowed_nx_huge_page in TDP MMU before setting > SPTE > KVM: x86/mmu: Track the number of TDP MMU pages, but not the actual > pages > KVM: x86/mmu: Add helper to convert SPTE value to its shadow page Some of the benefits are cool, such as not having to track the pages for the TDP MMU, and patch 2 is a borderline bugfix, but there's quite a lot of new non-obvious complexity here. So the obligatory question is: is it worth a hundred lines of new code? Paolo
On Thu, Jul 28, 2022, Paolo Bonzini wrote: > On 7/23/22 03:23, Sean Christopherson wrote: > > Patch 6 from Mingwei is the end goal of the series. KVM incorrectly > > assumes that the NX huge page mitigation is the only scenario where KVM > > will create a non-leaf page instead of a huge page. Precisely track > > (via kvm_mmu_page) if a non-huge page is being forced and use that info > > to avoid unnecessarily forcing smaller page sizes in > > disallowed_hugepage_adjust(). > > > > v2: Rebase, tweak a changelog accordingly. > > > > v1:https://lore.kernel.org/all/20220409003847.819686-1-seanjc@google.com > > > > Mingwei Zhang (1): > > KVM: x86/mmu: explicitly check nx_hugepage in > > disallowed_hugepage_adjust() > > > > Sean Christopherson (5): > > KVM: x86/mmu: Tag disallowed NX huge pages even if they're not tracked > > KVM: x86/mmu: Properly account NX huge page workaround for nonpaging > > MMUs > > KVM: x86/mmu: Set disallowed_nx_huge_page in TDP MMU before setting > > SPTE > > KVM: x86/mmu: Track the number of TDP MMU pages, but not the actual > > pages > > KVM: x86/mmu: Add helper to convert SPTE value to its shadow page > > Some of the benefits are cool, such as not having to track the pages for the > TDP MMU, and patch 2 is a borderline bugfix, but there's quite a lot of new > non-obvious complexity here. 100% agree on the complexity. > So the obligatory question is: is it worth a hundred lines of new code? Assuming I understanding the bug Mingwei's patch fixes, yes. Though after re-reading that changelog, it should more explicitly call out the scenario we actually care about. Anyways, the bug we really care about is that by not precisely checking if a huge page is disallowed, KVM would refuse to create huge page after disabling dirty logging, which is a very noticeable performance issue for large VMs if a migration is canceled. That particular bug has since been unintentionally fixed in the TDP MMU by zapping the non-leaf SPTE, but there are other paths that could similarly be affected, e.g. I believe zapping leaf SPTEs in response to a host page migration (mmu_notifier invalidation) to create a huge page would yield a similar result; KVM would see the shadow-present non-leaf SPTE and assume a huge page is disallowed.
On Thu, Jul 28, 2022 at 2:20 PM Sean Christopherson <seanjc@google.com> wrote: > > On Thu, Jul 28, 2022, Paolo Bonzini wrote: > > On 7/23/22 03:23, Sean Christopherson wrote: > > > Patch 6 from Mingwei is the end goal of the series. KVM incorrectly > > > assumes that the NX huge page mitigation is the only scenario where KVM > > > will create a non-leaf page instead of a huge page. Precisely track > > > (via kvm_mmu_page) if a non-huge page is being forced and use that info > > > to avoid unnecessarily forcing smaller page sizes in > > > disallowed_hugepage_adjust(). > > > > > > v2: Rebase, tweak a changelog accordingly. > > > > > > v1:https://lore.kernel.org/all/20220409003847.819686-1-seanjc@google.com > > > > > > Mingwei Zhang (1): > > > KVM: x86/mmu: explicitly check nx_hugepage in > > > disallowed_hugepage_adjust() > > > > > > Sean Christopherson (5): > > > KVM: x86/mmu: Tag disallowed NX huge pages even if they're not tracked > > > KVM: x86/mmu: Properly account NX huge page workaround for nonpaging > > > MMUs > > > KVM: x86/mmu: Set disallowed_nx_huge_page in TDP MMU before setting > > > SPTE > > > KVM: x86/mmu: Track the number of TDP MMU pages, but not the actual > > > pages > > > KVM: x86/mmu: Add helper to convert SPTE value to its shadow page > > > > Some of the benefits are cool, such as not having to track the pages for the > > TDP MMU, and patch 2 is a borderline bugfix, but there's quite a lot of new > > non-obvious complexity here. > > 100% agree on the complexity. > > > So the obligatory question is: is it worth a hundred lines of new code? > > Assuming I understanding the bug Mingwei's patch fixes, yes. Though after > re-reading that changelog, it should more explicitly call out the scenario we > actually care about. > > Anyways, the bug we really care about is that by not precisely checking if a > huge page is disallowed, KVM would refuse to create huge page after disabling > dirty logging, which is a very noticeable performance issue for large VMs if > a migration is canceled. That particular bug has since been unintentionally > fixed in the TDP MMU by zapping the non-leaf SPTE, but there are other paths > that could similarly be affected, e.g. I believe zapping leaf SPTEs in response > to a host page migration (mmu_notifier invalidation) to create a huge page would > yield a similar result; KVM would see the shadow-present non-leaf SPTE and assume > a huge page is disallowed. Just a quick update: the kernel crash has been resolved. It turns out to be a bug introduced when I rebase the patch. I see the patch set is working now.
On 7/28/22 23:20, Sean Christopherson wrote: > > Anyways, the bug we really care about is that by not precisely checking if a > huge page is disallowed, KVM would refuse to create huge page after disabling > dirty logging, which is a very noticeable performance issue for large VMs if > a migration is canceled. That particular bug has since been unintentionally > fixed in the TDP MMU by zapping the non-leaf SPTE, but there are other paths > that could similarly be affected, e.g. I believe zapping leaf SPTEs in response > to a host page migration (mmu_notifier invalidation) to create a huge page would > yield a similar result; KVM would see the shadow-present non-leaf SPTE and assume > a huge page is disallowed. Ok, thanks. So this will be 5.21 material even during the -rc phase; I have posted a couple comments for patch 1 and 2. One way to simplify the rmb/wmb logic could be to place the rmb/wmb respectively after loading iter.old_spte and in tdp_mmu_link_sp. If you like it, feel free to integrate it in v3. Paolo
On Fri, Jul 29, 2022, Paolo Bonzini wrote: > On 7/28/22 23:20, Sean Christopherson wrote: > > > > Anyways, the bug we really care about is that by not precisely checking if a > > huge page is disallowed, KVM would refuse to create huge page after disabling > > dirty logging, which is a very noticeable performance issue for large VMs if > > a migration is canceled. That particular bug has since been unintentionally > > fixed in the TDP MMU by zapping the non-leaf SPTE, but there are other paths > > that could similarly be affected, e.g. I believe zapping leaf SPTEs in response > > to a host page migration (mmu_notifier invalidation) to create a huge page would > > yield a similar result; KVM would see the shadow-present non-leaf SPTE and assume > > a huge page is disallowed. > > Ok, thanks. So this will be 5.21 material even during the -rc phase Yes, this definitely needs more time in the queue before being sent to Linus.