Message ID | 20210929042908.1313874-5-stevensd@google.com (mailing list archive) |
State | New, archived |
Series | KVM: allow mapping non-refcounted pages
On Wed, Sep 29, 2021, David Stevens wrote:
> From: David Stevens <stevensd@chromium.org>
>
> Remove two warnings that require ref counts for pages to be non-zero, as
> mapped pfns from follow_pfn may not have an initialized ref count.
>
> Signed-off-by: David Stevens <stevensd@chromium.org>
> ---
>  arch/x86/kvm/mmu/mmu.c | 7 -------
>  virt/kvm/kvm_main.c    | 2 +-
>  2 files changed, 1 insertion(+), 8 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 5a1adcc9cfbc..3b469df63bcf 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -617,13 +617,6 @@ static int mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep)
>
>  	pfn = spte_to_pfn(old_spte);
>
> -	/*
> -	 * KVM does not hold the refcount of the page used by
> -	 * kvm mmu, before reclaiming the page, we should
> -	 * unmap it from mmu first.
> -	 */
> -	WARN_ON(!kvm_is_reserved_pfn(pfn) && !page_count(pfn_to_page(pfn)));

Have you actually observed false positives with this WARN?  I would expect
anything without a struct page to get filtered out by
!kvm_is_reserved_pfn(pfn).

If you have observed false positives, I would strongly prefer we find a way to
keep the page_count() sanity check, it has proven very helpful in the past in
finding/debugging bugs during MMU development.

> -
>  	if (is_accessed_spte(old_spte))
>  		kvm_set_pfn_accessed(pfn);
>
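[For context on the expectation above: kvm_is_reserved_pfn() around this time
had roughly the following shape. This is reproduced from memory, so treat it
as approximate rather than the exact tree state.]

bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
{
	/*
	 * A pfn backed by a valid struct page is "reserved" only if the
	 * page is marked PageReserved (with carve-outs such as ZONE_DEVICE
	 * pages); pfns with no struct page at all are always reserved.
	 */
	if (pfn_valid(pfn))
		return PageReserved(pfn_to_page(pfn)) &&
		       !kvm_is_zone_device_pfn(pfn);

	return true;
}

[The catch, as the reply below explains, is that a non-reserved page can have
a legitimately zero refcount, so !kvm_is_reserved_pfn(pfn) does not imply a
non-zero page_count().]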
On Wed, Oct 13, 2021 at 9:02 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Sep 29, 2021, David Stevens wrote:
> > From: David Stevens <stevensd@chromium.org>
> >
> > Remove two warnings that require ref counts for pages to be non-zero, as
> > mapped pfns from follow_pfn may not have an initialized ref count.
> >
> > Signed-off-by: David Stevens <stevensd@chromium.org>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 7 -------
> >  virt/kvm/kvm_main.c    | 2 +-
> >  2 files changed, 1 insertion(+), 8 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 5a1adcc9cfbc..3b469df63bcf 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -617,13 +617,6 @@ static int mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep)
> >
> >  	pfn = spte_to_pfn(old_spte);
> >
> > -	/*
> > -	 * KVM does not hold the refcount of the page used by
> > -	 * kvm mmu, before reclaiming the page, we should
> > -	 * unmap it from mmu first.
> > -	 */
> > -	WARN_ON(!kvm_is_reserved_pfn(pfn) && !page_count(pfn_to_page(pfn)));
>
> Have you actually observed false positives with this WARN?  I would expect
> anything without a struct page to get filtered out by
> !kvm_is_reserved_pfn(pfn).

Those are the type of pfns that were responsible for CVE-2021-22543 [1]. One
specific example is that amdgpu uses ttm_pool, which makes higher-order,
non-compound allocations. Without the head/tail metadata, only the first base
page in such an allocation has a non-zero page_count.

[1] https://github.com/google/security-research/security/advisories/GHSA-7wq5-phmq-m584

> If you have observed false positives, I would strongly prefer we find a way
> to keep the page_count() sanity check, it has proven very helpful in the
> past in finding/debugging bugs during MMU development.

When we see a refcount of zero, I think we can look up spte->(gfn,
slot)->hva->vma and determine whether or not the zero refcount is okay, based
on vm_flags. That's kind of heavy for a debug check, although at least we'd
only pay the cost for unusual mappings. But it still might make sense to
switch to a MMU_WARN_ON in that case. Or we could just ignore the cost, since
at least from a superficial reading and some basic tests, the tdp_mmu doesn't
seem to execute this code path.

Thoughts? I'd lean towards MMU_WARN_ON, but I'd like to know what the
maintainers' preferences are before sending an updated patch series.

-David

> >
> > -
> >  	if (is_accessed_spte(old_spte))
> >  		kvm_set_pfn_accessed(pfn);
> >
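[To illustrate the false positive described above, here is a hypothetical
snippet, not code from the patch: a higher-order allocation made without
__GFP_COMP has no compound head/tail metadata, so the allocator's single
reference is visible only on the first base page.]

#include <linux/gfp.h>
#include <linux/mm.h>

static void refcount_illustration(void)
{
	/* Order-4, non-compound allocation: 16 contiguous base pages. */
	struct page *page = alloc_pages(GFP_KERNEL, 4);

	if (!page)
		return;

	/* The allocation's single reference lives on the first base page. */
	WARN_ON(page_count(page) != 1);

	/*
	 * The remaining 15 pages carry no head/tail metadata and report a
	 * refcount of zero; mapping one of them into a guest via follow_pfn
	 * and later zapping the spte is what trips the WARN_ON removed by
	 * this patch.
	 */
	WARN_ON(page_count(page + 1) != 0);

	__free_pages(page, 4);
}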
On Wed, Oct 13, 2021, David Stevens wrote:
> On Wed, Oct 13, 2021 at 9:02 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Wed, Sep 29, 2021, David Stevens wrote:
> > > From: David Stevens <stevensd@chromium.org>
> > >
> > > Remove two warnings that require ref counts for pages to be non-zero, as
> > > mapped pfns from follow_pfn may not have an initialized ref count.
> > >
> > > Signed-off-by: David Stevens <stevensd@chromium.org>
> > > ---
> > >  arch/x86/kvm/mmu/mmu.c | 7 -------
> > >  virt/kvm/kvm_main.c    | 2 +-
> > >  2 files changed, 1 insertion(+), 8 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index 5a1adcc9cfbc..3b469df63bcf 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -617,13 +617,6 @@ static int mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep)
> > >
> > >  	pfn = spte_to_pfn(old_spte);
> > >
> > > -	/*
> > > -	 * KVM does not hold the refcount of the page used by
> > > -	 * kvm mmu, before reclaiming the page, we should
> > > -	 * unmap it from mmu first.
> > > -	 */
> > > -	WARN_ON(!kvm_is_reserved_pfn(pfn) && !page_count(pfn_to_page(pfn)));
> >
> > Have you actually observed false positives with this WARN?  I would expect
> > anything without a struct page to get filtered out by
> > !kvm_is_reserved_pfn(pfn).
>
> Those are the type of pfns that were responsible for CVE-2021-22543 [1].
> One specific example is that amdgpu uses ttm_pool, which makes higher-order,
> non-compound allocations. Without the head/tail metadata, only the first
> base page in such an allocation has a non-zero page_count.

Huh.  I hadn't actually read the CVE, or obviously thought critically about
the problem. :-)

> [1] https://github.com/google/security-research/security/advisories/GHSA-7wq5-phmq-m584
>
> > If you have observed false positives, I would strongly prefer we find a
> > way to keep the page_count() sanity check, it has proven very helpful in
> > the past in finding/debugging bugs during MMU development.
>
> When we see a refcount of zero, I think we can look up spte->(gfn,
> slot)->hva->vma and determine whether or not the zero refcount is okay,
> based on vm_flags. That's kind of heavy for a debug check, although at
> least we'd only pay the cost for unusual mappings. But it still might make
> sense to switch to a MMU_WARN_ON in that case. Or we could just ignore the
> cost, since at least from a superficial reading and some basic tests, the
> tdp_mmu doesn't seem to execute this code path.
>
> Thoughts? I'd lean towards MMU_WARN_ON, but I'd like to know what the
> maintainers' preferences are before sending an updated patch series.

MMU_WARN_ON is a poor choice, but only because no one turns it on.  I think
we've discussed turning it into a proper Kconfig (and killing off mmu_audit.c)
multiple times, but no one has actually followed through.

The TDP MMU indeed doesn't hit this path.  So I'd say just keep this patch as
is and punt the whole MMU_WARN_ON / audit cleanup to the future.  I bet if we
spend any time at all, we can think of a big pile of MMU sanity checks we
could add to KVM, i.e. this can be but one of many checks that apply to all
flavors of MMUs.
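[For reference on why "no one turns it on": MMU_WARN_ON in
arch/x86/kvm/mmu.h at the time compiled to nothing unless MMU_DEBUG was
defined by hand at build time. Approximately:]

#ifdef MMU_DEBUG
#define MMU_WARN_ON(x) WARN_ON(x)
#else
#define MMU_WARN_ON(x) do { } while (0)
#endif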
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5a1adcc9cfbc..3b469df63bcf 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -617,13 +617,6 @@ static int mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep)
 
 	pfn = spte_to_pfn(old_spte);
 
-	/*
-	 * KVM does not hold the refcount of the page used by
-	 * kvm mmu, before reclaiming the page, we should
-	 * unmap it from mmu first.
-	 */
-	WARN_ON(!kvm_is_reserved_pfn(pfn) && !page_count(pfn_to_page(pfn)));
-
 	if (is_accessed_spte(old_spte))
 		kvm_set_pfn_accessed(pfn);
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 421146bd1360..c72ad995a359 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -168,7 +168,7 @@ bool kvm_is_zone_device_pfn(kvm_pfn_t pfn)
 	 * the device has been pinned, e.g. by get_user_pages().  WARN if the
 	 * page_count() is zero to help detect bad usage of this helper.
 	 */
-	if (!pfn_valid(pfn) || WARN_ON_ONCE(!page_count(pfn_to_page(pfn))))
+	if (!pfn_valid(pfn) || !page_count(pfn_to_page(pfn)))
 		return false;
 
 	return is_zone_device_page(pfn_to_page(pfn));
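[For completeness, the heavier spte->(gfn, slot)->hva->vma check floated
earlier in the thread might look something like the sketch below. This is
purely illustrative: the helper name and placement are invented, and a real
version would also have to account for the memslot lookup cost.]

/*
 * Hypothetical sketch, not part of the patch: tolerate a zero refcount
 * only when the gfn's userspace mapping is VM_IO/VM_PFNMAP, where pfns
 * legitimately lack a refcounted struct page.
 */
static bool zero_refcount_expected(struct kvm *kvm, gfn_t gfn)
{
	unsigned long hva = gfn_to_hva(kvm, gfn);
	struct vm_area_struct *vma;
	bool expected = false;

	if (kvm_is_error_hva(hva))
		return false;

	mmap_read_lock(kvm->mm);
	vma = find_vma(kvm->mm, hva);
	if (vma && hva >= vma->vm_start)
		expected = !!(vma->vm_flags & (VM_IO | VM_PFNMAP));
	mmap_read_unlock(kvm->mm);

	return expected;
}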