Message ID | 20230801124844.278698-4-david@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | smaps / mm/gup: fix gup_can_follow_protnone fallout |
On Tue, Aug 01, 2023 at 02:48:39PM +0200, David Hildenbrand wrote:
> KVM is *the* case we know that really wants to honor NUMA hinting faults.
> As we want to stop setting FOLL_HONOR_NUMA_FAULT implicitly, set
> FOLL_HONOR_NUMA_FAULT whenever we might obtain pages on behalf of a VCPU
> to map them into a secondary MMU, and add a comment why.
>
> Do that unconditionally in hva_to_pfn_slow() when calling
> get_user_pages_unlocked().
>
> kvmppc_book3s_instantiate_page(), hva_to_pfn_fast() and
> gfn_to_page_many_atomic() are similarly used to map pages into a
> secondary MMU. However, FOLL_WRITE and get_user_page_fast_only() always
> implicitly honor NUMA hinting faults -- as documented for
> FOLL_HONOR_NUMA_FAULT -- so we can limit this change to a single location
> for now.
>
> Don't set it in check_user_page_hwpoison(), where we really only want to
> check if the mapped page is HW-poisoned.
>
> We won't set it for other KVM users of get_user_pages()/pin_user_pages()
> * arch/powerpc/kvm/book3s_64_mmu_hv.c: not used to map pages into a
>   secondary MMU.
> * arch/powerpc/kvm/e500_mmu.c: only used on shared TLB pages with userspace
> * arch/s390/kvm/*: s390x only supports a single NUMA node either way
> * arch/x86/kvm/svm/sev.c: not used to map pages into a secondary MMU.
>
> This is a preparation for making FOLL_HONOR_NUMA_FAULT no longer
> implicitly be set by get_user_pages() and friends.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Seems sane but I don't know KVM well enough to know if this is the only
relevant case so didn't ack.
On 02.08.23 17:27, Mel Gorman wrote:
> On Tue, Aug 01, 2023 at 02:48:39PM +0200, David Hildenbrand wrote:
>> KVM is *the* case we know that really wants to honor NUMA hinting faults.
>> As we want to stop setting FOLL_HONOR_NUMA_FAULT implicitly, set
>> FOLL_HONOR_NUMA_FAULT whenever we might obtain pages on behalf of a VCPU
>> to map them into a secondary MMU, and add a comment why.
>>
>> Do that unconditionally in hva_to_pfn_slow() when calling
>> get_user_pages_unlocked().
>>
>> kvmppc_book3s_instantiate_page(), hva_to_pfn_fast() and
>> gfn_to_page_many_atomic() are similarly used to map pages into a
>> secondary MMU. However, FOLL_WRITE and get_user_page_fast_only() always
>> implicitly honor NUMA hinting faults -- as documented for
>> FOLL_HONOR_NUMA_FAULT -- so we can limit this change to a single location
>> for now.
>>
>> Don't set it in check_user_page_hwpoison(), where we really only want to
>> check if the mapped page is HW-poisoned.
>>
>> We won't set it for other KVM users of get_user_pages()/pin_user_pages()
>> * arch/powerpc/kvm/book3s_64_mmu_hv.c: not used to map pages into a
>>   secondary MMU.
>> * arch/powerpc/kvm/e500_mmu.c: only used on shared TLB pages with userspace
>> * arch/s390/kvm/*: s390x only supports a single NUMA node either way
>> * arch/x86/kvm/svm/sev.c: not used to map pages into a secondary MMU.
>>
>> This is a preparation for making FOLL_HONOR_NUMA_FAULT no longer
>> implicitly be set by get_user_pages() and friends.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>
> Seems sane but I don't know KVM well enough to know if this is the only
> relevant case so didn't ack.

Makes sense, some careful eyes from KVM people would be appreciated.

At least from kvm_main.c POV, I'm pretty confident that that's it.
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index dfbaafbe3a00..6e4f2b81541e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2517,7 +2517,18 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
 static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
 			   bool interruptible, bool *writable, kvm_pfn_t *pfn)
 {
-	unsigned int flags = FOLL_HWPOISON;
+	/*
+	 * When a VCPU accesses a page that is not mapped into the secondary
+	 * MMU, we lookup the page using GUP to map it, so the guest VCPU can
+	 * make progress. We always want to honor NUMA hinting faults in that
+	 * case, because GUP usage corresponds to memory accesses from the VCPU.
+	 * Otherwise, we'd not trigger NUMA hinting faults once a page is
+	 * mapped into the secondary MMU and gets accessed by a VCPU.
+	 *
+	 * Note that get_user_page_fast_only() and FOLL_WRITE for now
+	 * implicitly honor NUMA hinting faults and don't need this flag.
+	 */
+	unsigned int flags = FOLL_HWPOISON | FOLL_HONOR_NUMA_FAULT;
 	struct page *page;
 	int npages;
 
KVM is *the* case we know that really wants to honor NUMA hinting faults.
As we want to stop setting FOLL_HONOR_NUMA_FAULT implicitly, set
FOLL_HONOR_NUMA_FAULT whenever we might obtain pages on behalf of a VCPU
to map them into a secondary MMU, and add a comment why.

Do that unconditionally in hva_to_pfn_slow() when calling
get_user_pages_unlocked().

kvmppc_book3s_instantiate_page(), hva_to_pfn_fast() and
gfn_to_page_many_atomic() are similarly used to map pages into a
secondary MMU. However, FOLL_WRITE and get_user_page_fast_only() always
implicitly honor NUMA hinting faults -- as documented for
FOLL_HONOR_NUMA_FAULT -- so we can limit this change to a single location
for now.

Don't set it in check_user_page_hwpoison(), where we really only want to
check if the mapped page is HW-poisoned.

We won't set it for other KVM users of get_user_pages()/pin_user_pages()
* arch/powerpc/kvm/book3s_64_mmu_hv.c: not used to map pages into a
  secondary MMU.
* arch/powerpc/kvm/e500_mmu.c: only used on shared TLB pages with userspace
* arch/s390/kvm/*: s390x only supports a single NUMA node either way
* arch/x86/kvm/svm/sev.c: not used to map pages into a secondary MMU.

This is a preparation for making FOLL_HONOR_NUMA_FAULT no longer
implicitly be set by get_user_pages() and friends.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 virt/kvm/kvm_main.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)
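
The rule the commit message relies on (a lookup honors NUMA hinting faults either explicitly via FOLL_HONOR_NUMA_FAULT or implicitly via FOLL_WRITE) can be modelled in a few lines of userspace C. This is only an illustrative sketch, not kernel code: the SKETCH_* flag values and both helper functions are hypothetical names invented here, and the assumption that the slow path adds the write flag on a write fault is not stated in the patch itself.

```c
#include <stdbool.h>

/*
 * Hypothetical flag values for illustration only. They are NOT the
 * kernel's real FOLL_* definitions, which live in the mm headers.
 */
#define SKETCH_FOLL_WRITE            (1u << 0)
#define SKETCH_FOLL_HWPOISON         (1u << 1)
#define SKETCH_FOLL_HONOR_NUMA_FAULT (1u << 2)

/*
 * Model of the rule described in the commit message: a lookup honors
 * NUMA hinting faults if the flag is set explicitly, or implicitly
 * whenever the lookup is for write access.
 */
static bool sketch_honors_numa_fault(unsigned int flags)
{
	return (flags & SKETCH_FOLL_HONOR_NUMA_FAULT) ||
	       (flags & SKETCH_FOLL_WRITE);
}

/*
 * Model of how hva_to_pfn_slow() composes its flags after this patch.
 * Assumption: a write fault additionally sets the write flag.
 */
static unsigned int sketch_slow_path_flags(bool write_fault)
{
	unsigned int flags = SKETCH_FOLL_HWPOISON |
			     SKETCH_FOLL_HONOR_NUMA_FAULT;

	if (write_fault)
		flags |= SKETCH_FOLL_WRITE;
	return flags;
}
```

In this model, sketch_slow_path_flags() yields flags that honor NUMA hinting faults for both read and write faults, whereas a lookup carrying only SKETCH_FOLL_HWPOISON does not. Under the same assumptions, that read-fault case is the one that would stop honoring hinting faults once the flag is no longer set implicitly.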