Message ID | 20210611235701.3941724-1-dmatlack@google.com
---|---
Series | KVM: x86/mmu: Fast page fault support for the TDP MMU
On 12/06/21 01:56, David Matlack wrote:
> This patch series adds support for the TDP MMU in the fast_page_fault
> path, which enables certain write-protection and access tracking faults
> to be handled without taking the KVM MMU lock. This series brings the
> performance of these faults up to par with the legacy MMU.

Hi David,

I have one very basic question: is the speedup due to lock contention, or
to cacheline bouncing, or something else altogether? In other words, what
do the profiles look like before vs. after these patches?

Thanks,

Paolo
On Mon, Jun 14, 2021 at 11:54:59AM +0200, Paolo Bonzini wrote:
> On 12/06/21 01:56, David Matlack wrote:
> > This patch series adds support for the TDP MMU in the fast_page_fault
> > path, which enables certain write-protection and access tracking faults
> > to be handled without taking the KVM MMU lock. This series brings the
> > performance of these faults up to par with the legacy MMU.
>
> Hi David,
>
> I have one very basic question: is the speedup due to lock contention, or to
> cacheline bouncing, or something else altogether? In other words, what do
> the profiles look like before vs. after these patches?

The speedup comes from a combination of:

- Less time spent in kvm_vcpu_gfn_to_memslot.
- Less lock contention on the MMU lock in read mode.

Before:

  Overhead  Symbol
 -  45.59%  [k] kvm_vcpu_gfn_to_memslot
    - 45.57% kvm_vcpu_gfn_to_memslot
       - 29.25% kvm_page_track_is_active
          + 15.90% direct_page_fault
          + 13.35% mmu_need_write_protect
       + 9.10% kvm_mmu_hugepage_adjust
       + 7.20% try_async_pf
 +  18.16%  [k] _raw_read_lock
 +  10.57%  [k] direct_page_fault
 +   8.77%  [k] handle_changed_spte_dirty_log
 +   4.65%  [k] mark_page_dirty_in_slot
     1.62%  [.] run_test
 +   1.35%  [k] x86_virt_spec_ctrl
 +   1.18%  [k] try_grab_compound_head
 [...]

After:

  Overhead  Symbol
 +  26.23%  [k] x86_virt_spec_ctrl
 +  15.93%  [k] vmx_vmexit
 +   6.33%  [k] vmx_vcpu_run
 +   4.31%  [k] vcpu_enter_guest
 +   3.71%  [k] tdp_iter_next
 +   3.47%  [k] __vmx_vcpu_run
 +   2.92%  [k] kvm_vcpu_gfn_to_memslot
 +   2.71%  [k] vcpu_run
 +   2.71%  [k] fast_page_fault
 +   2.51%  [k] kvm_vcpu_mark_page_dirty

(Both profiles were captured during "Iteration 2 dirty memory" of
dirty_log_perf_test.)

Related to the kvm_vcpu_gfn_to_memslot overhead: I actually have a set of
patches from Ben I am planning to send soon that will reduce the number of
redundant gfn-to-memslot lookups in the page fault path.

>
> Thanks,
>
> Paolo
>
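[Editor's note: for context on why the fast page fault path avoids MMU lock
contention, the core technique is to repair the offending SPTE with an atomic
compare-and-exchange and retry the fault if the SPTE changed underneath.
Below is a minimal, hedged sketch of that idea in standalone C11; the bit
names (SPTE_WRITABLE, SPTE_MMU_WRITABLE) and the function are hypothetical
illustrations, not the actual KVM code.]

#include <stdbool.h>
#include <stdint.h>
#include <stdatomic.h>

typedef uint64_t u64;

#define SPTE_WRITABLE      (1ULL << 1)   /* hypothetical hardware-writable bit */
#define SPTE_MMU_WRITABLE  (1ULL << 58)  /* hypothetical "writable in principle" bit */

/*
 * Try to fix a write-protection fault on @sptep without holding the MMU
 * lock. Returns true if the SPTE was repaired, false if the fault must be
 * retried or handled by the slow path.
 */
static bool fast_fix_write_protect(_Atomic u64 *sptep)
{
    u64 old_spte = atomic_load_explicit(sptep, memory_order_acquire);
    u64 new_spte;

    /* Only faults caused by write protection (e.g. dirty logging) qualify. */
    if (!(old_spte & SPTE_MMU_WRITABLE) || (old_spte & SPTE_WRITABLE))
        return false;

    new_spte = old_spte | SPTE_WRITABLE;

    /*
     * Atomic compare-and-exchange: if another vCPU or the MMU changed the
     * SPTE concurrently, give up and let the caller retry.
     */
    return atomic_compare_exchange_strong(sptep, &old_spte, new_spte);
}

Because the update is a single atomic cmpxchg on the SPTE, no reader/writer
lock is taken, which is consistent with _raw_read_lock dropping out of the
"After" profile above.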
On 14/06/21 23:08, David Matlack wrote:
> I actually have a set of patches from Ben I am planning to send soon that
> will reduce the number of redundant gfn-to-memslot lookups in the page
> fault path.

That seems to be a possible 5.14 candidate, while this series is probably a
bit too much for now.

Paolo
On Tue, Jun 15, 2021 at 09:16:00AM +0200, Paolo Bonzini wrote:
> On 14/06/21 23:08, David Matlack wrote:
> > I actually have a set of patches from Ben I am planning to send soon that
> > will reduce the number of redundant gfn-to-memslot lookups in the page
> > fault path.
>
> That seems to be a possible 5.14 candidate, while this series is probably a
> bit too much for now.

Thanks for the feedback. I am not in a rush to get either series into
5.14 so that sounds fine with me. Here is how I am planning to proceed:

1. Send a new series with the cleanups to is_tdp_mmu_root Sean suggested
   in patch 1/8 [1].
2. Send v2 of the TDP MMU Fast Page Fault series without patch 1/8.
3. Send out the memslot lookup optimization series.

Does that sound reasonable to you? Do you have any reservations with
taking (2) before (3)?

[1] https://lore.kernel.org/kvm/YMepDK40DLkD4DSy@google.com/

>
> Paolo
>
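[Editor's note: the "memslot lookup optimization series" in step 3 is not
shown in this thread. One plausible shape of such an optimization is to
cache the most recently resolved memslot and reuse it when the next gfn
falls in the same range, skipping the full lookup. The sketch below is a
self-contained illustration under that assumption; every name and structure
in it is a hypothetical stand-in, not the actual KVM implementation.]

#include <stddef.h>
#include <stdint.h>

typedef uint64_t gfn_t;

struct memslot {
    gfn_t base_gfn;
    unsigned long npages;
};

/* Toy memslot array; real KVM keeps a sorted per-address-space structure. */
static struct memslot slots[] = {
    { .base_gfn = 0x00000, .npages = 0x20000 },
    { .base_gfn = 0x40000, .npages = 0x80000 },
};

struct slot_cache {
    struct memslot *last_slot;   /* most recently resolved slot, may be NULL */
};

static int gfn_in_slot(const struct memslot *slot, gfn_t gfn)
{
    return gfn >= slot->base_gfn && gfn < slot->base_gfn + slot->npages;
}

/* Full lookup: stands in for a real gfn-to-memslot search. */
static struct memslot *slow_gfn_to_memslot(gfn_t gfn)
{
    for (size_t i = 0; i < sizeof(slots) / sizeof(slots[0]); i++)
        if (gfn_in_slot(&slots[i], gfn))
            return &slots[i];
    return NULL;
}

/* Fast path: reuse the last slot if the gfn still falls inside it. */
static struct memslot *cached_gfn_to_memslot(struct slot_cache *c, gfn_t gfn)
{
    if (c->last_slot && gfn_in_slot(c->last_slot, gfn))
        return c->last_slot;

    c->last_slot = slow_gfn_to_memslot(gfn);
    return c->last_slot;
}

A page fault path that resolves the same gfn several times (page tracking,
huge page adjustment, async page fault checks) would pay the full lookup cost
only once under this scheme, which is the overhead the "Before" profile
attributes to kvm_vcpu_gfn_to_memslot.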
On 16/06/21 21:27, David Matlack wrote:
>>> I actually have a set of
>>> patches from Ben I am planning to send soon that will reduce the number of
>>> redundant gfn-to-memslot lookups in the page fault path.
>>
>> That seems to be a possible 5.14 candidate, while this series is probably a
>> bit too much for now.
>
> Thanks for the feedback. I am not in a rush to get either series into
> 5.14 so that sounds fine with me. Here is how I am planning to proceed:
>
> 1. Send a new series with the cleanups to is_tdp_mmu_root Sean suggested
>    in patch 1/8 [1].
> 2. Send v2 of the TDP MMU Fast Page Fault series without patch 1/8.
> 3. Send out the memslot lookup optimization series.
>
> Does that sound reasonable to you? Do you have any reservations with
> taking (2) before (3)?
>
> [1] https://lore.kernel.org/kvm/YMepDK40DLkD4DSy@google.com/

They all seem reasonably independent, so use the order that is easier for
you.

Paolo