Message ID | 1553350688-39627-1-git-send-email-like.xu@linux.intel.com |
---|---|
Series | Intel Virtual PMU Optimization |
On Sat, Mar 23, 2019 at 10:18:03PM +0800, Like Xu wrote:
> === Brief description ===
>
> This proposal for Intel vPMU is still committed to optimize the basic
> functionality by reducing the PMU virtualization overhead and not a blind
> pass-through of the PMU. The proposal applies to existing models, in short,
> is "host perf would hand over control to kvm after counter allocation".
>
> The pmc_reprogram_counter is a heavyweight and high frequency operation
> which goes through the host perf software stack to create a perf event for
> counter assignment, this could take millions of nanoseconds. The current
> vPMU always does reprogram_counter when the guest changes the eventsel,
> fixctrl, and global_ctrl msrs. This brings too much overhead to the usage
> of perf inside the guest, especially the guest PMI handling and context
> switching of guest threads with perf in use.

I think I asked for starting with making pmc_reprogram_counter() less
retarded. I'm not seeing that here.

> We optimize the current vPMU to work in this manner:
>
> (1) rely on the existing host perf (perf_event_create_kernel_counter)
> to allocate counters for in-use vPMC and always try to reuse events;
> (2) vPMU captures guest accesses to the eventsel and fixctrl msr directly
> to the hardware msr that the corresponding host event is scheduled on
> and avoid pollution from host is also needed in its partial runtime;

If you do pass-through; how do you deal with event constraints?

> (3) save and restore the counter state during vCPU scheduling in hooks;
> (4) apply a lazy approach to release the vPMC's perf event. That is, if
> the vPMC isn't used in a fixed sched slice, its event will be released.
>
> In the use of vPMC, the vPMU always focus on the assigned resources and
> guest perf would significantly benefit from direct access to hardware and
> may not care about runtime state of perf_event created by host and always
> try not to pay for their maintenance. However to avoid events entering into
> any unexpected state, calling pmc_read_counter in appropriate is necessary.

what?!

I can't follow that, and the quick look I had at the patches doesn't
seem to help. I did note it is intel only and that is really sad.

It also makes a mess of who programs what msr when.
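Step (4) of the quoted proposal, lazily releasing a vPMC's perf event, can be sketched as below. This is a simplified model under stated assumptions, not code from the patch series: `struct vpmc_sketch`, `vpmc_mark_used()` and `vpmc_lazy_release()` are hypothetical names, and the point where a real implementation would call `perf_event_release_kernel()` is only marked by a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-vPMC bookkeeping; not the real struct kvm_pmc. */
struct vpmc_sketch {
	bool has_event;      /* a host perf event is currently allocated */
	bool used_in_slice;  /* guest touched this vPMC in the current slice */
};

/* Guest accessed the vPMC (e.g. wrote its eventsel): mark it as in use. */
static void vpmc_mark_used(struct vpmc_sketch *pmc)
{
	pmc->used_in_slice = true;
}

/*
 * Called once at the end of each vCPU time slice. Releases the host event
 * of every vPMC that went unused during the slice, clears the usage flags
 * for the next slice, and returns how many events were released.
 */
static size_t vpmc_lazy_release(struct vpmc_sketch *pmcs, size_t n)
{
	size_t released = 0;

	assert(pmcs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (pmcs[i].has_event && !pmcs[i].used_in_slice) {
			/* A real vPMU would call perf_event_release_kernel() here. */
			pmcs[i].has_event = false;
			released++;
		}
		pmcs[i].used_in_slice = false; /* start a fresh slice */
	}
	return released;
}
```

A vPMC that the guest touches in every slice keeps its host event, so the expensive `perf_event_create_kernel_counter()` path is only paid again after a slice of inactivity.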
> > We optimize the current vPMU to work in this manner:
> >
> > (1) rely on the existing host perf (perf_event_create_kernel_counter)
> > to allocate counters for in-use vPMC and always try to reuse events;
> > (2) vPMU captures guest accesses to the eventsel and fixctrl msr directly
> > to the hardware msr that the corresponding host event is scheduled on
> > and avoid pollution from host is also needed in its partial runtime;
>
> If you do pass-through; how do you deal with event constraints?

The guest has to deal with them. It already needs to know
the model number to program the right events, can as well know
the constraints too.

For architectural events that don't need the model number it's
not a problem because they don't have constraints.

-Andi
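To make the notion of an event constraint concrete: perf describes a constraint as a bitmask of the counters an event may occupy (the `idxmsk64` field of `struct event_constraint`). The hypothetical helper below shows, in simplified first-fit form, what a guest-side scheduler would have to respect if the guest handles constraints itself, as suggested above. It is an illustration only; the real perf scheduler orders events by constraint weight and is considerably more involved.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick a counter for an event whose constraint is a bitmask of allowed
 * counter indices (bit i set => counter i may be used). Counters already
 * taken are set in used_mask. Returns the chosen index, or -1 if none fits.
 * An unconstrained event simply passes a mask covering every counter.
 */
static int pick_counter(uint64_t allowed_mask, uint64_t used_mask,
			int num_counters)
{
	assert(num_counters > 0 && num_counters <= 64);
	for (int i = 0; i < num_counters; i++) {
		uint64_t bit = UINT64_C(1) << i;

		if ((allowed_mask & bit) && !(used_mask & bit))
			return i;
	}
	return -1;
}
```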
On 2019/3/24 7:15, Andi Kleen wrote:
>>> We optimize the current vPMU to work in this manner:
>>>
>>> (1) rely on the existing host perf (perf_event_create_kernel_counter)
>>> to allocate counters for in-use vPMC and always try to reuse events;
>>> (2) vPMU captures guest accesses to the eventsel and fixctrl msr directly
>>> to the hardware msr that the corresponding host event is scheduled on
>>> and avoid pollution from host is also needed in its partial runtime;
>>
>> If you do pass-through; how do you deal with event constraints?
>
> The guest has to deal with them. It already needs to know
> the model number to program the right events, can as well know
> the constraints too.
>
> For architectural events that don't need the model number it's
> not a problem because they don't have constraints.
>
> -Andi
>

I agree this version does not deliberately take host perf event constraints
into account:

1. Based on my limited knowledge, I assume "the model number" means hwc->idx.
2. The guest event constraints would be encoded into the hwc->config_base
   value, which is pmc->eventsel and pmu->fixed_ctr_ctrl from the KVM point
   of view.
3. The guest PMU has the same semantic model of virtual hardware limitations
   as the host does with the real PMU (the related CPUID/PERF_MSR information
   is exposed to the guest).
4. The guest perf scheduler would make sure the guest event constraints fit
   the right guest model number.
5. vPMU would make sure the guest vPMC gets the right guest model number by
   hard-coding EVENT_PINNED, or it would just fail the creation.
6. This patch directly applies the guest hwc->config_base value to the
   host-assigned hardware without consent from host perf (a bit deceptive,
   but practical for reducing the number of reprogram calls).

=== OR ===

If we insist on passing guest event constraints to host perf, this proposal
may need the following changes:

Because the guest configuration of hwc->config_base mostly only toggles the
enable bit of eventsel or fixctrl, it is not necessary to do
reprogram_counter, since the vPMC is serving the same guest perf event. The
event creation is only needed when the guest writes a completely new value
to eventsel or fixctrl. The code for the guest MSR_P6_EVNTSEL0 trap, for
example, may be modified to look like this:

	u64 diff = pmc->eventsel ^ data;

	if (intel_pmc_is_assigned(pmc) && diff != ARCH_PERFMON_EVENTSEL_ENABLE) {
		intel_pmu_save_guest_pmc(pmu, pmc->idx);
		intel_pmc_stop_counter(pmc);
	}
	reprogram_gp_counter(pmc, data);

Does this seem to satisfy our needs? Please correct me if I'm wrong; it
makes everything easier.
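The enable-bit test in the trap snippet can be isolated into a standalone check. The sketch below assumes bit 22 (EN) of IA32_PERFEVTSELx, which is what Linux's `ARCH_PERFMON_EVENTSEL_ENABLE` encodes; the function names are hypothetical. It uses a masked comparison rather than `diff != ARCH_PERFMON_EVENTSEL_ENABLE`, because the latter is also true when `diff` is 0, so a guest rewriting an identical value would be treated as programming a new event.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 22 (EN) of IA32_PERFEVTSELx, as in Linux's ARCH_PERFMON_EVENTSEL_ENABLE. */
#define EVENTSEL_ENABLE (UINT64_C(1) << 22)

/*
 * True when a guest write to an event-select MSR changes nothing but the
 * enable bit: the vPMC keeps serving the same guest event.
 */
static bool eventsel_only_enable_toggled(uint64_t old_val, uint64_t new_val)
{
	return (old_val ^ new_val) == EVENTSEL_ENABLE;
}

/*
 * True when any bit other than EN changed, i.e. the guest programmed a new
 * event and a full reprogram (new host perf event) would be needed. An
 * identical rewrite returns false.
 */
static bool eventsel_needs_reprogram(uint64_t old_val, uint64_t new_val)
{
	return ((old_val ^ new_val) & ~EVENTSEL_ENABLE) != 0;
}
```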
On 2019/3/24 1:28, Peter Zijlstra wrote:
> On Sat, Mar 23, 2019 at 10:18:03PM +0800, Like Xu wrote:
>> === Brief description ===
>>
>> This proposal for Intel vPMU is still committed to optimize the basic
>> functionality by reducing the PMU virtualization overhead and not a blind
>> pass-through of the PMU. The proposal applies to existing models, in short,
>> is "host perf would hand over control to kvm after counter allocation".
>>
>> The pmc_reprogram_counter is a heavyweight and high frequency operation
>> which goes through the host perf software stack to create a perf event for
>> counter assignment, this could take millions of nanoseconds. The current
>> vPMU always does reprogram_counter when the guest changes the eventsel,
>> fixctrl, and global_ctrl msrs. This brings too much overhead to the usage
>> of perf inside the guest, especially the guest PMI handling and context
>> switching of guest threads with perf in use.
>
> I think I asked for starting with making pmc_reprogram_counter() less
> retarded. I'm not seeing that here.

Do you mean passing perf_event_attr to refactor pmc_reprogram_counter
via paravirt? Please share more details.

>
>> We optimize the current vPMU to work in this manner:
>>
>> (1) rely on the existing host perf (perf_event_create_kernel_counter)
>> to allocate counters for in-use vPMC and always try to reuse events;
>> (2) vPMU captures guest accesses to the eventsel and fixctrl msr directly
>> to the hardware msr that the corresponding host event is scheduled on
>> and avoid pollution from host is also needed in its partial runtime;
>
> If you do pass-through; how do you deal with event constraints
>
>> (3) save and restore the counter state during vCPU scheduling in hooks;
>> (4) apply a lazy approach to release the vPMC's perf event. That is, if
>> the vPMC isn't used in a fixed sched slice, its event will be released.
>>
>> In the use of vPMC, the vPMU always focus on the assigned resources and
>> guest perf would significantly benefit from direct access to hardware and
>> may not care about runtime state of perf_event created by host and always
>> try not to pay for their maintenance. However to avoid events entering into
>> any unexpected state, calling pmc_read_counter in appropriate is necessary.
>
> what?!

The patch will reuse the created events as much as possible for the same
guest vPMC, which may have a different config_base in its partial runtime.
The pmc_read_counter is designed to be called in kvm_pmu_rdpmc and
pmc_stop_counter as legacy does and it's not for vPMU functionality but for
host perf maintenance (it seems to be missing from the code, oops).

>
> I can't follow that, and the quick look I had at the patches doesn't
> seem to help. I did note it is intel only and that is really sad.

The basic idea of the optimization is x86 generic, and the Intel-only
implementation is not intentional: I could not access non-Intel machines
to verify it.

>
> It also makes a mess of who programs what msr when.
>

who programs: vPMU does, as usual, in pmc_reprogram_counter.

what msr: the host perf scheduler makes the decision, and I'm not sure
whether host perf would do cross-mapping scheduling, which means assigning
a host fixed counter to a guest gp counter and vice versa.

when programs: every time reprogram_gp/fixed_counter is called and
pmc_is_assigned(pmc) is false; check the fifth patch for details.
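For fixctrl, "toggling the enable bit" happens per fixed counter: IA32_FIXED_CTR_CTRL gives each fixed counter a 4-bit field (bit 0 OS, bit 1 USR, bit 2 AnyThread, bit 3 PMI), which KVM extracts with its `fixed_ctrl_field()` macro. The sketch below, with hypothetical helper names, finds which fixed counters a guest write actually changes, similar in spirit to the per-counter comparison in KVM's `reprogram_fixed_counters()`; only those counters would be candidates for the reprogram_fixed_counter path discussed above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Each fixed counter owns a 4-bit field in IA32_FIXED_CTR_CTRL:
 * bit 0 = count in ring 0 (OS), bit 1 = count in ring > 0 (USR),
 * bit 2 = AnyThread, bit 3 = PMI on overflow.
 */
static unsigned int fixed_ctrl_field(uint64_t ctrl, int idx)
{
	assert(idx >= 0 && idx < 16);
	return (unsigned int)((ctrl >> (idx * 4)) & 0xf);
}

/* A fixed counter counts when at least one of its OS/USR bits is set. */
static int fixed_ctr_enabled(uint64_t ctrl, int idx)
{
	return (fixed_ctrl_field(ctrl, idx) & 0x3) != 0;
}

/*
 * Bitmask of fixed counters whose 4-bit field differs between the old and
 * new control values; the other counters need no attention at all.
 */
static uint32_t fixed_ctrl_changed(uint64_t old_ctrl, uint64_t new_ctrl,
				   int nr_fixed)
{
	uint32_t changed = 0;

	assert(nr_fixed >= 0 && nr_fixed <= 16);
	for (int i = 0; i < nr_fixed; i++)
		if (fixed_ctrl_field(old_ctrl, i) != fixed_ctrl_field(new_ctrl, i))
			changed |= UINT32_C(1) << i;
	return changed;
}
```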
On Mon, Mar 25, 2019 at 02:47:32PM +0800, Like Xu wrote:
> On 2019/3/24 1:28, Peter Zijlstra wrote:
> > On Sat, Mar 23, 2019 at 10:18:03PM +0800, Like Xu wrote:
> > > === Brief description ===
> > >
> > > This proposal for Intel vPMU is still committed to optimize the basic
> > > functionality by reducing the PMU virtualization overhead and not a blind
> > > pass-through of the PMU. The proposal applies to existing models, in short,
> > > is "host perf would hand over control to kvm after counter allocation".
> > >
> > > The pmc_reprogram_counter is a heavyweight and high frequency operation
> > > which goes through the host perf software stack to create a perf event for
> > > counter assignment, this could take millions of nanoseconds. The current
> > > vPMU always does reprogram_counter when the guest changes the eventsel,
> > > fixctrl, and global_ctrl msrs. This brings too much overhead to the usage
> > > of perf inside the guest, especially the guest PMI handling and context
> > > switching of guest threads with perf in use.
> >
> > I think I asked for starting with making pmc_reprogram_counter() less
> > retarded. I'm not seeing that here.
>
> Do you mean passing perf_event_attr to refactor pmc_reprogram_counter
> via paravirt? Please share more details.

I mean nothing; I'm trying to understand wth you're doing.

> > > We optimize the current vPMU to work in this manner:
> > >
> > > (1) rely on the existing host perf (perf_event_create_kernel_counter)
> > > to allocate counters for in-use vPMC and always try to reuse events;
> > > (2) vPMU captures guest accesses to the eventsel and fixctrl msr directly
> > > to the hardware msr that the corresponding host event is scheduled on
> > > and avoid pollution from host is also needed in its partial runtime;
> >
> > If you do pass-through; how do you deal with event constraints
> >
> > > (3) save and restore the counter state during vCPU scheduling in hooks;
> > > (4) apply a lazy approach to release the vPMC's perf event. That is, if
> > > the vPMC isn't used in a fixed sched slice, its event will be released.
> > >
> > > In the use of vPMC, the vPMU always focus on the assigned resources and
> > > guest perf would significantly benefit from direct access to hardware and
> > > may not care about runtime state of perf_event created by host and always
> > > try not to pay for their maintenance. However to avoid events entering into
> > > any unexpected state, calling pmc_read_counter in appropriate is necessary.
> >
> > what?!
>
> The patch will reuse the created events as much as possible for the same
> guest vPMC, which may have a different config_base in its partial runtime.

again. what?!

> The pmc_read_counter is designed to be called in kvm_pmu_rdpmc and
> pmc_stop_counter as legacy does and it's not for vPMU functionality but for
> host perf maintenance (it seems to be missing from the code, oops).
>
> >
> > I can't follow that, and the quick look I had at the patches doesn't
> > seem to help. I did note it is intel only and that is really sad.
>
> The basic idea of the optimization is x86 generic, and the Intel-only
> implementation is not intentional: I could not access non-Intel machines
> to verify it.
>
> >
> > It also makes a mess of who programs what msr when.
> >
>
> who programs: vPMU does, as usual, in pmc_reprogram_counter.
>
> what msr: the host perf scheduler makes the decision, and I'm not sure
> whether host perf would do cross-mapping scheduling, which means assigning
> a host fixed counter to a guest gp counter and vice versa.
>
> when programs: every time reprogram_gp/fixed_counter is called and
> pmc_is_assigned(pmc) is false; check the fifth patch for details.

I'm not going to reverse engineer this; if you can't write coherent
descriptions, this isn't going anywhere.

It isn't going anywhere anyway, it's insane. You let perf do all its
normal things and then discard the results by avoiding the wrmsr.

Then you fudge a second wrmsr path somewhere.

Please, just make the existing event dtrt.
> It isn't going anywhere anyway, it's insane. You let perf do all its
> normal things and then discard the results by avoiding the wrmsr.
>
> Then you fudge a second wrmsr path somewhere.
>
> Please, just make the existing event dtrt.

I still think the right way is to force an event to a counter
from an internal field. And then set that field from KVM.
This is quite straightforward to do in the event scheduler.

I did it for some experimental PEBS virtualization patches
which require the same, because they have to expose the counter
indexes inside the PEBS record to the guest.

-Andi
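The suggestion above, forcing an event onto a specific counter via an internal field that KVM sets, can be illustrated with a toy two-pass scheduler. Everything here is hypothetical (`struct sched_event`, `forced_idx`, `schedule_events()`); it is not the perf event scheduler and not the PEBS patches mentioned, only a sketch of the idea that forced placements are honored first and the remaining events fill whatever counters are left.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily simplified event descriptor for illustration only. */
struct sched_event {
	uint64_t allowed_mask; /* counters this event may use (its constraint) */
	int forced_idx;        /* counter requested by KVM, or -1 for "any" */
	int assigned_idx;      /* result: chosen counter, or -1 if unscheduled */
};

/*
 * Pass 1 honors forced indices; pass 2 gives every unforced event the lowest
 * free counter its constraint allows. A forced event whose counter is taken
 * or disallowed stays unscheduled. Returns the number of unscheduled events.
 */
static int schedule_events(struct sched_event *ev, int n, int num_counters)
{
	uint64_t used = 0;
	int failed = 0;

	assert(num_counters > 0 && num_counters <= 64);
	for (int i = 0; i < n; i++) {
		ev[i].assigned_idx = -1;
		int f = ev[i].forced_idx;
		if (f < 0 || f >= num_counters)
			continue;
		uint64_t bit = UINT64_C(1) << f;
		if ((ev[i].allowed_mask & bit) && !(used & bit)) {
			ev[i].assigned_idx = f;
			used |= bit;
		}
	}
	for (int i = 0; i < n; i++) {
		if (ev[i].forced_idx >= 0)
			continue; /* forced events never move to another counter */
		for (int c = 0; c < num_counters; c++) {
			uint64_t bit = UINT64_C(1) << c;
			if ((ev[i].allowed_mask & bit) && !(used & bit)) {
				ev[i].assigned_idx = c;
				used |= bit;
				break;
			}
		}
	}
	for (int i = 0; i < n; i++)
		if (ev[i].assigned_idx < 0)
			failed++;
	return failed;
}
```

Keeping a forced event unscheduled rather than silently moving it matches the PEBS case, where the guest must see the exact counter index it programmed.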
On 03/25/2019 03:19 PM, Peter Zijlstra wrote:
> On Mon, Mar 25, 2019 at 02:47:32PM +0800, Like Xu wrote:
>> On 2019/3/24 1:28, Peter Zijlstra wrote:
>>> On Sat, Mar 23, 2019 at 10:18:03PM +0800, Like Xu wrote:
>>>> === Brief description ===
>>>>
>>>> This proposal for Intel vPMU is still committed to optimize the basic
>>>> functionality by reducing the PMU virtualization overhead and not a blind
>>>> pass-through of the PMU. The proposal applies to existing models, in short,
>>>> is "host perf would hand over control to kvm after counter allocation".
>>>>
>>>> The pmc_reprogram_counter is a heavyweight and high frequency operation
>>>> which goes through the host perf software stack to create a perf event for
>>>> counter assignment, this could take millions of nanoseconds. The current
>>>> vPMU always does reprogram_counter when the guest changes the eventsel,
>>>> fixctrl, and global_ctrl msrs. This brings too much overhead to the usage
>>>> of perf inside the guest, especially the guest PMI handling and context
>>>> switching of guest threads with perf in use.
>>> I think I asked for starting with making pmc_reprogram_counter() less
>>> retarded. I'm not seeing that here.
>> Do you mean passing perf_event_attr to refactor pmc_reprogram_counter
>> via paravirt? Please share more details.
> I mean nothing; I'm trying to understand wth you're doing.

I also feel the description is confusing (sorry for joining late; I was on
leave). The code also needs a lot of improvement. Please see the basic idea
here:

reprogram_counter is a heavyweight operation which goes through the perf
software stack to create a perf event; this could take millions of
nanoseconds. The current KVM vPMU always does reprogram_counter when the
guest changes the eventsel, fixctrl, and global_ctrl msrs. This brings too
much overhead to the usage of perf inside the guest, especially the guest
PMI handling and context switching of guest threads with perf in use.

In fact, during the guest perf event life cycle, the guest mostly only
toggles the enable bit of eventsel or fixctrl. From the KVM point of view,
if the guest only toggles the enable bits, it is not necessary to do
reprogram_counter, because the vPMC is serving the same guest perf event.
So the "enable bit" can be directly applied to the hardware msr that the
corresponding host event is occupying.

We optimize the current vPMU to work in this manner:

1) rely on the existing host perf (perf_event_create_kernel_counter) to
   create a perf event for each vPMC. This creation is only needed when the
   guest writes a completely new value to eventsel or fixctrl.

2) vPMU captures guest accesses to the eventsel and fixctrl msrs. If the
   guest only toggles the enable bit, then we don't need to
   reprogram_pmc_counter, as the vPMC is serving the same guest event. So
   KVM only updates the enable bit directly to the hardware msr that the
   corresponding host event is scheduled on.

3) When the host perf reschedules perf counters and happens to have the
   vPMC's perf event scheduled out, KVM will do reprogram_counter.

4) We use a lazy approach to release the vPMC's perf event. That is, if the
   vPMC wasn't used for a vCPU time slice, the corresponding perf event will
   be released via kvm calling perf_event_release_kernel.

Regarding who updates the underlying hardware counter: the change here is
that when a perf event is used by the guest (i.e. exclude_host=true, or a
new flag if necessary), perf doesn't update the hardware counter (e.g. a
counter's event_base and config_base); instead, the hypervisor helps to
update them.

Hope the above makes it clear. Thanks!

Best,
Wei