Message ID | 1507214117-2899-1-git-send-email-wanpeng.li@hotmail.com (mailing list archive) |
---|---|
State | New, archived |
2017-10-05 07:35-0700, Wanpeng Li:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
>
> The description in the Intel SDM of how the divide configuration
> register is used: "The APIC timer frequency will be the processor's bus
> clock or core crystal clock frequency divided by the value specified in
> the divide configuration register."
>
> Observation on bare metal showed that when the TDCR is changed, the TMCCT
> does not change or make a big jump in value, but the rate at which it
> counts down changes.
>
> The patch updates the APIC timer emulation so that a change to the
> divide configuration is reflected in the value of the counter and in
> when the next interrupt is triggered.
>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> @@ -1474,9 +1474,24 @@ static bool set_target_expiration(struct kvm_lapic *apic)
>  		   ktime_to_ns(ktime_add_ns(now,
>  				apic->lapic_timer.period)));
>
> +	delta = apic->lapic_timer.period;
> +	if (apic->divide_count != old_divisor) {

Hm, nothing should happen if the guest writes the same value to TDCR, but
we'll reset the timer. (An extra argument would solve it, but maybe it
would be nicer to add a new function for updating the expiration.)

> +		remaining = ktime_sub(apic->lapic_timer.target_expiration, now);
> +		if (ktime_to_ns(remaining) < 0)
> +			remaining = 0;
> +		delta = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
> +
> +		if (!delta)
> +			return false;
> +
> +		apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
> +			* APIC_BUS_CYCLE_NS * apic->divide_count;

I'd prefer to apply the rate limiting (done earlier in this function) to
the period. This version allows the guest to configure 128 times more
frequent interrupts in the host.
(And thinking about it, the version of [2/3] I proposed has a similar
problem when switching from one-shot to periodic, only there it is
unpredictably limited by the speed of KVM.)

Thanks.

> +		delta = delta * apic->divide_count / old_divisor;
> +	}
> +
>  	apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) +
> -		nsec_to_cycles(apic->vcpu, apic->lapic_timer.period);
> -	apic->lapic_timer.target_expiration = ktime_add_ns(now, apic->lapic_timer.period);
> +		nsec_to_cycles(apic->vcpu, delta);
> +	apic->lapic_timer.target_expiration = ktime_add_ns(now, delta);
>
>  	return true;
>  }
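To make the first review comment concrete, here is a minimal userspace sketch of the rescaling step, assuming the remaining time is tracked in nanoseconds. The helper name rescale_remaining_ns() is hypothetical and is not part of the patch or of KVM; it only mirrors the "delta * apic->divide_count / old_divisor" arithmetic and returns early when the divisor is unchanged, which is the behaviour asked for when the guest rewrites the same TDCR value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): convert the time left on a
 * running timer when the divide configuration changes. The current
 * count stays the same, only the rate of counting changes, so the
 * remaining wall-clock time scales by new_divisor / old_divisor.
 */
static uint64_t rescale_remaining_ns(uint64_t remaining_ns,
				     uint32_t old_divisor,
				     uint32_t new_divisor)
{
	assert(old_divisor != 0);

	/* Rewriting the same divisor must leave the timer untouched. */
	if (new_divisor == old_divisor)
		return remaining_ns;

	/* Note: this multiplication can overflow for very large inputs. */
	return remaining_ns * new_divisor / old_divisor;
}
```

For example, 1000 ns left with a divisor of 1 becomes 128000 ns after switching to a divisor of 128, and the reverse switch shrinks the remaining time by the same factor.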
2017-10-06 2:14 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2017-10-05 07:35-0700, Wanpeng Li:
>> +	delta = apic->lapic_timer.period;
>> +	if (apic->divide_count != old_divisor) {
>
> Hm, nothing should happen if the guest writes the same value to TDCR, but
> we'll reset the timer. (An extra argument would solve it, but maybe it
> would be nicer to add a new function for updating the expiration.)

Agreed.

>> +		apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
>> +			* APIC_BUS_CYCLE_NS * apic->divide_count;
>
> I'd prefer to apply the rate limiting (done earlier in this function) to
> the period. This version allows the guest to configure 128 times more
> frequent interrupts in the host.
> (And thinking about it, the version of [2/3] I proposed has a similar
> problem when switching from one-shot to periodic, only there it is
> unpredictably limited by the speed of KVM.)

We didn't stop and restart the timer, so why would the rate limiting
affect us in [2/3]?

Regards,
Wanpeng Li
2017-10-06 7:14 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
> 2017-10-06 2:14 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> I'd prefer to apply the rate limiting (done earlier in this function) to
>> the period. This version allows the guest to configure 128 times more
>> frequent interrupts in the host.
>> (And thinking about it, the version of [2/3] I proposed has a similar
>> problem when switching from one-shot to periodic, only there it is
>> unpredictably limited by the speed of KVM.)
>
> We didn't stop and restart the timer, so why would the rate limiting
> affect us in [2/3]?

This is already done in v6, please review. :)

Regards,
Wanpeng Li
2017-10-06 07:14+0800, Wanpeng Li:
> 2017-10-06 2:14 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> > 2017-10-05 07:35-0700, Wanpeng Li:
> >> From: Wanpeng Li <wanpeng.li@hotmail.com>
> >> +		remaining = ktime_sub(apic->lapic_timer.target_expiration, now);
> >> +		if (ktime_to_ns(remaining) < 0)
> >> +			remaining = 0;
> >> +		delta = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
> >> +
> >> +		if (!delta)
> >> +			return false;
> >> +
> >> +		apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
> >> +			* APIC_BUS_CYCLE_NS * apic->divide_count;
> >
> > I'd prefer to apply the rate limiting (done earlier in this function) to
> > the period. This version allows the guest to configure 128 times more
> > frequent interrupts in the host.
> > (And thinking about it, the version of [2/3] I proposed has a similar
> > problem when switching from one-shot to periodic, only there it is
> > unpredictably limited by the speed of KVM.)
>
> We didn't stop and restart the timer, so why would the rate limiting
> affect us in [2/3]?

It is because of the rate limiting: the guest could set up a one-shot
timer with a short expiration and then switch to periodic.

It is mostly theoretical, as the expiration would have to be long enough
that the timer doesn't fire before KVM emulates the next instruction
that switches the timer to periodic mode, but shorter than the rate limit.

I see you handled that in v6, thanks!
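The rate-limiting concern can be illustrated with a small sketch. This is not the code from the patch or from v6; clamp_period_ns() and its min_period_us parameter are hypothetical names, and the only assumption is that a periodic timer whose period falls below some host-chosen minimum gets raised to that minimum, while a period of zero (timer disabled) is left alone.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000ULL

/*
 * Hypothetical helper: enforce a lower bound on a periodic timer's
 * period so that the guest cannot make the host take interrupts more
 * often than once every min_period_us microseconds.
 */
static uint64_t clamp_period_ns(uint64_t period_ns, uint64_t min_period_us)
{
	uint64_t min_period_ns;

	/* Guard the unit conversion against overflow. */
	assert(min_period_us <= UINT64_MAX / NSEC_PER_USEC);
	min_period_ns = min_period_us * NSEC_PER_USEC;

	/* A zero period means the timer is disabled; do not touch it. */
	if (period_ns != 0 && period_ns < min_period_ns)
		return min_period_ns;

	return period_ns;
}
```

The point raised in the review is about where such a clamp runs: if it is applied only when the timer is first programmed, a later change (one-shot to periodic, or a smaller divisor) can produce a period that was never checked, so the clamp has to be reapplied whenever the period is recomputed.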
2017-10-06 21:03 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2017-10-06 07:14+0800, Wanpeng Li:
>> 2017-10-06 2:14 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> > 2017-10-05 07:35-0700, Wanpeng Li:
>> >> From: Wanpeng Li <wanpeng.li@hotmail.com>
>> >> +		remaining = ktime_sub(apic->lapic_timer.target_expiration, now);
>> >> +		if (ktime_to_ns(remaining) < 0)
>> >> +			remaining = 0;
>> >> +		delta = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
>> >> +
>> >> +		if (!delta)
>> >> +			return false;
>> >> +
>> >> +		apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
>> >> +			* APIC_BUS_CYCLE_NS * apic->divide_count;
>> >
>> > I'd prefer to apply the rate limiting (done earlier in this function) to
>> > the period. This version allows the guest to configure 128 times more
>> > frequent interrupts in the host.
>> > (And thinking about it, the version of [2/3] I proposed has a similar
>> > problem when switching from one-shot to periodic, only there it is
>> > unpredictably limited by the speed of KVM.)
>>
>> We didn't stop and restart the timer, so why would the rate limiting
>> affect us in [2/3]?
>
> It is because of the rate limiting: the guest could set up a one-shot
> timer with a short expiration and then switch to periodic.

Yeah. In addition, I think configuring 128 means slower interrupts,
not faster ones.

Regards,
Wanpeng Li

> It is mostly theoretical, as the expiration would have to be long enough
> that the timer doesn't fire before KVM emulates the next instruction
> that switches the timer to periodic mode, but shorter than the rate limit.
>
> I see you handled that in v6, thanks!
2017-10-06 22:03+0800, Wanpeng Li:
> 2017-10-06 21:03 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> > 2017-10-06 07:14+0800, Wanpeng Li:
> >> 2017-10-06 2:14 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> >> > 2017-10-05 07:35-0700, Wanpeng Li:
> >> >> From: Wanpeng Li <wanpeng.li@hotmail.com>
> >> >> +		remaining = ktime_sub(apic->lapic_timer.target_expiration, now);
> >> >> +		if (ktime_to_ns(remaining) < 0)
> >> >> +			remaining = 0;
> >> >> +		delta = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
> >> >> +
> >> >> +		if (!delta)
> >> >> +			return false;
> >> >> +
> >> >> +		apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
> >> >> +			* APIC_BUS_CYCLE_NS * apic->divide_count;
> >> >
> >> > I'd prefer to apply the rate limiting (done earlier in this function) to
> >> > the period. This version allows the guest to configure 128 times more
> >> > frequent interrupts in the host.
> >> > (And thinking about it, the version of [2/3] I proposed has a similar
> >> > problem when switching from one-shot to periodic, only there it is
> >> > unpredictably limited by the speed of KVM.)
> >>
> >> We didn't stop and restart the timer, so why would the rate limiting
> >> affect us in [2/3]?
> >
> > It is because of the rate limiting: the guest could set up a one-shot
> > timer with a short expiration and then switch to periodic.
>
> Yeah. In addition, I think configuring 128 means slower interrupts,
> not faster ones.

Yes, it says how many cycles it takes to decrement APIC_TMCCT.
(I was only concerned about the case where the rate limit was configured
with divide_count=128 and then switched to 1.)
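For readers following the divisor discussion, the sketch below shows how a TDCR value maps to a divisor and how the divisor enters the period. The decoding follows the encoding in the Intel SDM (bits 0, 1 and 3 select divide-by 2, 4, ..., 128, with 1011b meaning divide-by 1). The function names and the bus_cycle_ns parameter are illustrative assumptions, not KVM's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Decode TDCR bits 0, 1 and 3 into the divisor (1, 2, 4, ..., 128). */
static uint32_t tdcr_to_divide_count(uint32_t tdcr)
{
	uint32_t bits = tdcr & 0xf;
	/* Pack bits 0, 1 and 3 into three bits; bit 2 is reserved. */
	uint32_t shift = (((bits & 0x3) | ((bits & 0x8) >> 1)) + 1) & 0x7;

	assert(shift < 8);	/* sanity check: shift is a 3-bit value */
	return 1u << shift;
}

/*
 * Hypothetical helper: period of the timer in nanoseconds. Each
 * decrement of the current count takes divide_count bus cycles, so a
 * larger divisor means a longer period and less frequent interrupts.
 */
static uint64_t timer_period_ns(uint32_t initial_count,
				uint64_t bus_cycle_ns,
				uint32_t divide_count)
{
	return (uint64_t)initial_count * bus_cycle_ns * divide_count;
}
```

With an initial count of 1000 and a 1 ns bus cycle, a divisor of 128 gives a 128000 ns period, while switching to a divisor of 1 gives 1000 ns. That 128-fold drop is the case the rate limit needs to cover.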
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 6b366c1..36f9bc8 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1434,14 +1434,14 @@ static void start_sw_period(struct kvm_lapic *apic)
 		HRTIMER_MODE_ABS_PINNED);
 }
 
-static bool set_target_expiration(struct kvm_lapic *apic)
+static bool set_target_expiration(struct kvm_lapic *apic, uint32_t old_divisor)
 {
-	ktime_t now;
-	u64 tscl = rdtsc();
+	ktime_t now, remaining;
+	u64 tscl = rdtsc(), delta;
 
 	now = ktime_get();
 	apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
-		* APIC_BUS_CYCLE_NS * apic->divide_count;
+		* APIC_BUS_CYCLE_NS * old_divisor;
 
 	if (!apic->lapic_timer.period)
 		return false;
@@ -1474,9 +1474,24 @@ static bool set_target_expiration(struct kvm_lapic *apic)
 		   ktime_to_ns(ktime_add_ns(now,
 				apic->lapic_timer.period)));
 
+	delta = apic->lapic_timer.period;
+	if (apic->divide_count != old_divisor) {
+		remaining = ktime_sub(apic->lapic_timer.target_expiration, now);
+		if (ktime_to_ns(remaining) < 0)
+			remaining = 0;
+		delta = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
+
+		if (!delta)
+			return false;
+
+		apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
+			* APIC_BUS_CYCLE_NS * apic->divide_count;
+		delta = delta * apic->divide_count / old_divisor;
+	}
+
 	apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) +
-		nsec_to_cycles(apic->vcpu, apic->lapic_timer.period);
-	apic->lapic_timer.target_expiration = ktime_add_ns(now, apic->lapic_timer.period);
+		nsec_to_cycles(apic->vcpu, delta);
+	apic->lapic_timer.target_expiration = ktime_add_ns(now, delta);
 
 	return true;
 }
@@ -1613,12 +1628,12 @@ void kvm_lapic_restart_hv_timer(struct kvm_vcpu *vcpu)
 	restart_apic_timer(apic);
 }
 
-static void start_apic_timer(struct kvm_lapic *apic)
+static void start_apic_timer(struct kvm_lapic *apic, uint32_t old_divisor)
 {
 	atomic_set(&apic->lapic_timer.pending, 0);
 
 	if ((apic_lvtt_period(apic) || apic_lvtt_oneshot(apic))
-	    && !set_target_expiration(apic))
+	    && !set_target_expiration(apic, old_divisor))
 		return;
 
 	restart_apic_timer(apic);
@@ -1739,16 +1754,20 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 
 		hrtimer_cancel(&apic->lapic_timer.timer);
 		kvm_lapic_set_reg(apic, APIC_TMICT, val);
-		start_apic_timer(apic);
+		start_apic_timer(apic, apic->divide_count);
 		break;
 
-	case APIC_TDCR:
+	case APIC_TDCR: {
+		uint32_t current_divisor = apic->divide_count;
+
 		if (val & 4)
 			apic_debug("KVM_WRITE:TDCR %x\n", val);
 		kvm_lapic_set_reg(apic, APIC_TDCR, val);
 		update_divide_count(apic);
+		hrtimer_cancel(&apic->lapic_timer.timer);
+		start_apic_timer(apic, current_divisor);
 		break;
-
+	}
 	case APIC_ESR:
 		if (apic_x2apic_mode(apic) && val != 0) {
 			apic_debug("KVM_WRITE:ESR not zero %x\n", val);
@@ -1873,7 +1892,7 @@ void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data)
 
 	hrtimer_cancel(&apic->lapic_timer.timer);
 	apic->lapic_timer.tscdeadline = data;
-	start_apic_timer(apic);
+	start_apic_timer(apic, apic->divide_count);
 }
 
 void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8)
@@ -2239,7 +2258,7 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
 	apic_update_lvtt(apic);
 	apic_manage_nmi_watchdog(apic, kvm_lapic_get_reg(apic, APIC_LVT0));
 	update_divide_count(apic);
-	start_apic_timer(apic);
+	start_apic_timer(apic, apic->divide_count);
 	apic->irr_pending = true;
 	apic->isr_count = vcpu->arch.apicv_active ?
 				1 : count_vectors(apic->regs + APIC_ISR);