Message ID | 1455736496-374-2-git-send-email-rkrcmar@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 17/02/2016 20:14, Radim Kr?má? wrote: > Discard policy uses ack_notifiers to prevent injection of PIT interrupts > before EOI from the last one. > > This patch changes the policy to always try to deliver the interrupt, > which makes a difference when its vector is in ISR. > Old implementation would drop the interrupt, but proposed one injects to > IRR, like real hardware would. This seems like what libvirt calls the "merge" policy: Merge the missed tick(s) into one tick and inject. The guest time may be delayed, depending on how the OS reacts to the merging of ticks where the merged tick is the one placed into IRR. Unlike discard, "merge" can starve the guest through an interrupt storm. Rik, I think you originally worked on the missed tick policies in Xen. Is the above correct? If so, do you recall what would be the reason to use the merge policy instead of the discard policy? Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2016-02-18 17:13+0100, Paolo Bonzini: > On 17/02/2016 20:14, Radim Kr?má? wrote: >> Discard policy uses ack_notifiers to prevent injection of PIT interrupts >> before EOI from the last one. >> >> This patch changes the policy to always try to deliver the interrupt, >> which makes a difference when its vector is in ISR. >> Old implementation would drop the interrupt, but proposed one injects to >> IRR, like real hardware would. > > This seems like what libvirt calls the "merge" policy: Oops, I never looked beyond QEMU after seeing that the naming in libvirt doesn't even match ... I think the policy that KVM implements (which I call discard) is "delay" in libvirt. (https://libvirt.org/formatdomain.html#elementsTime) > > Merge the missed tick(s) into one tick and inject. The guest time > may be delayed, depending on how the OS reacts to the merging of > ticks The "may be delayed" there makes me feel like the timer has to support a guest visible counter of missed ticks. PIT will definitely be delayed if we get another tick while the previous one is still in IRR and there is nothing that the guest can do with it. "catchup" is the other KVM policy and "discard" also needs to allow the guest to handle lost ticks. > where the merged tick is the one placed into IRR. Unlike discard, > "merge" can starve the guest through an interrupt storm. Yeah, starving a VCPU with an interrupt storm is more likely with the changed policy. It's a pretty sad situation if all the time that VCPU gets isn't even enough to run a PIT handler, so I didn't care. The NMI watchdog bug can also be solved without changing the policy. (It's a hack in any case.) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 18/02/2016 17:56, Radim Kr?má? wrote: > 2016-02-18 17:13+0100, Paolo Bonzini: > > On 17/02/2016 20:14, Radim Kr?má? wrote: > > > Discard policy uses ack_notifiers to prevent injection of PIT interrupts > > > before EOI from the last one. > > > > > > This patch changes the policy to always try to deliver the interrupt, > > > which makes a difference when its vector is in ISR. > > > Old implementation would drop the interrupt, but proposed one injects to > > > IRR, like real hardware would. > > > > This seems like what libvirt calls the "merge" policy: > > Oops, I never looked beyond QEMU after seeing that the naming in libvirt > doesn't even match ... > > I think the policy that KVM implements (which I call discard) is "delay" > in libvirt. (https://libvirt.org/formatdomain.html#elementsTime) Suppose the scheduled ticks are at times 0, 20, 40, 60, 80. The EOI for time 0 is only delivered at time 42, other EOIs are timely. The resulting injections are: - for discard: 0, 60, 80. - for catchup, which QEMU calls slew: 0, 42, 51, 60, 80. - for merge: 0, 20 (in IRR, delivered at 42), 60, 80. For delay I *think* it would be 0, 42, 62, 82, 102. You know the i8254 code better than I do. Does this make sense to you? (Or in other words, does the code *really* do the above?...) > The "may be delayed" there makes me feel like the timer has to support a > guest visible counter of missed ticks. Yes, it depends whether the guest uses PIT to count time, or just to do periodic stuff (and then it reads the time from e.g. the PMTimer). In either case, only catchup ensures that time is not delayed, and it's used for Windows which uses the RTC periodic clock to count time. >> > where the merged tick is the one placed into IRR. Unlike discard, >> > "merge" can starve the guest through an interrupt storm. > Yeah, starving a VCPU with an interrupt storm is more likely with the > changed policy. It's a pretty sad situation if all the time that VCPU > gets isn't even enough to run a PIT handler, so I didn't care. True. On one hand the hardware policy is clearly merge, not discard. The i8259 has an IRR! On the other hand I'm a bit wary of changing the policy without seeing exactly what the old OSes were doing in the PIT handler. > The NMI watchdog bug can also be solved without changing the policy. > (It's a hack in any case.) Can you send a patch for that? Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 18/02/2016 18:33, Paolo Bonzini wrote: > On 18/02/2016 17:56, Radim Kr?má? wrote: >> 2016-02-18 17:13+0100, Paolo Bonzini: >>> On 17/02/2016 20:14, Radim Kr?má? wrote: >>>> Discard policy uses ack_notifiers to prevent injection of PIT interrupts >>>> before EOI from the last one. >>>> >>>> This patch changes the policy to always try to deliver the interrupt, >>>> which makes a difference when its vector is in ISR. >>>> Old implementation would drop the interrupt, but proposed one injects to >>>> IRR, like real hardware would. >>> >>> This seems like what libvirt calls the "merge" policy: >> >> Oops, I never looked beyond QEMU after seeing that the naming in libvirt >> doesn't even match ... >> >> I think the policy that KVM implements (which I call discard) is "delay" >> in libvirt. (https://libvirt.org/formatdomain.html#elementsTime) > > Suppose the scheduled ticks are at times 0, 20, 40, 60, 80. The EOI for > time 0 is only delivered at time 42, other EOIs are timely. > > The resulting injections are: > > - for discard: 0, 60, 80. > > - for catchup, which QEMU calls slew: 0, 42, 51, 60, 80. > > - for merge: 0, 20 (in IRR, delivered at 42), 60, 80. > > For delay I *think* it would be 0, 42, 62, 82, 102. Wrong: for delay it is something like 0, 42, 43, 60, 80. Your patch does the right thing, QEMU is wrong in calling the policy "discard" where it should have been "merge". In fact both i8254 and RTC use the same wrong nomenclature. And it is indeed superfluous to use ack notifiers to implement the default policy, as the default policy is already baked into the i8259. Sorry, it shows that I'm swamped as I'm messing up things a bit lately. At least I have you to correct me. :) Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 17/02/2016 20:14, Radim Kr?má? wrote: > > Even though there is a chance of regressions, I think we can fix the > LVT0 NMI bug without introducing a new tick policy. I agree, but please change the KVM_CAP_REINJECT_CONTROL check to return 2. This way we can make QEMU reject old kernels when the user provides "-global kvm-pit.lost_tick_policy=merge". Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Cc'd Peter, the last guy that touched timers in libvirt, because he might know what tick policies are supposed to be.] 2016-02-18 18:55+0100, Paolo Bonzini: > On 18/02/2016 18:33, Paolo Bonzini wrote: >> On 18/02/2016 17:56, Radim Kr?má? wrote: >>> 2016-02-18 17:13+0100, Paolo Bonzini: >>>> On 17/02/2016 20:14, Radim Kr?má? wrote: >>>>> Discard policy uses ack_notifiers to prevent injection of PIT interrupts >>>>> before EOI from the last one. >>>>> >>>>> This patch changes the policy to always try to deliver the interrupt, >>>>> which makes a difference when its vector is in ISR. >>>>> Old implementation would drop the interrupt, but proposed one injects to >>>>> IRR, like real hardware would. >>>> >>>> This seems like what libvirt calls the "merge" policy: >>> >>> Oops, I never looked beyond QEMU after seeing that the naming in libvirt >>> doesn't even match ... >>> >>> I think the policy that KVM implements (which I call discard) is "delay" >>> in libvirt. (https://libvirt.org/formatdomain.html#elementsTime) (I looked at libvirt code, but couldn't find any use of merge or discard policies, so please bear with me as I disagree wherever it's possible.) >> Suppose the scheduled ticks are at times 0, 20, 40, 60, 80. The EOI for >> time 0 is only delivered at time 42, other EOIs are timely. >> >> The resulting injections are: >> - for catchup, which QEMU calls slew: 0, 42, 51, 60, 80. >> >> - for merge: 0, 20 (in IRR, delivered at 42), 60, 80. >> >> For delay I *think* it would be 0, 42, 62, 82, 102. I could call this "delay". Continue to deliver ticks at the normal rate. The guest time will be delayed due to the late tick At 82 time units, the guest thinks it's 60, so the guest will do everything late. (Leading us to call it delayed?!) Few examples of "delay" that I find easier to accept: 0, 60, 80. 0, 42, 60, 80. Because we haven't missed the tick at 20, it just took a while to be delivered. (Semantics ...) > Wrong: for delay it is something like 0, 42, 43, 60, 80. Aargh! One KVM policy does this and QEMU calls it 'delay'. I think that libvirt would call it "catchup". Deliver ticks at a higher rate to catch up with the missed tick. The guest time should not be delayed once catchup is complete. At 80, the guest time is 80; no signs of delay. > Your patch does the right thing, QEMU is wrong in calling the policy > "discard" where it should have been "merge". In fact both i8254 and RTC > use the same wrong nomenclature. Terminlogy does suck. (Maybe it stems from the fact that QEMU talks about lost ticks, but libvirt about ticks?) Nevertheless, I don't think that libvirt "merge" covers what PIT does in KVM or real hardware. Merge the missed tick(s) into one tick and inject. The guest time may be delayed, depending on how the OS reacts to the merging of ticks No merging is happening in KVM or real hardware: every tick is exactly one tick, so the guest cannot tell that we missed some ticks and the time is delayed. If a tick made it into clear IRR, it's not missed. In the example: >> - for merge: 0, 20 (in IRR, delivered at 42), 60, 80. at 80, the guest thinks it's 60. I think that merge might do: 0, 42, 60, 80. But the tick at 42 is counted as two ticks (20, 40) in the guest. The main problem of this interpretation is that discard is a subset of merge: >> - for discard: 0, 60, 80. The tick at 60 has to be counted as three ticks (20, 40, 60). *throws hands into the air and runs in circles* -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Feb 19, 2016 at 15:44:23 +0100, Radim Kr?má? wrote: > [Cc'd Peter, the last guy that touched timers in libvirt, because he > might know what tick policies are supposed to be.] I found the following RFC that describes the design of timer access in libvirt: http://www.redhat.com/archives/libvir-list/2010-March/msg00304.html > > 2016-02-18 18:55+0100, Paolo Bonzini: > > On 18/02/2016 18:33, Paolo Bonzini wrote: > >> On 18/02/2016 17:56, Radim Kr?má? wrote: > >>> 2016-02-18 17:13+0100, Paolo Bonzini: > >>>> On 17/02/2016 20:14, Radim Kr?má? wrote: > >>>>> Discard policy uses ack_notifiers to prevent injection of PIT interrupts > >>>>> before EOI from the last one. > >>>>> > >>>>> This patch changes the policy to always try to deliver the interrupt, > >>>>> which makes a difference when its vector is in ISR. > >>>>> Old implementation would drop the interrupt, but proposed one injects to > >>>>> IRR, like real hardware would. > >>>> > >>>> This seems like what libvirt calls the "merge" policy: > >>> > >>> Oops, I never looked beyond QEMU after seeing that the naming in libvirt > >>> doesn't even match ... > >>> > >>> I think the policy that KVM implements (which I call discard) is "delay" > >>> in libvirt. (https://libvirt.org/formatdomain.html#elementsTime) > > (I looked at libvirt code, but couldn't find any use of merge or discard > policies, so please bear with me as I disagree wherever it's possible.) Indeed, it looks like it never was implemented. Unfortunately i'm not able to assist more in this case. Peter
On 19/02/2016 15:44, Radim Kr?má? wrote: >> The resulting injections are: >>> - for catchup, which QEMU calls slew: 0, 42, 51, 60, 80. I think we agree that 0, 42, 43, 60, 80 is also a "catchup"/"slew" policy. So we can change QEMU's kvm-i8254 to accept "slew" and warn if "delay" is given. In fact "slew" means "a large number or quantity of something" and indeed that's a good word to characterize kvm-i8254's reinjection behavior. >>> - for merge: 0, 20 (in IRR, delivered at 42), 60, 80. >>> >>> For delay I *think* it would be 0, 42, 62, 82, 102. > > I could call this "delay". > > Continue to deliver ticks at the normal rate. The guest time will be > delayed due to the late tick > > At 82 time units, the guest thinks it's 60, so the guest will do > everything late. (Leading us to call it delayed?!) Yup, this was my reasoning. > Few examples of "delay" that I find easier to accept: > 0, 60, 80. This is "discard". > 0, 42, 60, 80. Because we haven't missed the tick at 20, it just took > a while to be delivered. (Semantics ...) This is not discard. The ideal implementation of the tick policies is that the timer devices enjoy information from the interrupt controller, that lets them know when a tick cannot be delivered. In that case they do stuff like: - save it for later (catchup) - drop it for good (which is discard, and not the same as stashing it in IRR) - pause the timer (delay) - coalesce the ticks into one late tick (merge) > Terminlogy does suck. (Maybe it stems from the fact that QEMU talks > about lost ticks, but libvirt about ticks?) > Nevertheless, I don't think that libvirt "merge" covers what PIT does in > KVM or real hardware. > > Merge the missed tick(s) into one tick and inject. The guest time may > be delayed, depending on how the OS reacts to the merging of ticks > > No merging is happening in KVM or real hardware: every tick is exactly > one tick, so the guest cannot tell that we missed some ticks and the > time is delayed. If a tick made it into clear IRR, it's not missed. The libvirt documentation is written from the point of view of the guest, ignoring whether the late tick is recorded in some guest-visible register (IRR) or in the processor state (as is the case for NMI) or in the timer device state or who knows where. Therefore, it _also_ happens that thanks to IRR and NMI latching you can implement "merge" without having that kind of relationship between the timer device and the interrupt controller. But libvirt doesn't care, all the <timer> element wants is the guest-visible behavior in terms of when the ticks are delivered. This was my reasoning when proposing to change QEMU regarding "discard" as well: - RTC would warn for "discard" and accept "merge" - kvm-i8254 would warn for "discard", and accept "merge" if the capability says that your patch is in. The idea is that "discard" is such a bad idea, that "merge" should fail if the hypervisor would cause the watchdog to hang. > In the example: > >>> - for merge: 0, 20 (in IRR, delivered at 42), 60, 80. > > at 80, the guest thinks it's 60. > > I think that merge might do: 0, 42, 60, 80. > But the tick at 42 is counted as two ticks (20, 40) in the guest. Yes, merge is a good policy if the guest can somehow realize that 42 stood for (20, 40). Discard would be a good policy too if the guest can somehow realize that 60 stood for (20, 40, 60) but it has the problem that NMIs don't do EOIs. > The main problem of this interpretation is that discard is a subset of > merge: > >>> - for discard: 0, 60, 80. > > The tick at 60 has to be counted as three ticks (20, 40, 60). Why is it a problem? Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
I think I understand your definitions of tick policies and I remain convinced that libvirt defines them differently. Would you prefer that I got a confirmation on the libvirt list before continuing the discussion here? Two relevant parts of my reply that don't depend on libvirt's reply are marked with ** . 2016-02-25 14:38+0100, Paolo Bonzini: > On 19/02/2016 15:44, Radim Kr?má? wrote: >>> The resulting injections are: >>>> - for catchup, which QEMU calls slew: 0, 42, 51, 60, 80. > > I think we agree that 0, 42, 43, 60, 80 is also a "catchup"/"slew" > policy. We do. (Libvirt "catchup" is also QEMU "delay".) > So we can change QEMU's kvm-i8254 to accept "slew" and warn if > "delay" is given. ** QEMU 4e4fa398db69 ("qdev: Introduce lost tick policy property") defines: delay - replay all lost ticks in a row once the guest accepts them again slew - lost ticks are gradually replayed at a higher frequency than the original tick "delay" is exactly how kvm-i8254 behaves (in its "reinject" mode), so I think we shouldn't change it. > In fact "slew" means "a large number or quantity of something" and > indeed that's a good word to characterize kvm-i8254's reinjection behavior. (Isn't it a verb, with a similar meaning as "drift"? ;]) >>>> - for merge: 0, 20 (in IRR, delivered at 42), 60, 80. >>>> >>>> For delay I *think* it would be 0, 42, 62, 82, 102. >> >> I could call this "delay". >> >> Continue to deliver ticks at the normal rate. The guest time will be >> delayed due to the late tick >> >> At 82 time units, the guest thinks it's 60, so the guest will do >> everything late. (Leading us to call it delayed?!) > > Yup, this was my reasoning. > >> Few examples of "delay" that I find easier to accept: >> 0, 60, 80. > > This is "discard". At 80, the guest thinks that the time is 40, so every action it does will still be delayed. I don't see why it isn't libvirt "delay": - ticks are delivered at the normal rate - guest time is delayed (The original RFC that Peter shared called this policy "none".) I don't think it is libvirt "discard": - missed ticks were thrown away - future injection continues normally which is fine, but - the guest time is delayed, because there isn't a way to handle lost ticks and this is incompatible with libvirt's definition of "discard" The guest time may be delayed, unless the OS has explicit handling of lost ticks. "may" doesn't fit. You can only say - the guest time is delayed. which is best described by "delay". >> 0, 42, 60, 80. Because we haven't missed the tick at 20, it just took >> a while to be delivered. (Semantics ...) > > This is not discard. The ideal implementation of the tick policies is > that the timer devices enjoy information from the interrupt controller, > that lets them know when a tick cannot be delivered. In that case they > do stuff like: Yes. The case where the timer doesn't know about missed ticks also deserves a tick policy and I think that libvirt calls it "delay", QEMU "discard". > - save it for later (catchup) > > - drop it for good (which is discard, and not the same as stashing it in > IRR) > > - pause the timer (delay) > > - coalesce the ticks into one late tick (merge) > >> Terminlogy does suck. (Maybe it stems from the fact that QEMU talks >> about lost ticks, but libvirt about ticks?) >> Nevertheless, I don't think that libvirt "merge" covers what PIT does in >> KVM or real hardware. >> >> Merge the missed tick(s) into one tick and inject. The guest time may >> be delayed, depending on how the OS reacts to the merging of ticks >> >> No merging is happening in KVM or real hardware: every tick is exactly >> one tick, so the guest cannot tell that we missed some ticks and the >> time is delayed. If a tick made it into clear IRR, it's not missed. > > The libvirt documentation is written from the point of view of the > guest, ignoring whether the late tick is recorded in some guest-visible > register (IRR) or in the processor state (as is the case for NMI) or in > the timer device state or who knows where. Yes. > Therefore, it _also_ happens that thanks to IRR and NMI latching you can > implement "merge" without having that kind of relationship between the > timer device and the interrupt controller. I disagree. IRR can catch at most one interrupt, so it is insufficient to implement libvirt's merge. (libvirt's merge also has the conditional "The guest time may be delayed".) > But libvirt doesn't care, > all the <timer> element wants is the guest-visible behavior in terms of > when the ticks are delivered. Yes, and the guest can't behave like the libvirt definition says. Libvirt mainly cares if guest's time was irreversibly delayed or not and for how long can the time be delayed. (3 of 4 policies in libvirt don't need to delay the time. Only 1 policy in QEMU lets the guest recover time. 2 if you say that libvirt "merge" = QEMU "merge" and keep pit as QEMU "discard".) > This was my reasoning when proposing to change QEMU regarding "discard" > as well: > > - RTC would warn for "discard" and accept "merge" > > - kvm-i8254 would warn for "discard", and accept "merge" if the > capability says that your patch is in. The idea is that "discard" is > such a bad idea, that "merge" should fail if the hypervisor would cause > the watchdog to hang. ** I think it's better to say that the tick in IRR wasn't lost and policy is "discard". discard - ignore lost ticks (e.g. if the guest compensates for them already) (We don't need to say that a tick is lost just because it couldn't be delivered within some interval.) "merge" also fits the behavior of kvm-i8254 if you treat IRR as a part of the timer policy. merge - if multiple ticks are lost, all of them are merged into one which is replayed once the guest accepts it again (It means that every tick was "lost" and then replayed, usually right after the tick was "lost" -- this is because you can be only be merging lost ticks, so if there could be a non-lost tick in IRR while no tick was being handled, e.g. disabled interrupts, and you wanted to deliver a next one, then you'd need to have another bit that remembers lost ticks.) >> In the example: >> >>>> - for merge: 0, 20 (in IRR, delivered at 42), 60, 80. >> >> at 80, the guest thinks it's 60. >> >> I think that merge might do: 0, 42, 60, 80. >> But the tick at 42 is counted as two ticks (20, 40) in the guest. > > Yes, merge is a good policy if the guest can somehow realize that 42 > stood for (20, 40). Discard would be a good policy too if the guest can > somehow realize that 60 stood for (20, 40, 60) but it has the problem > that NMIs don't do EOIs. Yes, recognizing lost ticks is a prerequisite for catchup/delay/discard. Standard i8254 & friends aren't able to implement any policy other than libvirt "delay", because there is no notion of lost ticks. (KVM's changes to EOI notifiers were very shortsighted.) >> The main problem of this interpretation is that discard is a subset of >> merge: >> >>>> - for discard: 0, 60, 80. >> >> The tick at 60 has to be counted as three ticks (20, 40, 60). > > Why is it a problem? Defining a subset without using the superset is a major flaw in a spec. It got clarified in the link that Peter gave, where merge delivers "asap" while discard just waits for the next tick. :) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> 2016-02-25 14:38+0100, Paolo Bonzini: >> On 19/02/2016 15:44, Radim Kr?má? wrote: >>>> The resulting injections are: >>>>> - for catchup, which QEMU calls slew: 0, 42, 51, 60, 80. >> >> I think we agree that 0, 42, 43, 60, 80 is also a "catchup"/"slew" >> policy. > > We do. (Libvirt "catchup" is also QEMU "delay".) > >> So we can change QEMU's kvm-i8254 to accept "slew" and warn if >> "delay" is given. > ** > QEMU 4e4fa398db69 ("qdev: Introduce lost tick policy property") defines: > > delay - replay all lost ticks in a row once the guest accepts them > again > slew - lost ticks are gradually replayed at a higher frequency than > the original tick > > "delay" is exactly how kvm-i8254 behaves (in its "reinject" mode), so I > think we shouldn't change it. Ooh, I missed this commit message indeed. Then libvirt delay != QEMU delay, isn't it? >> In fact "slew" means "a large number or quantity of something" and >> indeed that's a good word to characterize kvm-i8254's reinjection behavior. > > (Isn't it a verb, with a similar meaning as "drift"? ;]) It's a noun too, like "I've just gotten a whole slew of bugs assigned to me". >>> Few examples of "delay" that I find easier to accept: >>> 0, 60, 80. >> >> This is "discard". > > At 80, the guest thinks that the time is 40, so every action it does > will still be delayed. I don't see why it isn't libvirt "delay": > - ticks are delivered at the normal rate > - guest time is delayed I can buy this. :) > I don't think it is libvirt "discard": > - missed ticks were thrown away > - future injection continues normally > > which is fine, but > - the guest time is delayed, because there isn't a way to handle lost > ticks > > and this is incompatible with libvirt's definition of "discard" > > The guest time may be delayed, unless the OS has explicit handling of > lost ticks. > > "may" doesn't fit. You can only say > - the guest time is delayed. > > which is best described by "delay". I think we can safely ignore the "may be" -- you cannot say for sure that the guest time "will" be delayed since you could always have a very enlightened guest. ... but then, by removing the handwavy "may be" would you say that libvirt delay and libvirt discard are the same? Would 0, 42, 62, 82 be a valid implementation of the libvirt "delay" policy? >> Therefore, it _also_ happens that thanks to IRR and NMI latching you can >> implement "merge" without having that kind of relationship between the >> timer device and the interrupt controller. > > I disagree. IRR can catch at most one interrupt, so it is insufficient > to implement libvirt's merge. (libvirt's merge also has the conditional > "The guest time may be delayed".) Hmm... is your point that the i8254 _alone_ is implementing discard, and the tick delivery time is _actually_ 0, 20, 60, 80 (and the t=20 tick is delivered late but not lost due to the i8259 buffer)? And hence the QEMU device model should see it as discard. I can definitely agree with that. There is still the matter of: - improving the documentation - clarifying the meaning of libvirt delay - deciding whether it's worth changing the meaning of QEMU delay to match libvirt's (and the default kvm-pit policy from delay to slew) But if we can agree on this, I can apply patch 1 as is, even for 4.5. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2016-02-25 20:11+0100, Paolo Bonzini: >> 2016-02-25 14:38+0100, Paolo Bonzini: >>> On 19/02/2016 15:44, Radim Kr?má? wrote: >>> So we can change QEMU's kvm-i8254 to accept "slew" and warn if >>> "delay" is given. >> ** >> QEMU 4e4fa398db69 ("qdev: Introduce lost tick policy property") defines: >> >> delay - replay all lost ticks in a row once the guest accepts them >> again >> slew - lost ticks are gradually replayed at a higher frequency than >> the original tick >> >> "delay" is exactly how kvm-i8254 behaves (in its "reinject" mode), so I >> think we shouldn't change it. > > Ooh, I missed this commit message indeed. Then libvirt delay != QEMU > delay, isn't it? Exactly, it's sad. libvirt delay -> QEMU discard. (Lost) tick policy relations look like this libvirt | QEMU catchup -> delay delay -> discard discard -> n/a catchup -> slew delay -> merge (?) merge -> n/a (?) Delay, discard, and merge are too ambiguous to make sense. >> [...] >> and this is incompatible with libvirt's definition of "discard" >> >> The guest time may be delayed, unless the OS has explicit handling of >> lost ticks. >> >> "may" doesn't fit. You can only say >> - the guest time is delayed. >> >> which is best described by "delay". > > I think we can safely ignore the "may be" -- you cannot say for sure > that the guest time "will" be delayed since you could always have a very > enlightened guest. > ... but then, by removing the handwavy "may be" would you say that > libvirt delay and libvirt discard are the same? I would, which is why the "may be" is significant -- the timer has to provide tools to the guest, even if the guest ignores them. Do you agree that following rephrasing is identical to libvirt discard? Lost ticks will delay the guest time, unless the guest OS has explicit handling of lost ticks. > Would 0, 42, 62, 82 be > a valid implementation of the libvirt "delay" policy? Distinguishing it from saner variants (0, 60, 80 or 0, 20/42, 60, 80) would be nice, because shifting the phase is not "continuing at normal rate" for me, but I'd frown and accept it ... The important part is that guest time never recovers from the lost tick. >>> Therefore, it _also_ happens that thanks to IRR and NMI latching you can >>> implement "merge" without having that kind of relationship between the >>> timer device and the interrupt controller. >> >> I disagree. IRR can catch at most one interrupt, so it is insufficient >> to implement libvirt's merge. (libvirt's merge also has the conditional >> "The guest time may be delayed".) > > Hmm... is your point that the i8254 _alone_ is implementing discard, and > the tick delivery time is _actually_ 0, 20, 60, 80 (and the t=20 tick is > delivered late but not lost due to the i8259 buffer)? And hence the > QEMU device model should see it as discard. I can definitely agree with > that. Yes. Only the tick at 40 is lost. > There is still the matter of: > > - improving the documentation Absolutely, this email thread shouldn't have existed. QEMU merge policy isn't defined de facto ... do we say that it falls into libvirt delay? (Or just remove it?) merge - if multiple ticks are lost, all of them are merged into one which is replayed once the guest accepts it again > - clarifying the meaning of libvirt delay Yes, I wouldn't mind excluding 0, 42, 62, 82. :) > - deciding whether it's worth changing the meaning of QEMU delay to > match libvirt's (and the default kvm-pit policy from delay to slew) Changing QEMU's lost_tick_policy seems like a recipe for confusion. It'd be nice to unify QEMU and libvirt terms, but QEMU does care about backward compatibility and I think it is not wise to do this change without a new interface. I'd rather just document and forget. :) > But if we can agree on this, I can apply patch 1 as is, even for 4.5. I think we agree on all parts that affect this series. I'll start preparing v3. (Likely posting on Tuesday/Wednesday.) Thank you. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index b0ea42b78ccd..ab5318727579 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -245,7 +245,7 @@ static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier *kian) * PIC is being reset. Handle it gracefully here */ atomic_inc(&ps->pending); - else if (value > 0) + else if (value > 0 && ps->reinject) /* in this case, we had multiple outstanding pit interrupts * that we needed to inject. Reinject */ @@ -288,7 +288,9 @@ static void pit_do_work(struct kthread_work *work) * last one has been acked. */ spin_lock(&ps->inject_lock); - if (ps->irq_ack) { + if (!ps->reinject) + inject = 1; + else if (ps->irq_ack) { ps->irq_ack = 0; inject = 1; } @@ -317,10 +319,10 @@ static enum hrtimer_restart pit_timer_fn(struct hrtimer *data) struct kvm_kpit_state *ps = container_of(data, struct kvm_kpit_state, timer); struct kvm_pit *pt = ps->kvm->arch.vpit; - if (ps->reinject || !atomic_read(&ps->pending)) { + if (ps->reinject) atomic_inc(&ps->pending); - queue_kthread_work(&pt->worker, &pt->expired); - } + + queue_kthread_work(&pt->worker, &pt->expired); if (ps->is_periodic) { hrtimer_add_expires_ns(&ps->timer, ps->period);
Discard policy uses ack_notifiers to prevent injection of PIT interrupts before EOI from the last one. This patch changes the policy to always try to deliver the interrupt, which makes a difference when its vector is in ISR. Old implementation would drop the interrupt, but proposed one injects to IRR, like real hardware would. The old policy breaks legacy NMI watchdogs, where PIT is used through virtual wire (LVT0): PIT never sends an interrupt before receiving EOI, thus a guest deadlock with disabled interrupts will stop NMIs. Note that NMI doesn't do EOI, so PIT also had to send a normal interrupt through IOAPIC. (KVM's PIT is deeply rotten and luckily not used much in modern systems.) Even though there is a chance of regressions, I think we can fix the LVT0 NMI bug without introducing a new tick policy. Cc: <stable@vger.kernel.org> Reported-by: Yuki Shibuya <shibuya.yk@ncos.nec.co.jp> Signed-off-by: Radim Kr?má? <rkrcmar@redhat.com> --- v2: resolve dependencies to make it the first patch arch/x86/kvm/i8254.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)