KVM: x86: Avoid busy loops over uninjectable pending APIC timers

Message ID 5144DAC3.7080401@web.de (mailing list archive)
State New, archived

Commit Message

Jan Kiszka March 16, 2013, 8:49 p.m. UTC
From: Jan Kiszka <jan.kiszka@siemens.com>

If the guest didn't take the last APIC timer interrupt yet and generates
another one on top, e.g. via periodic mode, we do not block the VCPU
even if the guest state is halted. The reason is that
apic_has_pending_timer continues to return a non-zero value.

Fix this busy loop by taking the IRR content for the LVT vector in
apic_has_pending_timer into account.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

Not a critical issue, we are looping fully interruptible, but it's ugly
to do so IMHO.

 arch/x86/kvm/lapic.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)
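The check described above can be sketched in user space. This is only a simplified model under stated assumptions: `struct lapic_model`, `irr_test_vector()`, `irr_set_vector()` and `has_pending_timer()` are hypothetical stand-ins, not the kernel's `struct kvm_lapic` or its helpers. It illustrates why a timer vector that is already set in the IRR should make the pending-timer query return 0, so that a halted VCPU can block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model; not the kernel's struct kvm_lapic. */
struct lapic_model {
	bool enabled;          /* APIC is enabled */
	bool lvtt_enabled;     /* timer LVT entry is not masked */
	uint8_t timer_vector;  /* vector programmed into the timer LVT */
	uint32_t irr[8];       /* 256-bit interrupt request register */
	int timer_pending;     /* expired timer ticks not yet injected */
};

/* Test one vector bit in the modelled IRR. */
static bool irr_test_vector(const struct lapic_model *apic, uint8_t vec)
{
	return (apic->irr[vec / 32] >> (vec % 32)) & 1u;
}

/* Set one vector bit in the modelled IRR. */
static void irr_set_vector(struct lapic_model *apic, uint8_t vec)
{
	apic->irr[vec / 32] |= 1u << (vec % 32);
}

/*
 * Report pending timer ticks only if another injection could succeed,
 * i.e. the timer vector is not already waiting in the IRR.
 */
static int has_pending_timer(const struct lapic_model *apic)
{
	if (apic->enabled && apic->lvtt_enabled &&
	    !irr_test_vector(apic, apic->timer_vector))
		return apic->timer_pending;

	return 0;
}
```

With the IRR bit clear the model reports the pending count; once the bit is set it reports 0 even though ticks are still queued.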

Comments

Gleb Natapov March 17, 2013, 8:47 a.m. UTC | #1
On Sat, Mar 16, 2013 at 09:49:07PM +0100, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
> 
> If the guest didn't take the last APIC timer interrupt yet and generates
> another one on top, e.g. via periodic mode, we do not block the VCPU
> even if the guest state is halted. The reason is that
> apic_has_pending_timer continues to return a non-zero value.
> 
> Fix this busy loop by taking the IRR content for the LVT vector in
> apic_has_pending_timer into account.
> 
Just drop coalescing tracking for the lapic interrupt. Once posted
interrupts are merged, __apic_accept_irq() will no longer return
coalescing information, so the code will be dead anyway.

> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
> 
> Not a critical issue, we are looping fully interruptible, but it's ugly
> to do so IMHO.
> 
>  arch/x86/kvm/lapic.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index a8e9369..658abf5 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1473,7 +1473,9 @@ int apic_has_pending_timer(struct kvm_vcpu *vcpu)
>  	struct kvm_lapic *apic = vcpu->arch.apic;
>  
>  	if (kvm_vcpu_has_lapic(vcpu) && apic_enabled(apic) &&
> -			apic_lvt_enabled(apic, APIC_LVTT))
> +	    apic_lvt_enabled(apic, APIC_LVTT) &&
> +	    !apic_test_vector(apic_lvt_vector(apic, APIC_LVTT),
> +					      apic->regs + APIC_IRR))
>  		return atomic_read(&apic->lapic_timer.pending);
>  
>  	return 0;
> -- 
> 1.7.3.4

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kiszka March 17, 2013, 10:45 a.m. UTC | #2
On 2013-03-17 09:47, Gleb Natapov wrote:
> On Sat, Mar 16, 2013 at 09:49:07PM +0100, Jan Kiszka wrote:
>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>
>> If the guest didn't take the last APIC timer interrupt yet and generates
>> another one on top, e.g. via periodic mode, we do not block the VCPU
>> even if the guest state is halted. The reason is that
>> apic_has_pending_timer continues to return a non-zero value.
>>
>> Fix this busy loop by taking the IRR content for the LVT vector in
>> apic_has_pending_timer into account.
>>
> Just drop coalescing tacking for lapic interrupt. After posted interrupt
> will be merged __apic_accept_irq() will not longer return coalescing
> information, so the code will be dead anyway.

That requires the RTC decoalescing series to go first to avoid a
regression, no? Then let's postpone this topic for now.

Jan
Gleb Natapov March 17, 2013, 10:47 a.m. UTC | #3
On Sun, Mar 17, 2013 at 11:45:34AM +0100, Jan Kiszka wrote:
> On 2013-03-17 09:47, Gleb Natapov wrote:
> > On Sat, Mar 16, 2013 at 09:49:07PM +0100, Jan Kiszka wrote:
> >> From: Jan Kiszka <jan.kiszka@siemens.com>
> >>
> >> If the guest didn't take the last APIC timer interrupt yet and generates
> >> another one on top, e.g. via periodic mode, we do not block the VCPU
> >> even if the guest state is halted. The reason is that
> >> apic_has_pending_timer continues to return a non-zero value.
> >>
> >> Fix this busy loop by taking the IRR content for the LVT vector in
> >> apic_has_pending_timer into account.
> >>
> > Just drop coalescing tacking for lapic interrupt. After posted interrupt
> > will be merged __apic_accept_irq() will not longer return coalescing
> > information, so the code will be dead anyway.
> 
> That requires the RTC decoalescing series to go first to avoid a
> regression, no? Then let's postpone this topic for now.
> 
Yes, but decoalescing will work only for RTC :(

--
			Gleb.
Marcelo Tosatti March 20, 2013, 7:30 p.m. UTC | #4
On Sun, Mar 17, 2013 at 12:47:17PM +0200, Gleb Natapov wrote:
> On Sun, Mar 17, 2013 at 11:45:34AM +0100, Jan Kiszka wrote:
> > On 2013-03-17 09:47, Gleb Natapov wrote:
> > > On Sat, Mar 16, 2013 at 09:49:07PM +0100, Jan Kiszka wrote:
> > >> From: Jan Kiszka <jan.kiszka@siemens.com>
> > >>
> > >> If the guest didn't take the last APIC timer interrupt yet and generates
> > >> another one on top, e.g. via periodic mode, we do not block the VCPU
> > >> even if the guest state is halted. The reason is that
> > >> apic_has_pending_timer continues to return a non-zero value.
> > >>
> > >> Fix this busy loop by taking the IRR content for the LVT vector in
> > >> apic_has_pending_timer into account.
> > >>
> > > Just drop coalescing tacking for lapic interrupt. After posted interrupt
> > > will be merged __apic_accept_irq() will not longer return coalescing
> > > information, so the code will be dead anyway.
> > 
> > That requires the RTC decoalescing series to go first to avoid a
> > regression, no? Then let's postpone this topic for now.
> > 
> Yes, but decoalescing will work only for RTC :(

Are you proposing to drop LAPIC interrupt reinjection? 

Marcelo Tosatti March 20, 2013, 8:03 p.m. UTC | #5
On Wed, Mar 20, 2013 at 04:30:33PM -0300, Marcelo Tosatti wrote:
> On Sun, Mar 17, 2013 at 12:47:17PM +0200, Gleb Natapov wrote:
> > On Sun, Mar 17, 2013 at 11:45:34AM +0100, Jan Kiszka wrote:
> > > On 2013-03-17 09:47, Gleb Natapov wrote:
> > > > On Sat, Mar 16, 2013 at 09:49:07PM +0100, Jan Kiszka wrote:
> > > >> From: Jan Kiszka <jan.kiszka@siemens.com>
> > > >>
> > > >> If the guest didn't take the last APIC timer interrupt yet and generates
> > > >> another one on top, e.g. via periodic mode, we do not block the VCPU
> > > >> even if the guest state is halted. The reason is that
> > > >> apic_has_pending_timer continues to return a non-zero value.
> > > >>
> > > >> Fix this busy loop by taking the IRR content for the LVT vector in
> > > >> apic_has_pending_timer into account.
> > > >>
> > > > Just drop coalescing tacking for lapic interrupt. After posted interrupt
> > > > will be merged __apic_accept_irq() will not longer return coalescing
> > > > information, so the code will be dead anyway.
> > > 
> > > That requires the RTC decoalescing series to go first to avoid a
> > > regression, no? Then let's postpone this topic for now.
> > > 
> > Yes, but decoalescing will work only for RTC :(
> 
> Are you proposing to drop LAPIC interrupt reinjection? 

Since timer handling and injection are VCPU-local for the LAPIC,
__apic_accept_irq can (and must) return coalescing information (we
cannot drop LAPIC interrupt reinjection).


Gleb Natapov March 20, 2013, 9:32 p.m. UTC | #6
On Wed, Mar 20, 2013 at 05:03:19PM -0300, Marcelo Tosatti wrote:
> On Wed, Mar 20, 2013 at 04:30:33PM -0300, Marcelo Tosatti wrote:
> > On Sun, Mar 17, 2013 at 12:47:17PM +0200, Gleb Natapov wrote:
> > > On Sun, Mar 17, 2013 at 11:45:34AM +0100, Jan Kiszka wrote:
> > > > On 2013-03-17 09:47, Gleb Natapov wrote:
> > > > > On Sat, Mar 16, 2013 at 09:49:07PM +0100, Jan Kiszka wrote:
> > > > >> From: Jan Kiszka <jan.kiszka@siemens.com>
> > > > >>
> > > > >> If the guest didn't take the last APIC timer interrupt yet and generates
> > > > >> another one on top, e.g. via periodic mode, we do not block the VCPU
> > > > >> even if the guest state is halted. The reason is that
> > > > >> apic_has_pending_timer continues to return a non-zero value.
> > > > >>
> > > > >> Fix this busy loop by taking the IRR content for the LVT vector in
> > > > >> apic_has_pending_timer into account.
> > > > >>
> > > > > Just drop coalescing tacking for lapic interrupt. After posted interrupt
> > > > > will be merged __apic_accept_irq() will not longer return coalescing
> > > > > information, so the code will be dead anyway.
> > > > 
> > > > That requires the RTC decoalescing series to go first to avoid a
> > > > regression, no? Then let's postpone this topic for now.
> > > > 
> > > Yes, but decoalescing will work only for RTC :(
> > 
> > Are you proposing to drop LAPIC interrupt reinjection? 
> 
> Since timer handling and injection is VCPU-local for LAPIC,
> __apic_accept_irq can (and must) return coalesced information (cannot
> drop LAPIC interrupt reinjection).
> 
Why can't we drop LAPIC interrupt reinjection? Proposed posted interrupt
patches do not properly check for interrupt coalescing even for
VCPU-local injection.

--
			Gleb.
Marcelo Tosatti March 20, 2013, 11:19 p.m. UTC | #7
On Wed, Mar 20, 2013 at 11:32:38PM +0200, Gleb Natapov wrote:
> On Wed, Mar 20, 2013 at 05:03:19PM -0300, Marcelo Tosatti wrote:
> > On Wed, Mar 20, 2013 at 04:30:33PM -0300, Marcelo Tosatti wrote:
> > > On Sun, Mar 17, 2013 at 12:47:17PM +0200, Gleb Natapov wrote:
> > > > On Sun, Mar 17, 2013 at 11:45:34AM +0100, Jan Kiszka wrote:
> > > > > On 2013-03-17 09:47, Gleb Natapov wrote:
> > > > > > On Sat, Mar 16, 2013 at 09:49:07PM +0100, Jan Kiszka wrote:
> > > > > >> From: Jan Kiszka <jan.kiszka@siemens.com>
> > > > > >>
> > > > > >> If the guest didn't take the last APIC timer interrupt yet and generates
> > > > > >> another one on top, e.g. via periodic mode, we do not block the VCPU
> > > > > >> even if the guest state is halted. The reason is that
> > > > > >> apic_has_pending_timer continues to return a non-zero value.
> > > > > >>
> > > > > >> Fix this busy loop by taking the IRR content for the LVT vector in
> > > > > >> apic_has_pending_timer into account.
> > > > > >>
> > > > > > Just drop coalescing tacking for lapic interrupt. After posted interrupt
> > > > > > will be merged __apic_accept_irq() will not longer return coalescing
> > > > > > information, so the code will be dead anyway.
> > > > > 
> > > > > That requires the RTC decoalescing series to go first to avoid a
> > > > > regression, no? Then let's postpone this topic for now.
> > > > > 
> > > > Yes, but decoalescing will work only for RTC :(
> > > 
> > > Are you proposing to drop LAPIC interrupt reinjection? 
> > 
> > Since timer handling and injection is VCPU-local for LAPIC,
> > __apic_accept_irq can (and must) return coalesced information (cannot
> > drop LAPIC interrupt reinjection).
> > 
> Why can't we drop LAPIC interrupt reinjection? Proposed posted interrupt
> patches do not properly check for interrupt coalescing even for
> VCPU-local injection.
> 
> --
> 			Gleb.

Because older Linux guests depend on reinjection for proper timekeeping.
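
To illustrate what reinjection means here, the following is a minimal sketch under stated assumptions, not the kernel implementation: `struct timer_model` and all four functions are hypothetical. Every expired tick is counted, and the count is decremented only when an injection was actually accepted rather than coalesced, so a guest that counts ticks for timekeeping eventually sees all of them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one VCPU's LAPIC timer state. */
struct timer_model {
	int pending;         /* expired ticks not yet delivered */
	bool irr_timer_bit;  /* timer vector already waiting in the IRR */
};

/* Called when the emulated timer expires. */
static void timer_expired(struct timer_model *t)
{
	t->pending++;
}

/* Returns true if the interrupt was queued, false if it was coalesced. */
static bool deliver_timer(struct timer_model *t)
{
	if (t->irr_timer_bit)
		return false;
	t->irr_timer_bit = true;
	return true;
}

/* Retry delivery; a coalesced tick stays pending for a later attempt. */
static void inject_timer_irqs(struct timer_model *t)
{
	if (t->pending > 0 && deliver_timer(t))
		t->pending--;
}

/* The guest accepts the interrupt, which clears the IRR bit. */
static void guest_takes_interrupt(struct timer_model *t)
{
	t->irr_timer_bit = false;
}
```

In this model `pending` stays non-zero for as long as the guest has not taken the previous interrupt, which is the state that the original patch in this thread is concerned with.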

Gleb Natapov March 21, 2013, 4:54 a.m. UTC | #8
On Wed, Mar 20, 2013 at 08:19:13PM -0300, Marcelo Tosatti wrote:
> On Wed, Mar 20, 2013 at 11:32:38PM +0200, Gleb Natapov wrote:
> > On Wed, Mar 20, 2013 at 05:03:19PM -0300, Marcelo Tosatti wrote:
> > > On Wed, Mar 20, 2013 at 04:30:33PM -0300, Marcelo Tosatti wrote:
> > > > On Sun, Mar 17, 2013 at 12:47:17PM +0200, Gleb Natapov wrote:
> > > > > On Sun, Mar 17, 2013 at 11:45:34AM +0100, Jan Kiszka wrote:
> > > > > > On 2013-03-17 09:47, Gleb Natapov wrote:
> > > > > > > On Sat, Mar 16, 2013 at 09:49:07PM +0100, Jan Kiszka wrote:
> > > > > > >> From: Jan Kiszka <jan.kiszka@siemens.com>
> > > > > > >>
> > > > > > >> If the guest didn't take the last APIC timer interrupt yet and generates
> > > > > > >> another one on top, e.g. via periodic mode, we do not block the VCPU
> > > > > > >> even if the guest state is halted. The reason is that
> > > > > > >> apic_has_pending_timer continues to return a non-zero value.
> > > > > > >>
> > > > > > >> Fix this busy loop by taking the IRR content for the LVT vector in
> > > > > > >> apic_has_pending_timer into account.
> > > > > > >>
> > > > > > > Just drop coalescing tacking for lapic interrupt. After posted interrupt
> > > > > > > will be merged __apic_accept_irq() will not longer return coalescing
> > > > > > > information, so the code will be dead anyway.
> > > > > > 
> > > > > > That requires the RTC decoalescing series to go first to avoid a
> > > > > > regression, no? Then let's postpone this topic for now.
> > > > > > 
> > > > > Yes, but decoalescing will work only for RTC :(
> > > > 
> > > > Are you proposing to drop LAPIC interrupt reinjection? 
> > > 
> > > Since timer handling and injection is VCPU-local for LAPIC,
> > > __apic_accept_irq can (and must) return coalesced information (cannot
> > > drop LAPIC interrupt reinjection).
> > > 
> > Why can't we drop LAPIC interrupt reinjection? Proposed posted interrupt
> > patches do not properly check for interrupt coalescing even for
> > VCPU-local injection.
> > 
> > --
> > 			Gleb.
> 
> Because older Linux guests depend on reinjection for proper timekeeping.
Which versions? Those without kvmclock? Can we make them use PIT instead?
Posted interrupts are going to break them.

--
			Gleb.
Marcelo Tosatti March 21, 2013, 2:02 p.m. UTC | #9
On Thu, Mar 21, 2013 at 06:54:46AM +0200, Gleb Natapov wrote:
> On Wed, Mar 20, 2013 at 08:19:13PM -0300, Marcelo Tosatti wrote:
> > On Wed, Mar 20, 2013 at 11:32:38PM +0200, Gleb Natapov wrote:
> > > On Wed, Mar 20, 2013 at 05:03:19PM -0300, Marcelo Tosatti wrote:
> > > > On Wed, Mar 20, 2013 at 04:30:33PM -0300, Marcelo Tosatti wrote:
> > > > > On Sun, Mar 17, 2013 at 12:47:17PM +0200, Gleb Natapov wrote:
> > > > > > On Sun, Mar 17, 2013 at 11:45:34AM +0100, Jan Kiszka wrote:
> > > > > > > On 2013-03-17 09:47, Gleb Natapov wrote:
> > > > > > > > On Sat, Mar 16, 2013 at 09:49:07PM +0100, Jan Kiszka wrote:
> > > > > > > >> From: Jan Kiszka <jan.kiszka@siemens.com>
> > > > > > > >>
> > > > > > > >> If the guest didn't take the last APIC timer interrupt yet and generates
> > > > > > > >> another one on top, e.g. via periodic mode, we do not block the VCPU
> > > > > > > >> even if the guest state is halted. The reason is that
> > > > > > > >> apic_has_pending_timer continues to return a non-zero value.
> > > > > > > >>
> > > > > > > >> Fix this busy loop by taking the IRR content for the LVT vector in
> > > > > > > >> apic_has_pending_timer into account.
> > > > > > > >>
> > > > > > > > Just drop coalescing tacking for lapic interrupt. After posted interrupt
> > > > > > > > will be merged __apic_accept_irq() will not longer return coalescing
> > > > > > > > information, so the code will be dead anyway.
> > > > > > > 
> > > > > > > That requires the RTC decoalescing series to go first to avoid a
> > > > > > > regression, no? Then let's postpone this topic for now.
> > > > > > > 
> > > > > > Yes, but decoalescing will work only for RTC :(
> > > > > 
> > > > > Are you proposing to drop LAPIC interrupt reinjection? 
> > > > 
> > > > Since timer handling and injection is VCPU-local for LAPIC,
> > > > __apic_accept_irq can (and must) return coalesced information (cannot
> > > > drop LAPIC interrupt reinjection).
> > > > 
> > > Why can't we drop LAPIC interrupt reinjection? Proposed posted interrupt
> > > patches do not properly check for interrupt coalescing even for
> > > VCPU-local injection.
> > > 
> > > --
> > > 			Gleb.
> > 
> > Because older Linux guests depend on reinjection for proper timekeeping.
> Which versions? Those without kvmclock? Can we make them use PIT instead?
> Posted interrupts going to break them.

There is no reason to break them if it's OK to receive reinjection info
from LAPIC... it's a matter of returning the information from
apic_accept_irq, no big deal.


Gleb Natapov March 21, 2013, 2:18 p.m. UTC | #10
On Thu, Mar 21, 2013 at 11:02:24AM -0300, Marcelo Tosatti wrote:
> On Thu, Mar 21, 2013 at 06:54:46AM +0200, Gleb Natapov wrote:
> > On Wed, Mar 20, 2013 at 08:19:13PM -0300, Marcelo Tosatti wrote:
> > > On Wed, Mar 20, 2013 at 11:32:38PM +0200, Gleb Natapov wrote:
> > > > On Wed, Mar 20, 2013 at 05:03:19PM -0300, Marcelo Tosatti wrote:
> > > > > On Wed, Mar 20, 2013 at 04:30:33PM -0300, Marcelo Tosatti wrote:
> > > > > > On Sun, Mar 17, 2013 at 12:47:17PM +0200, Gleb Natapov wrote:
> > > > > > > On Sun, Mar 17, 2013 at 11:45:34AM +0100, Jan Kiszka wrote:
> > > > > > > > On 2013-03-17 09:47, Gleb Natapov wrote:
> > > > > > > > > On Sat, Mar 16, 2013 at 09:49:07PM +0100, Jan Kiszka wrote:
> > > > > > > > >> From: Jan Kiszka <jan.kiszka@siemens.com>
> > > > > > > > >>
> > > > > > > > >> If the guest didn't take the last APIC timer interrupt yet and generates
> > > > > > > > >> another one on top, e.g. via periodic mode, we do not block the VCPU
> > > > > > > > >> even if the guest state is halted. The reason is that
> > > > > > > > >> apic_has_pending_timer continues to return a non-zero value.
> > > > > > > > >>
> > > > > > > > >> Fix this busy loop by taking the IRR content for the LVT vector in
> > > > > > > > >> apic_has_pending_timer into account.
> > > > > > > > >>
> > > > > > > > > Just drop coalescing tacking for lapic interrupt. After posted interrupt
> > > > > > > > > will be merged __apic_accept_irq() will not longer return coalescing
> > > > > > > > > information, so the code will be dead anyway.
> > > > > > > > 
> > > > > > > > That requires the RTC decoalescing series to go first to avoid a
> > > > > > > > regression, no? Then let's postpone this topic for now.
> > > > > > > > 
> > > > > > > Yes, but decoalescing will work only for RTC :(
> > > > > > 
> > > > > > Are you proposing to drop LAPIC interrupt reinjection? 
> > > > > 
> > > > > Since timer handling and injection is VCPU-local for LAPIC,
> > > > > __apic_accept_irq can (and must) return coalesced information (cannot
> > > > > drop LAPIC interrupt reinjection).
> > > > > 
> > > > Why can't we drop LAPIC interrupt reinjection? Proposed posted interrupt
> > > > patches do not properly check for interrupt coalescing even for
> > > > VCPU-local injection.
> > > > 
> > > > --
> > > > 			Gleb.
> > > 
> > > Because older Linux guests depend on reinjection for proper timekeeping.
> > Which versions? Those without kvmclock? Can we make them use PIT instead?
> > Posted interrupts going to break them.
> 
> There is no reason to break them if its OK to receive reinjection info
> from LAPIC... its a matter of returning the information from
> apic_accept_irq, no big deal.
> 
But the current PI patches do break them, that's my point. So we either
need to revise them again, or drop LAPIC timer reinjection. Making
apic_accept_irq semantics "it returns coalescing info, but only sometimes"
is dubious though.

--
			Gleb.
Zhang, Yang Z March 21, 2013, 2:27 p.m. UTC | #11
Gleb Natapov wrote on 2013-03-21:
> On Thu, Mar 21, 2013 at 11:02:24AM -0300, Marcelo Tosatti wrote:
>> On Thu, Mar 21, 2013 at 06:54:46AM +0200, Gleb Natapov wrote:
>>> On Wed, Mar 20, 2013 at 08:19:13PM -0300, Marcelo Tosatti wrote:
>>>> On Wed, Mar 20, 2013 at 11:32:38PM +0200, Gleb Natapov wrote:
>>>>> On Wed, Mar 20, 2013 at 05:03:19PM -0300, Marcelo Tosatti wrote:
>>>>>> On Wed, Mar 20, 2013 at 04:30:33PM -0300, Marcelo Tosatti wrote:
>>>>>>> On Sun, Mar 17, 2013 at 12:47:17PM +0200, Gleb Natapov wrote:
>>>>>>>> On Sun, Mar 17, 2013 at 11:45:34AM +0100, Jan Kiszka wrote:
>>>>>>>>> On 2013-03-17 09:47, Gleb Natapov wrote:
>>>>>>>>>> On Sat, Mar 16, 2013 at 09:49:07PM +0100, Jan Kiszka wrote:
>>>>>>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>>>>>>> 
>>>>>>>>>>> If the guest didn't take the last APIC timer interrupt yet and
>>>>>>>>>>> generates another one on top, e.g. via periodic mode, we do
>>>>>>>>>>> not block the VCPU even if the guest state is halted. The
>>>>>>>>>>> reason is that apic_has_pending_timer continues to return a
>>>>>>>>>>> non-zero value.
>>>>>>>>>>> 
>>>>>>>>>>> Fix this busy loop by taking the IRR content for the LVT vector in
>>>>>>>>>>> apic_has_pending_timer into account.
>>>>>>>>>>> 
>>>>>>>>>> Just drop coalescing tacking for lapic interrupt. After posted
>>>>>>>>>> interrupt will be merged __apic_accept_irq() will not longer
>>>>>>>>>> return coalescing information, so the code will be dead anyway.
>>>>>>>>> 
>>>>>>>>> That requires the RTC decoalescing series to go first to avoid a
>>>>>>>>> regression, no? Then let's postpone this topic for now.
>>>>>>>>> 
>>>>>>>> Yes, but decoalescing will work only for RTC :(
>>>>>>> 
>>>>>>> Are you proposing to drop LAPIC interrupt reinjection?
>>>>>> 
>>>>>> Since timer handling and injection is VCPU-local for LAPIC,
>>>>>> __apic_accept_irq can (and must) return coalesced information (cannot
>>>>>> drop LAPIC interrupt reinjection).
>>>>>> 
>>>>> Why can't we drop LAPIC interrupt reinjection? Proposed posted
>>>>> interrupt patches do not properly check for interrupt coalescing
>>>>> even for VCPU-local injection.
>>>>> 
>>>>> --
>>>>> 			Gleb.
>>>> 
>>>> Because older Linux guests depend on reinjection for proper timekeeping.
>>> Which versions? Those without kvmclock? Can we make them use PIT
>>> instead? Posted interrupts going to break them.
>> 
>> There is no reason to break them if its OK to receive reinjection info
>> from LAPIC... its a matter of returning the information from
>> apic_accept_irq, no big deal.
>> 
> But current PI patches do break them, thats my point. So we either
> need to revise them again, or drop LAPIC timer reinjection. Making
> apic_accept_irq semantics "it returns coalescing info, but only sometimes"
> is dubious though.
We may roll back to the initial idea: test both irr and pir to get
coalescing info. In this case, always inject the LAPIC timer in vcpu
context, so apic_accept_irq() will return the right coalescing info.
Also, we need to add a comment telling callers that apic_accept_irq()
can guarantee a correct return value only when the caller is in the
target vcpu context.

Best regards,
Yang
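
The idea of testing both irr and pir can be sketched as follows. This is a hedged illustration, not code from the posted interrupt series: `struct vcpu_irq_model`, `test_vector()`, `set_vector()` and `accept_timer_irq()` are hypothetical names, and queuing the new interrupt into the PIR rather than the IRR is an assumption made only for this example. The function is meant to be called in the target vcpu's own context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: two 256-bit vector bitmaps for one VCPU. */
struct vcpu_irq_model {
	uint32_t irr[8];  /* interrupt request register */
	uint32_t pir[8];  /* posted-interrupt requests */
};

static bool test_vector(const uint32_t *bitmap, uint8_t vec)
{
	return (bitmap[vec / 32] >> (vec % 32)) & 1u;
}

static void set_vector(uint32_t *bitmap, uint8_t vec)
{
	bitmap[vec / 32] |= 1u << (vec % 32);
}

/*
 * Returns true if the timer interrupt was newly queued and false if it
 * was coalesced with one that is already waiting in either bitmap.
 */
static bool accept_timer_irq(struct vcpu_irq_model *v, uint8_t vec)
{
	if (test_vector(v->irr, vec) || test_vector(v->pir, vec))
		return false;

	set_vector(v->pir, vec);
	return true;
}
```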


Gleb Natapov March 21, 2013, 4:27 p.m. UTC | #12
On Thu, Mar 21, 2013 at 02:27:22PM +0000, Zhang, Yang Z wrote:
> Gleb Natapov wrote on 2013-03-21:
> > On Thu, Mar 21, 2013 at 11:02:24AM -0300, Marcelo Tosatti wrote:
> >> On Thu, Mar 21, 2013 at 06:54:46AM +0200, Gleb Natapov wrote:
> >>> On Wed, Mar 20, 2013 at 08:19:13PM -0300, Marcelo Tosatti wrote:
> >>>> On Wed, Mar 20, 2013 at 11:32:38PM +0200, Gleb Natapov wrote:
> >>>>> On Wed, Mar 20, 2013 at 05:03:19PM -0300, Marcelo Tosatti wrote:
> >>>>>> On Wed, Mar 20, 2013 at 04:30:33PM -0300, Marcelo Tosatti wrote:
> >>>>>>> On Sun, Mar 17, 2013 at 12:47:17PM +0200, Gleb Natapov wrote:
> >>>>>>>> On Sun, Mar 17, 2013 at 11:45:34AM +0100, Jan Kiszka wrote:
> >>>>>>>>> On 2013-03-17 09:47, Gleb Natapov wrote:
> >>>>>>>>>> On Sat, Mar 16, 2013 at 09:49:07PM +0100, Jan Kiszka wrote:
> >>>>>>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com>
> >>>>>>>>>>> 
> >>>>>>>>>>> If the guest didn't take the last APIC timer interrupt yet and
> >>>>>>>>>>> generates another one on top, e.g. via periodic mode, we do
> >>>>>>>>>>> not block the VCPU even if the guest state is halted. The
> >>>>>>>>>>> reason is that apic_has_pending_timer continues to return a
> >>>>>>>>>>> non-zero value.
> >>>>>>>>>>> 
> >>>>>>>>>>> Fix this busy loop by taking the IRR content for the LVT vector in
> >>>>>>>>>>> apic_has_pending_timer into account.
> >>>>>>>>>>> 
> >>>>>>>>>> Just drop coalescing tacking for lapic interrupt. After posted
> >>>>>>>>>> interrupt will be merged __apic_accept_irq() will not longer
> >>>>>>>>>> return coalescing information, so the code will be dead anyway.
> >>>>>>>>> 
> >>>>>>>>> That requires the RTC decoalescing series to go first to avoid a
> >>>>>>>>> regression, no? Then let's postpone this topic for now.
> >>>>>>>>> 
> >>>>>>>> Yes, but decoalescing will work only for RTC :(
> >>>>>>> 
> >>>>>>> Are you proposing to drop LAPIC interrupt reinjection?
> >>>>>> 
> >>>>>> Since timer handling and injection is VCPU-local for LAPIC,
> >>>>>> __apic_accept_irq can (and must) return coalesced information (cannot
> >>>>>> drop LAPIC interrupt reinjection).
> >>>>>> 
> >>>>> Why can't we drop LAPIC interrupt reinjection? Proposed posted
> >>>>> interrupt patches do not properly check for interrupt coalescing
> >>>>> even for VCPU-local injection.
> >>>>> 
> >>>>> --
> >>>>> 			Gleb.
> >>>> 
> >>>> Because older Linux guests depend on reinjection for proper timekeeping.
> >>> Which versions? Those without kvmclock? Can we make them use PIT
> >>> instead? Posted interrupts going to break them.
> >> 
> >> There is no reason to break them if its OK to receive reinjection info
> >> from LAPIC... its a matter of returning the information from
> >> apic_accept_irq, no big deal.
> >> 
> > But current PI patches do break them, thats my point. So we either
> > need to revise them again, or drop LAPIC timer reinjection. Making
> > apic_accept_irq semantics "it returns coalescing info, but only sometimes"
> > is dubious though.
> We may rollback to the initial idea: test both irr and pir to get coalescing info. In this case, inject LAPIC timer always in vcpu context. So apic_accept_irq() will return right coalescing info.
> Also, we need to add comments to tell caller, apic_accept_irq() can ensure the return value is correct only when caller is in target vcpu context.
> 
We cannot touch irr while the vcpu is in non-root operation, so we will
have to pass a flag to apic_accept_irq() to let it know that it is called
synchronously. While all this is possible, I want to know exactly which
guests we will break if we do not track interrupt coalescing for the
lapic timer. If only 2.0 smp kernels break, we can probably drop it.

--
			Gleb.
Marcelo Tosatti March 21, 2013, 8:51 p.m. UTC | #13
> > > But current PI patches do break them, thats my point. So we either
> > > need to revise them again, or drop LAPIC timer reinjection. Making
> > > apic_accept_irq semantics "it returns coalescing info, but only sometimes"
> > > is dubious though.
> > We may rollback to the initial idea: test both irr and pir to get coalescing info. In this case, inject LAPIC timer always in vcpu context. So apic_accept_irq() will return right coalescing info.
> > Also, we need to add comments to tell caller, apic_accept_irq() can ensure the return value is correct only when caller is in target vcpu context.
> > 
> We cannot touch irr while vcpu is in non-root operation, so we will have
> to pass flag to apic_accept_irq() to let it know that it is called
> synchronously. While all this is possible I want to know which guests
> exactly will we break if we will not track interrupt coalescing for
> lapic timer. If only 2.0 smp kernels will break we can probably drop it.

RHEL4 / RHEL5 guests.

Gleb Natapov March 21, 2013, 9:13 p.m. UTC | #14
On Thu, Mar 21, 2013 at 05:51:50PM -0300, Marcelo Tosatti wrote:
> > > > But current PI patches do break them, thats my point. So we either
> > > > need to revise them again, or drop LAPIC timer reinjection. Making
> > > > apic_accept_irq semantics "it returns coalescing info, but only sometimes"
> > > > is dubious though.
> > > We may rollback to the initial idea: test both irr and pir to get coalescing info. In this case, inject LAPIC timer always in vcpu context. So apic_accept_irq() will return right coalescing info.
> > > Also, we need to add comments to tell caller, apic_accept_irq() can ensure the return value is correct only when caller is in target vcpu context.
> > > 
> > We cannot touch irr while vcpu is in non-root operation, so we will have
> > to pass flag to apic_accept_irq() to let it know that it is called
> > synchronously. While all this is possible I want to know which guests
> > exactly will we break if we will not track interrupt coalescing for
> > lapic timer. If only 2.0 smp kernels will break we can probably drop it.
> 
> RHEL4 / RHEL5 guests.
RHEL5 has kvmclock, no? We should not break RHEL4 though.

--
			Gleb.
Marcelo Tosatti March 21, 2013, 11:06 p.m. UTC | #15
On Thu, Mar 21, 2013 at 11:13:39PM +0200, Gleb Natapov wrote:
> On Thu, Mar 21, 2013 at 05:51:50PM -0300, Marcelo Tosatti wrote:
> > > > > But current PI patches do break them, thats my point. So we either
> > > > > need to revise them again, or drop LAPIC timer reinjection. Making
> > > > > apic_accept_irq semantics "it returns coalescing info, but only sometimes"
> > > > > is dubious though.
> > > > We may rollback to the initial idea: test both irr and pir to get coalescing info. In this case, inject LAPIC timer always in vcpu context. So apic_accept_irq() will return right coalescing info.
> > > > Also, we need to add comments to tell caller, apic_accept_irq() can ensure the return value is correct only when caller is in target vcpu context.
> > > > 
> > > We cannot touch irr while vcpu is in non-root operation, so we will have
> > > to pass flag to apic_accept_irq() to let it know that it is called
> > > synchronously. While all this is possible I want to know which guests
> > > exactly will we break if we will not track interrupt coalescing for
> > > lapic timer. If only 2.0 smp kernels will break we can probably drop it.
> > 
> > RHEL4 / RHEL5 guests.
> RHEL5 has kvmclock no? We should not break RHEL4 though.

kvmclock provides no timer interrupt... either LAPIC or PIT must be used
with kvmclock.

Zhang, Yang Z March 22, 2013, 1:50 a.m. UTC | #16
Marcelo Tosatti wrote on 2013-03-22:
> On Thu, Mar 21, 2013 at 11:13:39PM +0200, Gleb Natapov wrote:
>> On Thu, Mar 21, 2013 at 05:51:50PM -0300, Marcelo Tosatti wrote:
>>>>>> But current PI patches do break them, thats my point. So we either
>>>>>> need to revise them again, or drop LAPIC timer reinjection. Making
>>>>>> apic_accept_irq semantics "it returns coalescing info, but only
>>>>>> sometimes" is dubious though.
>>>>> We may rollback to the initial idea: test both irr and pir to get
>>>>> coalescing info. In this case, inject LAPIC timer always in vcpu
>>>>> context. So apic_accept_irq() will return right coalescing info.
>>>>> Also, we need to add comments to tell caller, apic_accept_irq() can
>>>>> ensure the return value is correct only when caller is in target
>>>>> vcpu context.
>>>>> 
>>>> We cannot touch irr while vcpu is in non-root operation, so we will have
>>>> to pass flag to apic_accept_irq() to let it know that it is called
>>>> synchronously. While all this is possible I want to know which guests
>>>> exactly will we break if we will not track interrupt coalescing for
>>>> lapic timer. If only 2.0 smp kernels will break we can probably drop it.
>>> 
>>> RHEL4 / RHEL5 guests.
>> RHEL5 has kvmclock no? We should not break RHEL4 though.
> 
> kvmclock provides no timer interrupt... either LAPIC or PIT must be used
> with kvmclock.
OK, here is the conclusion:
-- According to Marcelo's comments, RHEL4/RHEL5 rely on precise LAPIC timer injection, so the LAPIC timer injection logic is necessary.
-- LAPIC timer injection always occurs in vcpu context, so it is safe to touch irr and pir for LAPIC timer injection.
-- We cannot touch the virtual APIC page while the vcpu is in non-root operation, so the best solution is to pass a flag to apic_accept_irq() and use that flag to check whether it is safe to touch vIRR.

Right?

Best regards,
Yang
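
The proposal above can be illustrated with a small user-space sketch. It is not the KVM code: the struct, the toy_* helpers, and the in_vcpu_context flag are all hypothetical names chosen for illustration, and the packed bitmaps only approximate the real IRR/PIR layout. It shows the idea that the coalescing result can only be trusted when the caller runs in the target vcpu context, where both irr and pir may be inspected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_VECTORS 256

/* Toy stand-in for per-vcpu interrupt state; not the kernel's struct kvm_lapic. */
struct toy_vcpu_apic {
	uint32_t irr[TOY_NR_VECTORS / 32]; /* models the virtual IRR */
	uint32_t pir[TOY_NR_VECTORS / 32]; /* models posted-interrupt requests */
};

static bool toy_test_bit(const uint32_t *map, unsigned int vec)
{
	return (map[vec / 32] >> (vec % 32)) & 1u;
}

static void toy_set_bit(uint32_t *map, unsigned int vec)
{
	map[vec / 32] |= UINT32_C(1) << (vec % 32);
}

/*
 * Returns true if the vector was newly accepted, false if it was coalesced
 * with an already pending one.
 *
 * Assumption of this sketch: in vcpu context it is safe to look at both irr
 * and pir, so the result is exact. Outside vcpu context only pir is touched,
 * so a vector already sitting in irr goes unnoticed and the result is only
 * a best-effort answer.
 */
static bool toy_accept_irq(struct toy_vcpu_apic *apic, unsigned int vec,
			   bool in_vcpu_context)
{
	bool was_pending;

	assert(vec < TOY_NR_VECTORS);

	if (in_vcpu_context) {
		was_pending = toy_test_bit(apic->irr, vec) ||
			      toy_test_bit(apic->pir, vec);
		toy_set_bit(apic->irr, vec);
		return !was_pending;
	}

	/* Asynchronous path: post the request, never touch irr. */
	was_pending = toy_test_bit(apic->pir, vec);
	toy_set_bit(apic->pir, vec);
	return !was_pending;
}
```

In this model a timer injected twice from vcpu context reports coalescing on the second call, while a caller outside vcpu context can only detect a duplicate that is still in pir.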

Gleb Natapov March 22, 2013, 6:53 a.m. UTC | #17
On Thu, Mar 21, 2013 at 08:06:41PM -0300, Marcelo Tosatti wrote:
> On Thu, Mar 21, 2013 at 11:13:39PM +0200, Gleb Natapov wrote:
> > On Thu, Mar 21, 2013 at 05:51:50PM -0300, Marcelo Tosatti wrote:
> > > > > > But current PI patches do break them, thats my point. So we either
> > > > > > need to revise them again, or drop LAPIC timer reinjection. Making
> > > > > > apic_accept_irq semantics "it returns coalescing info, but only sometimes"
> > > > > > is dubious though.
> > > > > We may rollback to the initial idea: test both irr and pir to get coalescing info. In this case, inject LAPIC timer always in vcpu context. So apic_accept_irq() will return right coalescing info.
> > > > > Also, we need to add comments to tell caller, apic_accept_irq() can ensure the return value is correct only when caller is in target vcpu context.
> > > > > 
> > > > We cannot touch irr while vcpu is in non-root operation, so we will have
> > > > to pass flag to apic_accept_irq() to let it know that it is called
> > > > synchronously. While all this is possible I want to know which guests
> > > > exactly will we break if we will not track interrupt coalescing for
> > > > lapic timer. If only 2.0 smp kernels will break we can probably drop it.
> > > 
> > > RHEL4 / RHEL5 guests.
> > RHEL5 has kvmclock no? We should not break RHEL4 though.
> 
> kvmclock provides no timer interrupt... either LAPIC or PIT must be used
> with kvmclock.
I am confused now. If LAPIC is not used for wallclock time keeping, but
only for scheduling, the reinjection is actually harmful. Reinjecting the
interrupt will cause needless task rescheduling. So the question is whether
there is a Linux kernel that uses LAPIC for wallclock time keeping and
relies on an accurate number of injected interrupts to avoid time drift.
Knowing that Linux tends to disable interrupts, it is likely that it tries
to detect and compensate for missing interrupts.

--
			Gleb.
Marcelo Tosatti March 22, 2013, 10:43 a.m. UTC | #18
On Fri, Mar 22, 2013 at 08:53:15AM +0200, Gleb Natapov wrote:
> On Thu, Mar 21, 2013 at 08:06:41PM -0300, Marcelo Tosatti wrote:
> > On Thu, Mar 21, 2013 at 11:13:39PM +0200, Gleb Natapov wrote:
> > > On Thu, Mar 21, 2013 at 05:51:50PM -0300, Marcelo Tosatti wrote:
> > > > > > > But current PI patches do break them, thats my point. So we either
> > > > > > > need to revise them again, or drop LAPIC timer reinjection. Making
> > > > > > > apic_accept_irq semantics "it returns coalescing info, but only sometimes"
> > > > > > > is dubious though.
> > > > > > We may rollback to the initial idea: test both irr and pir to get coalescing info. In this case, inject LAPIC timer always in vcpu context. So apic_accept_irq() will return right coalescing info.
> > > > > > Also, we need to add comments to tell caller, apic_accept_irq() can ensure the return value is correct only when caller is in target vcpu context.
> > > > > > 
> > > > > We cannot touch irr while vcpu is in non-root operation, so we will have
> > > > > to pass flag to apic_accept_irq() to let it know that it is called
> > > > > synchronously. While all this is possible I want to know which guests
> > > > > exactly will we break if we will not track interrupt coalescing for
> > > > > lapic timer. If only 2.0 smp kernels will break we can probably drop it.
> > > > 
> > > > RHEL4 / RHEL5 guests.
> > > RHEL5 has kvmclock no? We should not break RHEL4 though.
> > 
> > kvmclock provides no timer interrupt... either LAPIC or PIT must be used
> > with kvmclock.
> I am confused now. If LAPIC is not used for wallclock time keeping, but
> only for scheduling the reinjection is actually harmful. Reinjecting the
> interrupt will cause needles task rescheduling. So the question is if
> there is a Linux kernel that uses LAPIC for wallclock time keeping and
> relies on accurate number of injected interrupts to not time drift.

See 4acd47cfea9c18134e0cbf915780892ef0ff433a on RHEL5; RHEL5 kernels before that
commit did not reinject. This means that all non-RHEL Linux guests
based on that upstream code also suffer from the same problem.

Also, any other algorithm which uses LAPIC timers and compares them with
other clocks (such as the NMI watchdog) is potentially vulnerable.

Can drop it, and then wait until someone complains (if so).

> Knowing that Linux tend to disable interrupt it is likely that it tries
> to detect and compensate for missing interrupt.

As said above, any algorithm which compares LAPIC timer interrupt with
another clock is vulnerable.
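
To make the concern concrete, here is a minimal sketch of such a comparison. It is purely illustrative: tick_check, deliver_ticks() and tick_drift_ns() are hypothetical names, not code from any guest kernel, and the model assumes a fixed tick period and a perfectly accurate reference clock.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical guest-side bookkeeping for a periodic LAPIC tick. */
struct tick_check {
	uint64_t ticks_seen; /* timer interrupts the guest actually handled */
	uint64_t period_ns;  /* nominal tick period */
};

/*
 * Number of ticks the guest gets to see when the timer fired 'fired' times
 * and 'coalesced' of those hit an already pending vector. With reinjection
 * the host delivers the coalesced ticks later; without it they are lost.
 */
static uint64_t deliver_ticks(uint64_t fired, uint64_t coalesced, int reinject)
{
	assert(coalesced <= fired);
	return reinject ? fired : fired - coalesced;
}

/*
 * Difference between the time reported by a reference clock and the time
 * implied by counting ticks. A positive value means tick counting runs slow.
 */
static int64_t tick_drift_ns(const struct tick_check *tc,
			     uint64_t ref_elapsed_ns)
{
	return (int64_t)ref_elapsed_ns -
	       (int64_t)(tc->ticks_seen * tc->period_ns);
}
```

With a 1 ms period, 1000 fired ticks and 10 of them coalesced, this model falls 10 ms behind the reference clock when nothing is reinjected and shows no drift when the lost ticks are reinjected.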

Gleb Natapov March 22, 2013, 11:19 a.m. UTC | #19
On Fri, Mar 22, 2013 at 07:43:03AM -0300, Marcelo Tosatti wrote:
> On Fri, Mar 22, 2013 at 08:53:15AM +0200, Gleb Natapov wrote:
> > On Thu, Mar 21, 2013 at 08:06:41PM -0300, Marcelo Tosatti wrote:
> > > On Thu, Mar 21, 2013 at 11:13:39PM +0200, Gleb Natapov wrote:
> > > > On Thu, Mar 21, 2013 at 05:51:50PM -0300, Marcelo Tosatti wrote:
> > > > > > > > But current PI patches do break them, thats my point. So we either
> > > > > > > > need to revise them again, or drop LAPIC timer reinjection. Making
> > > > > > > > apic_accept_irq semantics "it returns coalescing info, but only sometimes"
> > > > > > > > is dubious though.
> > > > > > > We may rollback to the initial idea: test both irr and pir to get coalescing info. In this case, inject LAPIC timer always in vcpu context. So apic_accept_irq() will return right coalescing info.
> > > > > > > Also, we need to add comments to tell caller, apic_accept_irq() can ensure the return value is correct only when caller is in target vcpu context.
> > > > > > > 
> > > > > > We cannot touch irr while vcpu is in non-root operation, so we will have
> > > > > > to pass flag to apic_accept_irq() to let it know that it is called
> > > > > > synchronously. While all this is possible I want to know which guests
> > > > > > exactly will we break if we will not track interrupt coalescing for
> > > > > > lapic timer. If only 2.0 smp kernels will break we can probably drop it.
> > > > > 
> > > > > RHEL4 / RHEL5 guests.
> > > > RHEL5 has kvmclock no? We should not break RHEL4 though.
> > > 
> > > kvmclock provides no timer interrupt... either LAPIC or PIT must be used
> > > with kvmclock.
> > I am confused now. If LAPIC is not used for wallclock time keeping, but
> > only for scheduling the reinjection is actually harmful. Reinjecting the
> > interrupt will cause needles task rescheduling. So the question is if
> > there is a Linux kernel that uses LAPIC for wallclock time keeping and
> > relies on accurate number of injected interrupts to not time drift.
> 
> See 4acd47cfea9c18134e0cbf915780892ef0ff433a on RHEL5, RHEL5 kernels before that
> commit did not reinject.  Which means that all non-RHEL Linux guests
> based on that upstream code also suffer from the same problem.
> 
The commit actually fixes the guest, not the host. The existence of the commit
also means that LAPIC timer reinjection does not solve the problem, and
all guests without this commit will suffer from the bug regardless of
what we decide to do here. Without LAPIC timer reinjection the
effect of the bug will be much more visible and longer lasting, though.

> Also any other algorithm which uses LAPIC timers and compare that with
> other clocks (such as NMI watchdog) are potentially vulnerable.
They are, with or without timer reinjection, as the commit you pointed to
shows.

> 
> Can drop it, and then wait until someone complains (if so).
> 
Yes, tough decision to make. All the complaints will be guest bugs which
can be hit with reinjection too, but with less probability. The reason we
are so keen on keeping RTC reinjection is that the guests that depend on it
cannot be fixed.

> > Knowing that Linux tend to disable interrupt it is likely that it tries
> > to detect and compensate for missing interrupt.
> 
> As said above, any algorithm which compares LAPIC timer interrupt with
> another clock is vulnerable.

--
			Gleb.
Zhang, Yang Z March 24, 2013, 10:45 a.m. UTC | #20
Gleb Natapov wrote on 2013-03-22:
> On Fri, Mar 22, 2013 at 07:43:03AM -0300, Marcelo Tosatti wrote:
>> On Fri, Mar 22, 2013 at 08:53:15AM +0200, Gleb Natapov wrote:
>>> On Thu, Mar 21, 2013 at 08:06:41PM -0300, Marcelo Tosatti wrote:
>>>> On Thu, Mar 21, 2013 at 11:13:39PM +0200, Gleb Natapov wrote:
>>>>> On Thu, Mar 21, 2013 at 05:51:50PM -0300, Marcelo Tosatti wrote:
>>>>>>>>> But current PI patches do break them, thats my point. So we
>>>>>>>>> either need to revise them again, or drop LAPIC timer
>>>>>>>>> reinjection. Making apic_accept_irq semantics "it returns
>>>>>>>>> coalescing info, but only sometimes" is dubious though.
>>>>>>>> We may rollback to the initial idea: test both irr and pir to get
>>>>>>>> coalescing info. In this case, inject LAPIC timer always in vcpu
>>>>>>>> context. So apic_accept_irq() will return right coalescing info.
>>>>>>>> Also, we need to add comments to tell caller, apic_accept_irq()
>>>>>>>> can ensure the return value is correct only when caller is in
>>>>>>>> target vcpu context.
>>>>>>>> 
>>>>>>> We cannot touch irr while vcpu is in non-root operation, so we
>>>>>>> will have to pass flag to apic_accept_irq() to let it know that it
>>>>>>> is called synchronously. While all this is possible I want to know
>>>>>>> which guests exactly will we break if we will not track interrupt
>>>>>>> coalescing for lapic timer. If only 2.0 smp kernels will break we
>>>>>>> can probably drop it.
>>>>>> 
>>>>>> RHEL4 / RHEL5 guests.
>>>>> RHEL5 has kvmclock no? We should not break RHEL4 though.
>>>> 
>>>> kvmclock provides no timer interrupt... either LAPIC or PIT must be used
>>>> with kvmclock.
>>> I am confused now. If LAPIC is not used for wallclock time keeping, but
>>> only for scheduling the reinjection is actually harmful. Reinjecting the
>>> interrupt will cause needles task rescheduling. So the question is if
>>> there is a Linux kernel that uses LAPIC for wallclock time keeping and
>>> relies on accurate number of injected interrupts to not time drift.
>> 
>> See 4acd47cfea9c18134e0cbf915780892ef0ff433a on RHEL5, RHEL5 kernels
>> before that commit did not reinject.  Which means that all non-RHEL
>> Linux guests based on that upstream code also suffer from the same
>> problem.
>> 
> The commit actually fixes guest, not host. The existence of the commit
> also means that LAPIC timer reinjection does not solve the problem and
> all guests without this commit will suffer from the bug regardless of
> what we will decide to do here. Without LAPIC timer reinfection the
> effect of the bug will be much more visible and long lasting though.
> 
>> Also any other algorithm which uses LAPIC timers and compare that with
>> other clocks (such as NMI watchdog) are potentially vulnerable.
> They are with or without timer reinjection as commit you pointed to
> shows.
> 
>> 
>> Can drop it, and then wait until someone complains (if so).
>> 
> Yes, tough decision to make. All the complains will be guest bugs which
> can be hit without reinjection too, but with less probability. Why we so
> keen on keeping RTC reinject is that the guests that depends on it
> cannot be fixed.
> 
>>> Knowing that Linux tend to disable interrupt it is likely that it tries
>>> to detect and compensate for missing interrupt.
>> 
>> As said above, any algorithm which compares LAPIC timer interrupt with
>> another clock is vulnerable.
Any conclusion? 

Best regards,
Yang

Gleb Natapov March 24, 2013, 7:03 p.m. UTC | #21
On Sun, Mar 24, 2013 at 10:45:53AM +0000, Zhang, Yang Z wrote:
> Gleb Natapov wrote on 2013-03-22:
> > On Fri, Mar 22, 2013 at 07:43:03AM -0300, Marcelo Tosatti wrote:
> >> On Fri, Mar 22, 2013 at 08:53:15AM +0200, Gleb Natapov wrote:
> >>> On Thu, Mar 21, 2013 at 08:06:41PM -0300, Marcelo Tosatti wrote:
> >>>> On Thu, Mar 21, 2013 at 11:13:39PM +0200, Gleb Natapov wrote:
> >>>>> On Thu, Mar 21, 2013 at 05:51:50PM -0300, Marcelo Tosatti wrote:
> >>>>>>>>> But current PI patches do break them, thats my point. So we
> >>>>>>>>> either need to revise them again, or drop LAPIC timer
> >>>>>>>>> reinjection. Making apic_accept_irq semantics "it returns
> >>>>>>>>> coalescing info, but only sometimes" is dubious though.
> >>>>>>>> We may rollback to the initial idea: test both irr and pir to get
> >>>>>>>> coalescing info. In this case, inject LAPIC timer always in vcpu
> >>>>>>>> context. So apic_accept_irq() will return right coalescing info.
> >>>>>>>> Also, we need to add comments to tell caller, apic_accept_irq()
> >>>>>>>> can ensure the return value is correct only when caller is in
> >>>>>>>> target vcpu context.
> >>>>>>>> 
> >>>>>>> We cannot touch irr while vcpu is in non-root operation, so we
> >>>>>>> will have to pass flag to apic_accept_irq() to let it know that it
> >>>>>>> is called synchronously. While all this is possible I want to know
> >>>>>>> which guests exactly will we break if we will not track interrupt
> >>>>>>> coalescing for lapic timer. If only 2.0 smp kernels will break we
> >>>>>>> can probably drop it.
> >>>>>> 
> >>>>>> RHEL4 / RHEL5 guests.
> >>>>> RHEL5 has kvmclock no? We should not break RHEL4 though.
> >>>> 
> >>>> kvmclock provides no timer interrupt... either LAPIC or PIT must be used
> >>>> with kvmclock.
> >>> I am confused now. If LAPIC is not used for wallclock time keeping, but
> >>> only for scheduling the reinjection is actually harmful. Reinjecting the
> >>> interrupt will cause needles task rescheduling. So the question is if
> >>> there is a Linux kernel that uses LAPIC for wallclock time keeping and
> >>> relies on accurate number of injected interrupts to not time drift.
> >> 
> >> See 4acd47cfea9c18134e0cbf915780892ef0ff433a on RHEL5, RHEL5 kernels
> >> before that commit did not reinject.  Which means that all non-RHEL
> >> Linux guests based on that upstream code also suffer from the same
> >> problem.
> >> 
> > The commit actually fixes guest, not host. The existence of the commit
> > also means that LAPIC timer reinjection does not solve the problem and
> > all guests without this commit will suffer from the bug regardless of
> > what we will decide to do here. Without LAPIC timer reinfection the
> > effect of the bug will be much more visible and long lasting though.
> > 
> >> Also any other algorithm which uses LAPIC timers and compare that with
> >> other clocks (such as NMI watchdog) are potentially vulnerable.
> > They are with or without timer reinjection as commit you pointed to
> > shows.
> > 
> >> 
> >> Can drop it, and then wait until someone complains (if so).
> >> 
> > Yes, tough decision to make. All the complains will be guest bugs which
> > can be hit without reinjection too, but with less probability. Why we so
> > keen on keeping RTC reinject is that the guests that depends on it
> > cannot be fixed.
> > 
> >>> Knowing that Linux tend to disable interrupt it is likely that it tries
> >>> to detect and compensate for missing interrupt.
> >> 
> >> As said above, any algorithm which compares LAPIC timer interrupt with
> >> another clock is vulnerable.
> Any conclusion? 
> 
Let's not check for coalescing in the PI patches for now.

--
			Gleb.
Jan Kiszka April 28, 2013, 10:15 a.m. UTC | #22
On 2013-03-17 09:47, Gleb Natapov wrote:
> On Sat, Mar 16, 2013 at 09:49:07PM +0100, Jan Kiszka wrote:
>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>
>> If the guest didn't take the last APIC timer interrupt yet and generates
>> another one on top, e.g. via periodic mode, we do not block the VCPU
>> even if the guest state is halted. The reason is that
>> apic_has_pending_timer continues to return a non-zero value.
>>
>> Fix this busy loop by taking the IRR content for the LVT vector in
>> apic_has_pending_timer into account.
>>
> Just drop coalescing tacking for lapic interrupt. After posted interrupt
> will be merged __apic_accept_irq() will not longer return coalescing
> information, so the code will be dead anyway.

If I understood the follow-up discussion correctly, we aren't dropping
de-coalescing support yet. So how should we proceed with this fix?

Jan

> 
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>>
>> Not a critical issue, we are looping fully interruptible, but it's ugly
>> to do so IMHO.
>>
>>  arch/x86/kvm/lapic.c |    4 +++-
>>  1 files changed, 3 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index a8e9369..658abf5 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -1473,7 +1473,9 @@ int apic_has_pending_timer(struct kvm_vcpu *vcpu)
>>  	struct kvm_lapic *apic = vcpu->arch.apic;
>>  
>>  	if (kvm_vcpu_has_lapic(vcpu) && apic_enabled(apic) &&
>> -			apic_lvt_enabled(apic, APIC_LVTT))
>> +	    apic_lvt_enabled(apic, APIC_LVTT) &&
>> +	    !apic_test_vector(apic_lvt_vector(apic, APIC_LVTT),
>> +					      apic->regs + APIC_IRR))
>>  		return atomic_read(&apic->lapic_timer.pending);
>>  
>>  	return 0;
>> -- 
>> 1.7.3.4
> 
> --
> 			Gleb.
Gleb Natapov April 28, 2013, 10:19 a.m. UTC | #23
On Sun, Apr 28, 2013 at 12:15:05PM +0200, Jan Kiszka wrote:
> On 2013-03-17 09:47, Gleb Natapov wrote:
> > On Sat, Mar 16, 2013 at 09:49:07PM +0100, Jan Kiszka wrote:
> >> From: Jan Kiszka <jan.kiszka@siemens.com>
> >>
> >> If the guest didn't take the last APIC timer interrupt yet and generates
> >> another one on top, e.g. via periodic mode, we do not block the VCPU
> >> even if the guest state is halted. The reason is that
> >> apic_has_pending_timer continues to return a non-zero value.
> >>
> >> Fix this busy loop by taking the IRR content for the LVT vector in
> >> apic_has_pending_timer into account.
> >>
> > Just drop coalescing tacking for lapic interrupt. After posted interrupt
> > will be merged __apic_accept_irq() will not longer return coalescing
> > information, so the code will be dead anyway.
> 
> If I understood the follow-up discussion correctly, we aren't dropping
> de-coalescing support yet. So how to proceed with this fix here?
> 
We do. It already does not work if you run on a CPU with apicv support.

> Jan
> 
> > 
> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >> ---
> >>
> >> Not a critical issue, we are looping fully interruptible, but it's ugly
> >> to do so IMHO.
> >>
> >>  arch/x86/kvm/lapic.c |    4 +++-
> >>  1 files changed, 3 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> >> index a8e9369..658abf5 100644
> >> --- a/arch/x86/kvm/lapic.c
> >> +++ b/arch/x86/kvm/lapic.c
> >> @@ -1473,7 +1473,9 @@ int apic_has_pending_timer(struct kvm_vcpu *vcpu)
> >>  	struct kvm_lapic *apic = vcpu->arch.apic;
> >>  
> >>  	if (kvm_vcpu_has_lapic(vcpu) && apic_enabled(apic) &&
> >> -			apic_lvt_enabled(apic, APIC_LVTT))
> >> +	    apic_lvt_enabled(apic, APIC_LVTT) &&
> >> +	    !apic_test_vector(apic_lvt_vector(apic, APIC_LVTT),
> >> +					      apic->regs + APIC_IRR))
> >>  		return atomic_read(&apic->lapic_timer.pending);
> >>  
> >>  	return 0;
> >> -- 
> >> 1.7.3.4
> > 
> > --
> > 			Gleb.
> 
> 



--
			Gleb.
Jan Kiszka April 28, 2013, 10:20 a.m. UTC | #24
On 2013-04-28 12:19, Gleb Natapov wrote:
> On Sun, Apr 28, 2013 at 12:15:05PM +0200, Jan Kiszka wrote:
>> On 2013-03-17 09:47, Gleb Natapov wrote:
>>> On Sat, Mar 16, 2013 at 09:49:07PM +0100, Jan Kiszka wrote:
>>>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>>>
>>>> If the guest didn't take the last APIC timer interrupt yet and generates
>>>> another one on top, e.g. via periodic mode, we do not block the VCPU
>>>> even if the guest state is halted. The reason is that
>>>> apic_has_pending_timer continues to return a non-zero value.
>>>>
>>>> Fix this busy loop by taking the IRR content for the LVT vector in
>>>> apic_has_pending_timer into account.
>>>>
>>> Just drop coalescing tacking for lapic interrupt. After posted interrupt
>>> will be merged __apic_accept_irq() will not longer return coalescing
>>> information, so the code will be dead anyway.
>>
>> If I understood the follow-up discussion correctly, we aren't dropping
>> de-coalescing support yet. So how to proceed with this fix here?
>>
> We do. It does not work if you run on CPU with apicv support already.

But isn't the code still there and working when apicv is absent?

Jan
Gleb Natapov April 28, 2013, 10:23 a.m. UTC | #25
On Sun, Apr 28, 2013 at 12:20:12PM +0200, Jan Kiszka wrote:
> On 2013-04-28 12:19, Gleb Natapov wrote:
> > On Sun, Apr 28, 2013 at 12:15:05PM +0200, Jan Kiszka wrote:
> >> On 2013-03-17 09:47, Gleb Natapov wrote:
> >>> On Sat, Mar 16, 2013 at 09:49:07PM +0100, Jan Kiszka wrote:
> >>>> From: Jan Kiszka <jan.kiszka@siemens.com>
> >>>>
> >>>> If the guest didn't take the last APIC timer interrupt yet and generates
> >>>> another one on top, e.g. via periodic mode, we do not block the VCPU
> >>>> even if the guest state is halted. The reason is that
> >>>> apic_has_pending_timer continues to return a non-zero value.
> >>>>
> >>>> Fix this busy loop by taking the IRR content for the LVT vector in
> >>>> apic_has_pending_timer into account.
> >>>>
> >>> Just drop coalescing tacking for lapic interrupt. After posted interrupt
> >>> will be merged __apic_accept_irq() will not longer return coalescing
> >>> information, so the code will be dead anyway.
> >>
> >> If I understood the follow-up discussion correctly, we aren't dropping
> >> de-coalescing support yet. So how to proceed with this fix here?
> >>
> > We do. It does not work if you run on CPU with apicv support already.
> 
> But isn't the code still there and working when apicv is absent?
> 
Remove it as a fix for the busy loop. It is not a good idea to behave
differently on different types of hardware.

--
			Gleb.

Patch

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index a8e9369..658abf5 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1473,7 +1473,9 @@  int apic_has_pending_timer(struct kvm_vcpu *vcpu)
 	struct kvm_lapic *apic = vcpu->arch.apic;
 
 	if (kvm_vcpu_has_lapic(vcpu) && apic_enabled(apic) &&
-			apic_lvt_enabled(apic, APIC_LVTT))
+	    apic_lvt_enabled(apic, APIC_LVTT) &&
+	    !apic_test_vector(apic_lvt_vector(apic, APIC_LVTT),
+					      apic->regs + APIC_IRR))
 		return atomic_read(&apic->lapic_timer.pending);
 
 	return 0;
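
The effect of the added condition can be modelled in user space. The following is only a sketch under simplifying assumptions: toy_lapic and the toy_* functions are hypothetical stand-ins, the IRR is a packed bitmap rather than the real register layout, and the timer vector is stored directly instead of being read from the LVTT register. It mirrors the logic of the patch: a pending timer is reported only while its vector is not already set in the IRR, so a halted vcpu is no longer considered runnable because of a timer that cannot be injected yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_VECTORS 256

/* Toy model of the LAPIC state consulted by the patched check. */
struct toy_lapic {
	bool enabled;        /* models apic_enabled() */
	bool lvtt_masked;    /* models !apic_lvt_enabled(apic, APIC_LVTT) */
	uint8_t timer_vector; /* models apic_lvt_vector(apic, APIC_LVTT) */
	uint32_t irr[TOY_NR_VECTORS / 32];
	int timer_pending;   /* models lapic_timer.pending */
};

static bool toy_test_vector(const struct toy_lapic *apic, unsigned int vec)
{
	assert(vec < TOY_NR_VECTORS);
	return (apic->irr[vec / 32] >> (vec % 32)) & 1u;
}

/*
 * Report pending timer interrupts only if the timer vector is not already
 * latched in the IRR; otherwise another injection would merely coalesce.
 */
static int toy_has_pending_timer(const struct toy_lapic *apic)
{
	if (apic->enabled && !apic->lvtt_masked &&
	    !toy_test_vector(apic, apic->timer_vector))
		return apic->timer_pending;

	return 0;
}
```

In this model, a pending count of 1 with the timer vector already in the IRR yields 0, and the same state with the IRR bit cleared yields 1.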