Implement generic double fault generation mechanism

Message ID	9832F13BD22FB94A829F798DA4A8280501A81A8E83@pdsmsx503.ccr.corp.intel.com (mailing list archive)
State	New, archived
Headers	From: "Dong, Eddie" <eddie.dong@intel.com>
	To: Gleb Natapov <gleb@redhat.com>
	CC: "kvm@vger.kernel.org" <kvm@vger.kernel.org>, Avi Kivity <avi@redhat.com>
	Date: Fri, 8 May 2009 16:27:28 +0800
	Subject: RE: Implement generic double fault generation mechanism
	In-Reply-To: <20090503105330.GL9795@redhat.com>

Dong, Eddie May 8, 2009, 8:27 a.m. UTC

Gleb Natapov wrote:
>> +
>> +static int exception_class(int vector)
>> +{
>> +	if (vector == 14)
>> +		return EXCPT_PF;
>> +	else if (vector == 0 || (vector >= 10 && vector <= 13))
>> +		return EXCPT_CONTRIBUTORY;
>> +	else
>> +		return EXCPT_BENIGN;
>> +}
>> +
> This makes double fault (8) benign exception. Surely not what you
> want. 

Double fault falls into none of the above classes per the SDM, but it should be 
checked before generating a #DF. See the updated patch.
>> +	/* to check exception */
>> +	prev_nr = vcpu->arch.exception.nr;
>> +	class2 = exception_class(nr);
>> +	class1 = exception_class(prev_nr);
>> +	if ((class1 == EXCPT_CONTRIBUTORY && class2 == EXCPT_CONTRIBUTORY)
>> +		|| (class1 == EXCPT_PF && class2 != EXCPT_BENIGN)) {
>> +		/* generate double fault per SDM Table 5-5 */
>> +		printk(KERN_DEBUG "kvm: double fault 0x%x on 0x%x\n",
>> +			prev_nr, nr);
>> +		vcpu->arch.exception.pending = true;
>> +		vcpu->arch.exception.has_error_code = 1;
>> +		vcpu->arch.exception.nr = DF_VECTOR;
>> +		vcpu->arch.exception.error_code = 0;
>> +		if (prev_nr == DF_VECTOR) {
>> +			/* triple fault -> shutdown */
>> +			set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests);
>> +		}
>> +	} else
>> +		printk(KERN_ERR "Exception 0x%x on 0x%x happens serially\n",
>> +			prev_nr, nr);
>> +}
> When two exceptions happen serially it is better to replace the pending
> exception with the new one. This way the first exception (that is lost)
> will be regenerated when the instruction is re-executed.

Do you want it covered now? For exceptions it is easy, but for an IRQ it needs to be pushed back.

Thx, eddie



    Move Double-Fault generation logic out of page fault
    exception generating function to cover more generic case.
    
    Signed-off-by: Eddie Dong <eddie.dong@intel.com>
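
For reference, the decision being discussed can be sketched as a small standalone function. This is only an illustration outside the kernel: `combine_exceptions` and the `excpt_action` enum are hypothetical names, not part of the patch. Like the patch, it tests for a pending #DF before classifying (since #DF belongs to none of the three classes) and treats any exception raised on top of a pending #DF as a triple fault.

```c
#define PF_VECTOR 14
#define DF_VECTOR 8

enum excpt_class { EXCPT_BENIGN, EXCPT_CONTRIBUTORY, EXCPT_PF };

/* Hypothetical result type: what to do when `second` is raised
 * while `first` is still pending. */
enum excpt_action { HANDLE_SERIALLY, RAISE_DOUBLE_FAULT, TRIPLE_FAULT };

static enum excpt_class exception_class(unsigned vector)
{
	if (vector == PF_VECTOR)
		return EXCPT_PF;
	/* #DE (0), #TS (10), #NP (11), #SS (12), #GP (13) */
	if (vector == 0 || (vector >= 10 && vector <= 13))
		return EXCPT_CONTRIBUTORY;
	return EXCPT_BENIGN;
}

static enum excpt_action combine_exceptions(unsigned first, unsigned second)
{
	enum excpt_class c1, c2;

	/* #DF is in none of the classes, so check it before classifying. */
	if (first == DF_VECTOR)
		return TRIPLE_FAULT;

	c1 = exception_class(first);
	c2 = exception_class(second);
	if ((c1 == EXCPT_CONTRIBUTORY && c2 == EXCPT_CONTRIBUTORY) ||
	    (c1 == EXCPT_PF && c2 != EXCPT_BENIGN))
		return RAISE_DOUBLE_FAULT;

	return HANDLE_SERIALLY;
}
```

With this shape, GP on GP and exception 11 on 11 both yield a double fault, while #DE followed by #PF falls into the "handled serially" case that the rest of the thread argues about.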

Gleb Natapov May 8, 2009, 9:53 a.m. UTC | #1

On Fri, May 08, 2009 at 04:27:28PM +0800, Dong, Eddie wrote:
> Gleb Natapov wrote:
> >> +
> >> +static int exception_class(int vector)
> >> +{
> >> +	if (vector == 14)
> >> +		return EXCPT_PF;
> >> +	else if (vector == 0 || (vector >= 10 && vector <= 13))
> >> +		return EXCPT_CONTRIBUTORY;
> >> +	else
> >> +		return EXCPT_BENIGN;
> >> +}
> >> +
> > This makes double fault (8) benign exception. Surely not what you
> > want. 
> 
> double fault fall into neither of above class per SDM. But it should be 
> checked earlier than generating DB fault. See new updated.
> >> +	/* to check exception */
> >> +	prev_nr = vcpu->arch.exception.nr;
> >> +	class2 = exception_class(nr);
> >> +	class1 = exception_class(prev_nr);
> >> +	if ((class1 == EXCPT_CONTRIBUTORY && class2 == EXCPT_CONTRIBUTORY)
> >> +		|| (class1 == EXCPT_PF && class2 != EXCPT_BENIGN)) {
> >> +		/* generate double fault per SDM Table 5-5 */
> >> +		printk(KERN_DEBUG "kvm: double fault 0x%x on 0x%x\n",
> >> +			prev_nr, nr);
> >> +		vcpu->arch.exception.pending = true;
> >> +		vcpu->arch.exception.has_error_code = 1;
> >> +		vcpu->arch.exception.nr = DF_VECTOR;
> >> +		vcpu->arch.exception.error_code = 0;
> >> +		if (prev_nr == DF_VECTOR) {
> >> +			/* triple fault -> shutdown */
> >> +			set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests);
> >> +		}
> >> +	} else
> >> +		printk(KERN_ERR "Exception 0x%x on 0x%x happens serially\n",
> >> +			prev_nr, nr);
> >> +}
> > When two exceptions happen serially it is better to replace the pending
> > exception with the new one. This way the first exception (that is lost)
> > will be regenerated when the instruction is re-executed.
> 
> Do you want it to be covered for now? For exception, it is easy but for IRQ, it needs to be pushed back.
> 
Yes, I want it covered now; otherwise any serial exception generates a
flood of "Exception happens serially" messages. This function does not
handle IRQs, so there is no problem there.


> Thx, eddie
> 
> 
> 
>     Move Double-Fault generation logic out of page fault
>     exception generating function to cover more generic case.
>     
>     Signed-off-by: Eddie Dong <eddie.dong@intel.com>
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index ab1fdac..d0e75a2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -162,12 +162,60 @@ void kvm_set_apic_base(struct kvm_vcpu *vcpu, u64 data)
>  }
>  EXPORT_SYMBOL_GPL(kvm_set_apic_base);
>  
> +#define EXCPT_BENIGN		0
> +#define EXCPT_CONTRIBUTORY	1
> +#define EXCPT_PF		2
> +
> +static int exception_class(int vector)
> +{
> +	if (vector == 14)
> +		return EXCPT_PF;
> +	else if (vector == 0 || (vector >= 10 && vector <= 13))
> +		return EXCPT_CONTRIBUTORY;
> +	else
> +		return EXCPT_BENIGN;
> +}
> +
> +static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
> +		unsigned nr, bool has_error, u32 error_code)
> +{
> +	u32 prev_nr;
> +	int class1, class2;
> +
> +	if (!vcpu->arch.exception.pending) {
> +		vcpu->arch.exception.pending = true;
> +		vcpu->arch.exception.has_error_code = has_error;
> +		vcpu->arch.exception.nr = nr;
> +		vcpu->arch.exception.error_code = error_code;
> +		return;
> +	}
> +
> +	/* to check exception */
> +	prev_nr = vcpu->arch.exception.nr;
> +	if (prev_nr == DF_VECTOR) {
> +		/* triple fault -> shutdown */
> +		set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests);
> +		return;
> +	}
> +	class1 = exception_class(prev_nr);
> +	class2 = exception_class(nr);
> +	if ((class1 == EXCPT_CONTRIBUTORY && class2 == EXCPT_CONTRIBUTORY)
> +		|| (class1 == EXCPT_PF && class2 != EXCPT_BENIGN)) {
> +		/* generate double fault per SDM Table 5-5 */
> +		printk(KERN_DEBUG "kvm: double fault 0x%x on 0x%x\n",
> +			prev_nr, nr);
> +		vcpu->arch.exception.pending = true;
> +		vcpu->arch.exception.has_error_code = true;
> +		vcpu->arch.exception.nr = DF_VECTOR;
> +		vcpu->arch.exception.error_code = 0;
> +	} else
> +		printk(KERN_ERR "Exception 0x%x on 0x%x happens serially\n",
> +			prev_nr, nr);
> +}
> +
>  void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr)
>  {
> -	WARN_ON(vcpu->arch.exception.pending);
> -	vcpu->arch.exception.pending = true;
> -	vcpu->arch.exception.has_error_code = false;
> -	vcpu->arch.exception.nr = nr;
> +	kvm_multiple_exception(vcpu, nr, false, 0);
>  }
>  EXPORT_SYMBOL_GPL(kvm_queue_exception);
>  
> @@ -176,18 +224,6 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long addr,
>  {
>  	++vcpu->stat.pf_guest;
>  
> -	if (vcpu->arch.exception.pending) {
> -		if (vcpu->arch.exception.nr == PF_VECTOR) {
> -			printk(KERN_DEBUG "kvm: inject_page_fault:"
> -					" double fault 0x%lx\n", addr);
> -			vcpu->arch.exception.nr = DF_VECTOR;
> -			vcpu->arch.exception.error_code = 0;
> -		} else if (vcpu->arch.exception.nr == DF_VECTOR) {
> -			/* triple fault -> shutdown */
> -			set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests);
> -		}
> -		return;
> -	}
>  	vcpu->arch.cr2 = addr;
>  	kvm_queue_exception_e(vcpu, PF_VECTOR, error_code);
>  }
> @@ -200,11 +236,7 @@ EXPORT_SYMBOL_GPL(kvm_inject_nmi);
>  
>  void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code)
>  {
> -	WARN_ON(vcpu->arch.exception.pending);
> -	vcpu->arch.exception.pending = true;
> -	vcpu->arch.exception.has_error_code = true;
> -	vcpu->arch.exception.nr = nr;
> -	vcpu->arch.exception.error_code = error_code;
> +	kvm_multiple_exception(vcpu, nr, true, error_code);
>  }
>  EXPORT_SYMBOL_GPL(kvm_queue_exception_e);
>  


--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Dong, Eddie May 8, 2009, 10:39 a.m. UTC | #2

>>> will be regenerated when instruction will be re-executed.
>> 
>> Do you want it to be covered for now? For exception, it is easy but
>> for IRQ, it needs to be pushed back. 
>> 
> Yes I want it to be covered now otherwise any serial exception
> generates flood of "Exception happens serially" messages. This
> function does not handle IRQ so no problem there.

But we will soon let this function cover IRQs as well, per the SDM.
Why not do that a little bit later?

BTW, this issue exists in the original code as well.

Eddie

Dong, Eddie May 8, 2009, 10:46 a.m. UTC | #3

Dong, Eddie wrote:
> ction will be re-executed.
>>> 
>>> Do you want it to be covered for now? For exception, it is easy but
>>> for IRQ, it needs to be pushed back.
>>> 
>> Yes I want it to be covered now otherwise any serial exception
>> generates flood of "Exception happens serially" messages. This
>> function does not handle IRQ so no problem there.
> 
> But we soon will let this function cove IRQ as well per SDM.
> Why not do that a little bit later?
> 
> BTW, this issue exist in original code as well.
> 
> Eddie

Actually this is already addressed in the current patch too: just keep the former exception. If you mean the printk should be removed, I am fine with that. 
BTW, this case doesn't happen in reality.

Thx, eddie

Gleb Natapov May 8, 2009, 12:16 p.m. UTC | #4

On Fri, May 08, 2009 at 06:39:06PM +0800, Dong, Eddie wrote:
> ction will be re-executed.
> >> 
> >> Do you want it to be covered for now? For exception, it is easy but
> >> for IRQ, it needs to be pushed back. 
> >> 
> > Yes I want it to be covered now otherwise any serial exception
> > generates flood of "Exception happens serially" messages. This
> > function does not handle IRQ so no problem there.
> 
> But we soon will let this function cove IRQ as well per SDM.
> Why not do that a little bit later?
> 
> BTW, this issue exist in original code as well.
> 
Yes, but I've already sent a patch to fix this in the existing code. If
your patch will fix this too, I'll drop my version.

--
			Gleb.

Gleb Natapov May 8, 2009, 12:23 p.m. UTC | #5

On Fri, May 08, 2009 at 06:46:14PM +0800, Dong, Eddie wrote:
> Dong, Eddie wrote:
> > ction will be re-executed.
> >>> 
> >>> Do you want it to be covered for now? For exception, it is easy but
> >>> for IRQ, it needs to be pushed back.
> >>> 
> >> Yes I want it to be covered now otherwise any serial exception
> >> generates flood of "Exception happens serially" messages. This
> >> function does not handle IRQ so no problem there.
> > 
> > But we soon will let this function cove IRQ as well per SDM.
> > Why not do that a little bit later?
> > 
> > BTW, this issue exist in original code as well.
> > 
> > Eddie
> 
> Actually this is already addressed in current patch too: Just keep the former exception. If you mean the prink should be removed, I am fine. 
Keeping the former exception is not the right thing to do. It can't be
delivered, because delivering it causes another exception, and the handler that
may fix the situation is never called, since you drop the last exception and
keep re-injecting the one that can't be handled.

> BTW, this case doesn't happen in reality.
> 
Then why did you write all this code? :) I can easily write a test
case that does that (actually I did), and if it is not handled properly it
just loops taking 100% cpu, trying to reinject an exception that cannot be
handled.

--
			Gleb.

Dong, Eddie May 8, 2009, 3 p.m. UTC | #6

Gleb Natapov wrote:
> On Fri, May 08, 2009 at 06:46:14PM +0800, Dong, Eddie wrote:
>> Dong, Eddie wrote:
>>> ction will be re-executed.
>>>>> 
>>>>> Do you want it to be covered for now? For exception, it is easy
>>>>> but for IRQ, it needs to be pushed back.
>>>>> 
>>>> Yes I want it to be covered now otherwise any serial exception
>>>> generates flood of "Exception happens serially" messages. This
>>>> function does not handle IRQ so no problem there.
>>> 
>>> But we soon will let this function cove IRQ as well per SDM.
>>> Why not do that a little bit later?
>>> 
>>> BTW, this issue exist in original code as well.
>>> 
>>> Eddie
>> 
>> Actually this is already addressed in current patch too: Just keep
>> the former exception. If you mean the prink should be removed, I am
>> fine.  
> Keeping the former exception is not the right thing to do. It can't be
> delivered because delivering it cause another exception and handler
> that may fix the situation is not called since you drop last
> exception and keep re-injecting the one that can't be handled.
> 
>> BTW, this case doesn't happen in reality.
>> 
> Then why do you write all this code then? :) I can easily write test

I am fixing a potential #DF bug in the current code, which only handles
PF on PF.
For sequential exceptions, the current code just does a WARN_ON.

> case that will do that (actually I did) and if not handled properly it
> just loops taking 100% cpu trying to reinject exception that cannot be
> handled.

Are you sure the current code dead-loops at the WARN_ON with your test code? 
I am not saying it will never happen, which is why I printk it, but it shouldn't occur
in the guests that KVM currently supports.

See the original kvm_queue_exception in case you overlooked the code.

void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr)
{
        WARN_ON(vcpu->arch.exception.pending);
        vcpu->arch.exception.pending = true;
        vcpu->arch.exception.has_error_code = false;
        vcpu->arch.exception.nr = nr;
}

Any comments from Avi?

Thx, eddie



Gleb Natapov May 8, 2009, 6:44 p.m. UTC | #7

On Fri, May 08, 2009 at 11:00:51PM +0800, Dong, Eddie wrote:
> Gleb Natapov wrote:
> > On Fri, May 08, 2009 at 06:46:14PM +0800, Dong, Eddie wrote:
> >> Dong, Eddie wrote:
> >>> ction will be re-executed.
> >>>>> 
> >>>>> Do you want it to be covered for now? For exception, it is easy
> >>>>> but for IRQ, it needs to be pushed back.
> >>>>> 
> >>>> Yes I want it to be covered now otherwise any serial exception
> >>>> generates flood of "Exception happens serially" messages. This
> >>>> function does not handle IRQ so no problem there.
> >>> 
> >>> But we soon will let this function cove IRQ as well per SDM.
> >>> Why not do that a little bit later?
> >>> 
> >>> BTW, this issue exist in original code as well.
> >>> 
> >>> Eddie
> >> 
> >> Actually this is already addressed in current patch too: Just keep
> >> the former exception. If you mean the prink should be removed, I am
> >> fine.  
> > Keeping the former exception is not the right thing to do. It can't be
> > delivered because delivering it cause another exception and handler
> > that may fix the situation is not called since you drop last
> > exception and keep re-injecting the one that can't be handled.
> > 
> >> BTW, this case doesn't happen in reality.
> >> 
> > Then why do you write all this code then? :) I can easily write test
> 
> I am fixing the potential #DF bug existing in current code which only handle
> PF on PF.
> For those sequential exception, it is WARN_ON in current code.
> 
Can you describe a real-life scenario that needs this fix? I am all for
fixing the code to be as close as possible to the SDM, but if you do it, do it right.
 
> > case that will do that (actually I did) and if not handled properly it
> > just loops taking 100% cpu trying to reinject exception that cannot be
> > handled.
> 
> Are u sure current code is dead loop in WARN_ON with your test code? 
Yes.

> I don't see it will never happen and thus why printk it, but shouldn't exist
I have the code that triggers this path. Good enough for me.
 
> in current guest that KVM can support.
> 
> See original kvm_queue_exception in case you ignored the code.
> 
There is no point referring to the current code. The current code does not
handle serial exceptions properly. So fix it in your patch; otherwise I
propose to use my patch that fixes the current code
(http://patchwork.kernel.org/patch/21829/).

> void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr)
> {
>         WARN_ON(vcpu->arch.exception.pending);
>         vcpu->arch.exception.pending = true;
>         vcpu->arch.exception.has_error_code = false;
>         vcpu->arch.exception.nr = nr;
> }
> 
> Any comments from Avi?
> 
> Thx, eddie
> 

--
			Gleb.

Dong, Eddie May 11, 2009, 1:04 a.m. UTC | #8

> There is not point referring to current code. Current code does not
> handle serial exceptions properly. So fix it in your patch otherwise I
> propose to use my patch that fixes current code
> (http://patchwork.kernel.org/patch/21829/).
> 

I would like Avi to decide. On the difference between the 2 patches: my understanding is that I am addressing the problem based on SDM 5-4, with first in, first served as the answer to serial injection. Your patch doesn't solve the generic double fault case, for example exception 11 on 11, or GP on GP, which needs to be converted to #DF per the SDM; rather, you only handle the case where the second exception is PF, and service the PF.

I can check internally with architecture to see what "handle exceptions serially" really means. For me serial means first in, first out, and thus we should keep the 1st exception.


Eddie.

Gleb Natapov May 11, 2009, 6:02 a.m. UTC | #9

On Mon, May 11, 2009 at 09:04:52AM +0800, Dong, Eddie wrote:
> 
> > There is not point referring to current code. Current code does not
> > handle serial exceptions properly. So fix it in your patch otherwise I
> > propose to use my patch that fixes current code
> > (http://patchwork.kernel.org/patch/21829/).
> > 
> 
> I would like Avi to decide. As comments to the difference of 2 patches, my undrestanding is that I am addressing the problem base on SDM 5-4 with the answer to serial injection as first in first service. Your patch doesn;t solve generic double fault case for example exception 11 on 11, or GP on GP which needs to be converted to #DF per SDM, rather you only handle the case the secondary exception is PF,  and servicing PF.
> 
There is nothing to decide really. I prefer your patch with serial
exception handling fixed. If you won't do it, I will.

> I can check with internal architecture to see what does "handle exceptions serially" mean in really. For me serial means first in first out, and thus we should remain 1st exception.
> 
There is a table 5.2 that defines an order between some events.  The table
is not complete; I don't see #DE there, for instance.  But consider
this case: #DE (or #NP) happens while the exception stack is paged out, so
#PF happens next. #PF is handled by a TSS gate, so it uses its own stack
and fixes the exception stack in its handler. If we drop #PF because #DE
is already waiting, we will keep trying to inject #DE indefinitely. The
result is a hanging QEMU process eating 100% cpu time. If we replace #DE
with #PF, on the other hand, then the #PF handler will fix the exception
stack, the instruction that caused #DE will be re-executed, and #DE will be
regenerated and handled properly. So which scenario do you prefer?

FWIW bochs/qemu replace the old exception with the new one.
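
The two outcomes in the scenario above can be reproduced with a toy model. This is a rough sketch only: `toy_guest`, `try_inject` and `attempts_until_de_handled` are invented for illustration and do not model real KVM state; the only things taken from the description are that #DE cannot be delivered while the exception stack is unmapped, and that the #PF handler runs on its own stack and maps it.

```c
#include <stdbool.h>

#define DE_VECTOR 0
#define PF_VECTOR 14
#define NO_VECTOR (-1)

/* What to do with a second exception raised while delivering the first. */
enum serial_policy { KEEP_FIRST, REPLACE_WITH_NEW };

struct toy_guest {
	bool stack_mapped;	/* is the exception stack paged in? */
	int pending;		/* vector waiting for injection, or NO_VECTOR */
};

/* One injection attempt. Returns true if the pending vector was delivered. */
static bool try_inject(struct toy_guest *g, enum serial_policy policy)
{
	if (g->pending == PF_VECTOR) {
		/* #PF uses its own stack; its handler maps the exception stack. */
		g->stack_mapped = true;
		g->pending = NO_VECTOR;
		return true;
	}
	if (!g->stack_mapped) {
		/* Delivery itself faults with #PF. */
		if (policy == REPLACE_WITH_NEW)
			g->pending = PF_VECTOR;
		/* KEEP_FIRST: the #PF is dropped and #DE stays pending. */
		return false;
	}
	g->pending = NO_VECTOR;
	return true;
}

/* Number of attempts until #DE reaches its handler, or -1 if the limit is hit. */
static int attempts_until_de_handled(enum serial_policy policy, int limit)
{
	struct toy_guest g = { .stack_mapped = false, .pending = DE_VECTOR };
	int attempt;

	for (attempt = 1; attempt <= limit; attempt++) {
		int vector;

		/* Nothing pending: the instruction re-executes and raises #DE again. */
		if (g.pending == NO_VECTOR)
			g.pending = DE_VECTOR;
		vector = g.pending;
		if (try_inject(&g, policy) && vector == DE_VECTOR)
			return attempt;
	}
	return -1;
}
```

In this model the replace policy converges after three attempts (#DE fails, #PF is delivered, the regenerated #DE is delivered), while the keep-first policy never makes progress no matter how large the limit is.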

--
			Gleb.

Avi Kivity May 11, 2009, 6:17 a.m. UTC | #10

Dong, Eddie wrote:
>> There is not point referring to current code. Current code does not
>> handle serial exceptions properly. So fix it in your patch otherwise I
>> propose to use my patch that fixes current code
>> (http://patchwork.kernel.org/patch/21829/).
>>
>>     
>
> I would like Avi to decide. 

I would prefer you two to reach agreement.  Less work for me.

> I can check with internal architecture to see what does "handle exceptions serially" mean in really. For me serial means first in first out, and thus we should remain 1st exception.
>   

The second exception was encountered while injecting the first 
exception, so how can you continue with the first without servicing the 
second?

Dong, Eddie May 12, 2009, 5:35 a.m. UTC | #11

Gleb Natapov wrote:
> On Mon, May 11, 2009 at 09:04:52AM +0800, Dong, Eddie wrote:
>> 
>>> There is not point referring to current code. Current code does not
>>> handle serial exceptions properly. So fix it in your patch
>>> otherwise I propose to use my patch that fixes current code
>>> (http://patchwork.kernel.org/patch/21829/).
>>> 
>> 
>> I would like Avi to decide. As comments to the difference of 2
>> patches, my undrestanding is that I am addressing the problem base
>> on SDM 5-4 with the answer to serial injection as first in first
>> service. Your patch doesn;t solve generic double fault case for
>> example exception 11 on 11, or GP on GP which needs to be converted
>> to #DF per SDM, rather you only handle the case the secondary
>> exception is PF,  and servicing PF.      
>> 
> There is nothing to decide really. I prefer your patch with serial
> exception handling fixed. If you'll not do it I'll do it.

OK, an additional patch will be constructive, but my position is neutral. The reasons (as mentioned) are:

1: Current KVM just does a WARN_ON for those cases (which are never hit), so this patch won't introduce 
additional issues. Either a printk or a WARN_ON to notify us in case we meet the problem in future is the safer way for me.

2: In case a real "serial exception" happens, from an architectural point of view I think we had better consult Table 5-2 to prioritize them, which means neither preserving the former exception nor overwriting it. But as you mentioned, the list is not complete. My point is that this is another complicated scenario that we should spend time on in future, but it is not related to the current patch.

3: This function will soon need to be extended to cover the IRQ case too, which needs to push back the overwritten IRQ. We need a complete solution for this, so I prefer to do that some time later.

4: I prefer to split issues if possible. 

> 
>> I can check with internal architecture to see what does "handle
>> exceptions serially" mean in really. For me serial means first in
>> first out, and thus we should remain 1st exception.  
>> 
> There is a table 5.2 that defines an order between some events.  The
> table is not complete, I don't see #DE there for instance.  But
> consider 
> this case: #DE (or #NP) happens while exception stack is paged out so
> #PF happens next. #PF is handled by TSS gate so it uses its own stack
> and it fixes exception stack in its handler. If we drop #PF because
> #DE is already waiting we will keep trying to inject #DE
> indefinitely. The result is hanging QEMU process eating 100% cpu
> time. If we replace #DE with #PF on the other hand then #PF handler
> will fix exception stack instruction that caused #DE will be
> re-executed, #DE regenerated and handled properly. So which scenario
> do you prefer? 

See above.

Thx, eddie

Gleb Natapov May 12, 2009, 7:01 a.m. UTC | #12

On Tue, May 12, 2009 at 01:35:31PM +0800, Dong, Eddie wrote:
> Gleb Natapov wrote:
> > On Mon, May 11, 2009 at 09:04:52AM +0800, Dong, Eddie wrote:
> >> 
> >>> There is not point referring to current code. Current code does not
> >>> handle serial exceptions properly. So fix it in your patch
> >>> otherwise I propose to use my patch that fixes current code
> >>> (http://patchwork.kernel.org/patch/21829/).
> >>> 
> >> 
> >> I would like Avi to decide. As comments to the difference of 2
> >> patches, my undrestanding is that I am addressing the problem base
> >> on SDM 5-4 with the answer to serial injection as first in first
> >> service. Your patch doesn;t solve generic double fault case for
> >> example exception 11 on 11, or GP on GP which needs to be converted
> >> to #DF per SDM, rather you only handle the case the secondary
> >> exception is PF,  and servicing PF.      
> >> 
> > There is nothing to decide really. I prefer your patch with serial
> > exception handling fixed. If you'll not do it I'll do it.
> 
> OK, an additional patch will be constructive but my position is neutral. The reason (mentioned) is:
> 
> 1: Current KVM just WARN_ON for those case (and never be hit), so the this patch won't introduce 
> additional issues. Either printk or WARN_ON to notify us in case we met the problem in future is safer way for me.
> 
But current KVM also replaces the pending exception with the newer one after
WARN_ON. I agree that real OSes (at least common ones) never hit this
case. But it is possible to hit it from a guest, and I have a test case.
 
> 2: In case of real "serial ecception" happens, from architectural point of view, I think we'd better consult Table 5-2 to prioritize them, which is neither reserving former exception nor overwritting. But as you mentioned, the list is not completed. My point is that this is another complicated scenario that we should spend time in future, but not related to current patch.
> 
If you can get more complete info about what real CPU does in case of
simultaneous exceptions it would be nice. I think CPU is smart enough
to understand when second exception happened while trying to handle the
first one and handle the second one first in this case. Otherwise I
don't see how it could work.

> 3: This function will soon needs to be extended to cover IRQ case too, which needs to push back the overwritten IRQ. We need a total solution for this, so I prefer to do that some time later.
> 
I don't think that IRQs should be handled by this function. At least they
should still be stored in their own queue.

> 4: I prefer to split issue if possible. 
> 
> 
That is OK. You can send two patches. The first one will WARN_ON and
overwrite the exception like the current code does. And the second one will
remove the WARN_ON, explaining that this case is actually possible to trigger
from a guest.

> > 
> >> I can check with internal architecture to see what does "handle
> >> exceptions serially" mean in really. For me serial means first in
> >> first out, and thus we should remain 1st exception.  
> >> 
> > There is a table 5.2 that defines an order between some events.  The
> > table is not complete, I don't see #DE there for instance.  But
> > consider 
> > this case: #DE (or #NP) happens while exception stack is paged out so
> > #PF happens next. #PF is handled by TSS gate so it uses its own stack
> > and it fixes exception stack in its handler. If we drop #PF because
> > #DE is already waiting we will keep trying to inject #DE
> > indefinitely. The result is hanging QEMU process eating 100% cpu
> > time. If we replace #DE with #PF on the other hand then #PF handler
> > will fix exception stack instruction that caused #DE will be
> > re-executed, #DE regenerated and handled properly. So which scenario
> > do you prefer? 
> 
> See above.
> 
> Thx, eddie
--
			Gleb.

Dong, Eddie May 12, 2009, 7:38 a.m. UTC | #13

I noticed the macros for SVM vmcb->control.event_inj and VMX VM_EXIT_INTR_INFO are almost the same. I need to query the event injection state in common code, so I plan to expose this register read/write to x86.c.  Should we define a new format for evtinj/VM_EXIT_INTR_INFO as a common KVM format, or just move the original macros to kvm_host.h?

Thx, eddie

Gleb Natapov May 12, 2009, 8:49 a.m. UTC | #14

On Tue, May 12, 2009 at 03:38:59PM +0800, Dong, Eddie wrote:
> I noticed the MACRO for SVM vmcb->control.event_inj and VMX VM_EXIT_INTR_INFO are almost same, I have a need to query the event injection situation in common code so plan to expose this register read/write to x86.c.  Should we define a new format for evtinj/VM_EXIT_INTR_INFO as common KVM format, or just move those original MACRO to kvm_host.h?
> 
I haven't seen your code, so I don't know what you are trying to do, but why
is querying the interrupt/nmi/exception injection queues not enough? What
info is missing there?

--
			Gleb.

Avi Kivity May 13, 2009, 9:49 a.m. UTC | #15

Dong, Eddie wrote:
> I noticed the MACRO for SVM vmcb->control.event_inj and VMX VM_EXIT_INTR_INFO are almost same, I have a need to query the event injection situation in common code so plan to expose this register read/write to x86.c.  Should we define a new format for evtinj/VM_EXIT_INTR_INFO as common KVM format, or just move those original MACRO to kvm_host.h?
>
>   

This is dangerous if additional bits or field values are defined by 
either architecture.  Better to use accessors.
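
As a rough illustration of the accessor approach, common code could go through a per-backend ops table and never touch the raw bits. Everything here is hypothetical: `event_info`, `event_ops` and both bit layouts are invented for the sketch and are deliberately not the real VMX or SVM encodings.

```c
#include <stdbool.h>
#include <stdint.h>

/* Vendor-neutral view of an injected event (hypothetical). */
struct event_info {
	bool valid;
	bool has_error_code;
	uint8_t vector;
};

/* Each backend supplies its own accessors; common code never sees raw bits. */
struct event_ops {
	struct event_info (*decode)(uint32_t raw);
	uint32_t (*encode)(struct event_info ev);
};

/* Toy layout A: valid = bit 31, error code = bit 11, vector = bits 7:0. */
static struct event_info a_decode(uint32_t raw)
{
	struct event_info ev = {
		.valid = (raw >> 31) & 1,
		.has_error_code = (raw >> 11) & 1,
		.vector = raw & 0xff,
	};
	return ev;
}

static uint32_t a_encode(struct event_info ev)
{
	return ((uint32_t)ev.valid << 31) |
	       ((uint32_t)ev.has_error_code << 11) | ev.vector;
}

/* Toy layout B: valid = bit 0, error code = bit 1, vector = bits 15:8. */
static struct event_info b_decode(uint32_t raw)
{
	struct event_info ev = {
		.valid = raw & 1,
		.has_error_code = (raw >> 1) & 1,
		.vector = (raw >> 8) & 0xff,
	};
	return ev;
}

static uint32_t b_encode(struct event_info ev)
{
	return (uint32_t)ev.valid | ((uint32_t)ev.has_error_code << 1) |
	       ((uint32_t)ev.vector << 8);
}

static const struct event_ops backend_a = { a_decode, a_encode };
static const struct event_ops backend_b = { b_decode, b_encode };

/* Common code: the same query works for either backend. */
static bool event_is_page_fault(const struct event_ops *ops, uint32_t raw)
{
	struct event_info ev = ops->decode(raw);

	return ev.valid && ev.vector == 14;
}
```

If either layout later gains a bit or a new field value, only that backend's accessors change and the common query stays as it is.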

Dong, Eddie May 13, 2009, 2:20 p.m. UTC | #16

Avi Kivity wrote:
> Dong, Eddie wrote:
>> I noticed the MACRO for SVM vmcb->control.event_inj and VMX
>> VM_EXIT_INTR_INFO are almost same, I have a need to query the event
>> injection situation in common code so plan to expose this register
>> read/write to x86.c.  Should we define a new format for
>> evtinj/VM_EXIT_INTR_INFO as common KVM format, or just move those
>> original MACRO to kvm_host.h?     
>> 
>> 
> 
> This is dangerous if additional bits or field values are defined by
> either architecture.  Better to use accessors.

OK.
Also, back to Gleb's question: the reason I want to do that is to simplify the event
generation mechanism in current KVM.

Today KVM uses an additional layer for exception/nmi/interrupt, such as
vcpu.arch.exception.pending, vcpu->arch.interrupt.pending & vcpu->arch.nmi_injected.
All those additional layers exist because the events compete for the VM_ENTRY_INTR_INFO_FIELD
write that injects the event. Both SVM & VMX have only one resource to inject a virtual event,
but KVM generates 3 categories of events in parallel, which requires additional
logic to arbitrate among them. One example is that exceptions have higher priority
than NMI/IRQ injection in the current code, which is not true in reality. 

Another issue is that a failed event from a previous injection, say an IRQ or NMI, may be 
discarded if a virtual exception happens during EXIT handling. With the generic
double fault handling patch, this case should be handled normally.

Will post an RFC soon.

Thx, eddie

Avi Kivity May 14, 2009, 9:27 a.m. UTC | #17

Dong, Eddie wrote:
> OK.
> Also back to Gleb's question, the reason I want to do that is to simplify event
> generation mechanism in current KVM.
>
> Today KVM use additional layer of exception/nmi/interrupt such as
> vcpu.arch.exception.pending, vcpu->arch.interrupt.pending & vcpu->arch.nmi_injected.
> All those additional layer is due to compete of VM_ENTRY_INTR_INFO_FIELD
> write to inject the event. Both SVM & VMX has only one resource to inject the virtual event
> but KVM generates 3 catagory of events in parallel which further requires additional
> logic to dictate among them. 

I thought of using a queue to hold all pending events (in a common 
format), sort it by priority, and inject the head.

> One example is that exception has higher priority
> than NMI/IRQ injection in current code which is not true in reality. 
>   

I don't think it matters in practice, since the guest will see it as a 
timing issue.  NMIs and IRQs are asynchronous (even those generated by 
the guest through the local APIC).

> Another issue is that an failed event from previous injection say IRQ or NMI may be 
> discarded if an virtual exception happens in the EXIT handling now. With the patch of 
> generic double fault handling, this case should be handled as normally.
>   

Discarding an exception is usually okay as it will be regenerated.  I 
don't think we discard interrupts or NMIs.
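
The queue idea mentioned above (hold all pending events in a common format, keep them sorted by priority, inject the head) might look roughly like this. It is a sketch under stated assumptions: the `pending_event` format, the fixed capacity and the numeric priorities are all made up for illustration, and nothing here says which event class should actually rank first.

```c
#include <stdbool.h>
#include <stddef.h>

enum event_kind { EV_EXCEPTION, EV_NMI, EV_IRQ };

/* Hypothetical common format for a pending event. */
struct pending_event {
	enum event_kind kind;
	unsigned vector;
	int priority;	/* lower value is injected first (assumed ordering) */
};

#define MAX_PENDING 8

struct event_queue {
	struct pending_event ev[MAX_PENDING];
	size_t count;
};

/* Insert while keeping the array sorted; FIFO among equal priorities. */
static bool queue_event(struct event_queue *q, struct pending_event e)
{
	size_t i;

	if (q->count == MAX_PENDING)
		return false;
	i = q->count;
	while (i > 0 && q->ev[i - 1].priority > e.priority) {
		q->ev[i] = q->ev[i - 1];
		i--;
	}
	q->ev[i] = e;
	q->count++;
	return true;
}

/* Remove the head (highest-priority event) so it can be injected. */
static bool pop_event(struct event_queue *q, struct pending_event *out)
{
	size_t i;

	if (q->count == 0)
		return false;
	*out = q->ev[0];
	for (i = 1; i < q->count; i++)
		q->ev[i - 1] = q->ev[i];
	q->count--;
	return true;
}
```

An event that loses the race simply stays in the queue for the next entry, so nothing has to be discarded or pushed back by hand.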
