diff mbox

[05/10] KVM: arm/arm64: vgic: Relax vgic_can_sample_irq for edge IRQs

Message ID 1433783045-8002-6-git-send-email-marc.zyngier@arm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Marc Zyngier June 8, 2015, 5:04 p.m. UTC
We only set the irq_queued flag for level interrupts, meaning
that "!vgic_irq_is_queued(vcpu, irq)" is a good enough predicate
for all interrupts.

This will allow us to inject edge HW interrupts, for which the
state ACTIVE+PENDING is not allowed.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 virt/kvm/arm/vgic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Christoffer Dall June 30, 2015, 8:19 p.m. UTC | #1
On Mon, Jun 08, 2015 at 06:04:00PM +0100, Marc Zyngier wrote:
> We only set the irq_queued flag for level interrupts, meaning
> that "!vgic_irq_is_queued(vcpu, irq)" is a good enough predicate
> for all interrupts.
> 
> This will allow us to inject edge HW interrupts, for which the
> state ACTIVE+PENDING is not allowed.

I don't understand this; ACTIVE+PENDING is allowed for edge interrupts.
Do you mean that if we set the HW bit in the LR, then we are linking to
an HW interrupt where we don't allow that to be ACTIVE+PENDING on the HW
GIC side?

Why is this relevant here?  I feel like I'm missing context.

Thanks,
-Christoffer

> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  virt/kvm/arm/vgic.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 78fb820..59ed7a3 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -377,7 +377,7 @@ void vgic_cpu_irq_clear(struct kvm_vcpu *vcpu, int irq)
>  
>  static bool vgic_can_sample_irq(struct kvm_vcpu *vcpu, int irq)
>  {
> -	return vgic_irq_is_edge(vcpu, irq) || !vgic_irq_is_queued(vcpu, irq);
> +	return !vgic_irq_is_queued(vcpu, irq);
>  }
>  
>  /**
> -- 
> 2.1.4
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marc Zyngier July 1, 2015, 9:17 a.m. UTC | #2
On 30/06/15 21:19, Christoffer Dall wrote:
> On Mon, Jun 08, 2015 at 06:04:00PM +0100, Marc Zyngier wrote:
>> We only set the irq_queued flag for level interrupts, meaning
>> that "!vgic_irq_is_queued(vcpu, irq)" is a good enough predicate
>> for all interrupts.
>>
>> This will allow us to inject edge HW interrupts, for which the
>> state ACTIVE+PENDING is not allowed.
> 
> I don't understand this; ACTIVE+PENDING is allowed for edge interrupts.
> Do you mean that if we set the HW bit in the LR, then we are linking to
> an HW interrupt where we don't allow that to be ACTIVE+PENDING on the HW
> GIC side?
> 
> Why is this relevant here?  I feel like I'm missing context.

I've probably taken a shortcut here - bear with me while I'm trying to
explain the issue.

For HW interrupts, we shouldn't even try to use the state bits in the
LR, because that state is contained in the physical distributor. Setting
the HW bit really means "there is something going on at the distributor
level, just go there".

If we were to inject a ACTIVE+PENDING interrupt at the LR level, we'd
basically loose the second interrupt because that state is simply not
considered.

So the trick we're using is to only inject the active interrupt, and
prevent anything else from being injected until we can confirm that the
active state has been cleared at the physical level.

Does it make any sense?

	M.
Christoffer Dall July 1, 2015, 11:58 a.m. UTC | #3
On Wed, Jul 01, 2015 at 10:17:52AM +0100, Marc Zyngier wrote:
> On 30/06/15 21:19, Christoffer Dall wrote:
> > On Mon, Jun 08, 2015 at 06:04:00PM +0100, Marc Zyngier wrote:
> >> We only set the irq_queued flag for level interrupts, meaning
> >> that "!vgic_irq_is_queued(vcpu, irq)" is a good enough predicate
> >> for all interrupts.
> >>
> >> This will allow us to inject edge HW interrupts, for which the
> >> state ACTIVE+PENDING is not allowed.
> > 
> > I don't understand this; ACTIVE+PENDING is allowed for edge interrupts.
> > Do you mean that if we set the HW bit in the LR, then we are linking to
> > an HW interrupt where we don't allow that to be ACTIVE+PENDING on the HW
> > GIC side?
> > 
> > Why is this relevant here?  I feel like I'm missing context.
> 
> I've probably taken a shortcut here - bear with me while I'm trying to
> explain the issue.
> 
> For HW interrupts, we shouldn't even try to use the state bits in the
> LR, because that state is contained in the physical distributor. Setting
> the HW bit really means "there is something going on at the distributor
> level, just go there".

ok, so by "HW interrupts" you mean virtual interrupts with the HW bit in
the LR set, correct?

> 
> If we were to inject a ACTIVE+PENDING interrupt at the LR level, we'd
> basically loose the second interrupt because that state is simply not
> considered.

Huh?  Which second interrupt.  I looked at the spec and it says don't
use the state bits for HW interrupts, so isn't it simply not supported
to set these bits at all and that's it?

> 
> So the trick we're using is to only inject the active interrupt, and
> prevent anything else from being injected until we can confirm that the
> active state has been cleared at the physical level.
> 
> Does it make any sense?
> 
Sort of, but what I don't understand now is how the guest ever sees the
interrupt then.  If we always inject the virtual interrupt by setting
the active state on the physical distributor, and we can't inject this
as active+pending, and the guest doesn't see the state in the LR, then
how does this ever raise a virtual interrupt and how does the guest see
an interrupt which is only PENDING so that it can ack it etc. etc.?

Maybe I don't fully understand how the HW bit works after all...

Thanks,
-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marc Zyngier July 1, 2015, 6:18 p.m. UTC | #4
On 01/07/15 12:58, Christoffer Dall wrote:
> On Wed, Jul 01, 2015 at 10:17:52AM +0100, Marc Zyngier wrote:
>> On 30/06/15 21:19, Christoffer Dall wrote:
>>> On Mon, Jun 08, 2015 at 06:04:00PM +0100, Marc Zyngier wrote:
>>>> We only set the irq_queued flag for level interrupts, meaning
>>>> that "!vgic_irq_is_queued(vcpu, irq)" is a good enough predicate
>>>> for all interrupts.
>>>>
>>>> This will allow us to inject edge HW interrupts, for which the
>>>> state ACTIVE+PENDING is not allowed.
>>>
>>> I don't understand this; ACTIVE+PENDING is allowed for edge interrupts.
>>> Do you mean that if we set the HW bit in the LR, then we are linking to
>>> an HW interrupt where we don't allow that to be ACTIVE+PENDING on the HW
>>> GIC side?
>>>
>>> Why is this relevant here?  I feel like I'm missing context.
>>
>> I've probably taken a shortcut here - bear with me while I'm trying to
>> explain the issue.
>>
>> For HW interrupts, we shouldn't even try to use the state bits in the
>> LR, because that state is contained in the physical distributor. Setting
>> the HW bit really means "there is something going on at the distributor
>> level, just go there".
> 
> ok, so by "HW interrupts" you mean virtual interrupts with the HW bit in
> the LR set, correct?

Yes, sorry.

>>
>> If we were to inject a ACTIVE+PENDING interrupt at the LR level, we'd
>> basically loose the second interrupt because that state is simply not
>> considered.
> 
> Huh?  Which second interrupt.  I looked at the spec and it says don't
> use the state bits for HW interrupts, so isn't it simply not supported
> to set these bits at all and that's it?

I managed to confuse myself reading the same bit. It says (GICv3 spec):

"A hypervisor must only use the pending and active state for software
originated interrupts, which are typically associated with virtual
devices, or SGIs."

That's the PENDING+ACTIVE state, and not the pending and active bits
like I read it initially.

Now consider the following scenario:

- We inject a virtual edge interrupt
- We mark the corresponding physical interrupt as active.
- Queue interrupt in an LR
- Resume vcpu

Now, we inject another edge interrupt, the vcpu exits for whatever
reason, and the previously injected interrupt is still active.

The normal vGIC flow would be to mark the interrupt as ACTIVE+PENDING in
the LR, and resume the vcpu. But the above states that this is invalid
for HW generated interrupts.

>>
>> So the trick we're using is to only inject the active interrupt, and
>> prevent anything else from being injected until we can confirm that the
>> active state has been cleared at the physical level.
>>
>> Does it make any sense?
>>
> Sort of, but what I don't understand now is how the guest ever sees the
> interrupt then.  If we always inject the virtual interrupt by setting
> the active state on the physical distributor, and we can't inject this
> as active+pending, and the guest doesn't see the state in the LR, then
> how does this ever raise a virtual interrupt and how does the guest see
> an interrupt which is only PENDING so that it can ack it etc. etc.?
> 
> Maybe I don't fully understand how the HW bit works after all...

The way the spec is written is slightly misleading. But the gist of it
is that we still signal the guest using the PENDING bit in the LR, and
switch the LR as usual. it is just that we can't use the PENDING+ACTIVE
state (apparently, this can lead to a double deactivation).

Not sure the above makes sense. Beer time, I suppose.

	M.
Christoffer Dall July 2, 2015, 4:23 p.m. UTC | #5
On Wed, Jul 01, 2015 at 07:18:40PM +0100, Marc Zyngier wrote:
> On 01/07/15 12:58, Christoffer Dall wrote:
> > On Wed, Jul 01, 2015 at 10:17:52AM +0100, Marc Zyngier wrote:
> >> On 30/06/15 21:19, Christoffer Dall wrote:
> >>> On Mon, Jun 08, 2015 at 06:04:00PM +0100, Marc Zyngier wrote:
> >>>> We only set the irq_queued flag for level interrupts, meaning
> >>>> that "!vgic_irq_is_queued(vcpu, irq)" is a good enough predicate
> >>>> for all interrupts.
> >>>>
> >>>> This will allow us to inject edge HW interrupts, for which the
> >>>> state ACTIVE+PENDING is not allowed.
> >>>
> >>> I don't understand this; ACTIVE+PENDING is allowed for edge interrupts.
> >>> Do you mean that if we set the HW bit in the LR, then we are linking to
> >>> an HW interrupt where we don't allow that to be ACTIVE+PENDING on the HW
> >>> GIC side?
> >>>
> >>> Why is this relevant here?  I feel like I'm missing context.
> >>
> >> I've probably taken a shortcut here - bear with me while I'm trying to
> >> explain the issue.
> >>
> >> For HW interrupts, we shouldn't even try to use the state bits in the
> >> LR, because that state is contained in the physical distributor. Setting
> >> the HW bit really means "there is something going on at the distributor
> >> level, just go there".
> > 
> > ok, so by "HW interrupts" you mean virtual interrupts with the HW bit in
> > the LR set, correct?
> 
> Yes, sorry.
> 
> >>
> >> If we were to inject a ACTIVE+PENDING interrupt at the LR level, we'd
> >> basically loose the second interrupt because that state is simply not
> >> considered.
> > 
> > Huh?  Which second interrupt.  I looked at the spec and it says don't
> > use the state bits for HW interrupts, so isn't it simply not supported
> > to set these bits at all and that's it?
> 
> I managed to confuse myself reading the same bit. It says (GICv3 spec):
> 
> "A hypervisor must only use the pending and active state for software
> originated interrupts, which are typically associated with virtual
> devices, or SGIs."
> 
> That's the PENDING+ACTIVE state, and not the pending and active bits
> like I read it initially.
> 
> Now consider the following scenario:
> 
> - We inject a virtual edge interrupt
> - We mark the corresponding physical interrupt as active.
> - Queue interrupt in an LR
> - Resume vcpu
> 
> Now, we inject another edge interrupt, the vcpu exits for whatever
> reason, and the previously injected interrupt is still active.
> 
> The normal vGIC flow would be to mark the interrupt as ACTIVE+PENDING in
> the LR, and resume the vcpu. But the above states that this is invalid
> for HW generated interrupts.

Right, ok, so we must resample the pending state even for an
edge-triggered interrupt once it's EOIed, because we cannot put it in
the LR despite it being pending on the physical distributor?

Incidentally, we do not need to set the EOI_INT bit, becuase when the
guest EOIs the interrupt, it will also deactivate it on the physical
distributor and the hardware will then take the pending physical
interrupt, we will handle it in the host, etc. etc.

If we had a different *shared* device than the timer which is
edge-triggered, don't we then also need to capture the physical
distributor's pending state along with the state of the device unless we
assume that upon restoring the state for the device count on the device
to have another rising/falling edge to trigger the interrupt again? (I
assume the line would always go high for a level-triggered interrupt in
this case).

> 
> >>
> >> So the trick we're using is to only inject the active interrupt, and
> >> prevent anything else from being injected until we can confirm that the
> >> active state has been cleared at the physical level.
> >>
> >> Does it make any sense?
> >>
> > Sort of, but what I don't understand now is how the guest ever sees the
> > interrupt then.  If we always inject the virtual interrupt by setting
> > the active state on the physical distributor, and we can't inject this
> > as active+pending, and the guest doesn't see the state in the LR, then
> > how does this ever raise a virtual interrupt and how does the guest see
> > an interrupt which is only PENDING so that it can ack it etc. etc.?
> > 
> > Maybe I don't fully understand how the HW bit works after all...
> 
> The way the spec is written is slightly misleading. But the gist of it
> is that we still signal the guest using the PENDING bit in the LR, and
> switch the LR as usual. it is just that we can't use the PENDING+ACTIVE
> state (apparently, this can lead to a double deactivation).
> 
> Not sure the above makes sense. Beer time, I suppose.
> 
It does make sense, I just had to sleep on it and see the code as a
whole instead of trying to understand it by just looking at this patch
individually.

Thanks,
-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marc Zyngier July 3, 2015, 9:50 a.m. UTC | #6
On 02/07/15 17:23, Christoffer Dall wrote:
> On Wed, Jul 01, 2015 at 07:18:40PM +0100, Marc Zyngier wrote:
>> On 01/07/15 12:58, Christoffer Dall wrote:
>>> On Wed, Jul 01, 2015 at 10:17:52AM +0100, Marc Zyngier wrote:
>>>> On 30/06/15 21:19, Christoffer Dall wrote:
>>>>> On Mon, Jun 08, 2015 at 06:04:00PM +0100, Marc Zyngier wrote:
>>>>>> We only set the irq_queued flag for level interrupts, meaning
>>>>>> that "!vgic_irq_is_queued(vcpu, irq)" is a good enough predicate
>>>>>> for all interrupts.
>>>>>>
>>>>>> This will allow us to inject edge HW interrupts, for which the
>>>>>> state ACTIVE+PENDING is not allowed.
>>>>>
>>>>> I don't understand this; ACTIVE+PENDING is allowed for edge interrupts.
>>>>> Do you mean that if we set the HW bit in the LR, then we are linking to
>>>>> an HW interrupt where we don't allow that to be ACTIVE+PENDING on the HW
>>>>> GIC side?
>>>>>
>>>>> Why is this relevant here?  I feel like I'm missing context.
>>>>
>>>> I've probably taken a shortcut here - bear with me while I'm trying to
>>>> explain the issue.
>>>>
>>>> For HW interrupts, we shouldn't even try to use the state bits in the
>>>> LR, because that state is contained in the physical distributor. Setting
>>>> the HW bit really means "there is something going on at the distributor
>>>> level, just go there".
>>>
>>> ok, so by "HW interrupts" you mean virtual interrupts with the HW bit in
>>> the LR set, correct?
>>
>> Yes, sorry.
>>
>>>>
>>>> If we were to inject a ACTIVE+PENDING interrupt at the LR level, we'd
>>>> basically loose the second interrupt because that state is simply not
>>>> considered.
>>>
>>> Huh?  Which second interrupt.  I looked at the spec and it says don't
>>> use the state bits for HW interrupts, so isn't it simply not supported
>>> to set these bits at all and that's it?
>>
>> I managed to confuse myself reading the same bit. It says (GICv3 spec):
>>
>> "A hypervisor must only use the pending and active state for software
>> originated interrupts, which are typically associated with virtual
>> devices, or SGIs."
>>
>> That's the PENDING+ACTIVE state, and not the pending and active bits
>> like I read it initially.
>>
>> Now consider the following scenario:
>>
>> - We inject a virtual edge interrupt
>> - We mark the corresponding physical interrupt as active.
>> - Queue interrupt in an LR
>> - Resume vcpu
>>
>> Now, we inject another edge interrupt, the vcpu exits for whatever
>> reason, and the previously injected interrupt is still active.
>>
>> The normal vGIC flow would be to mark the interrupt as ACTIVE+PENDING in
>> the LR, and resume the vcpu. But the above states that this is invalid
>> for HW generated interrupts.
> 
> Right, ok, so we must resample the pending state even for an
> edge-triggered interrupt once it's EOIed, because we cannot put it in
> the LR despite it being pending on the physical distributor?
> 
> Incidentally, we do not need to set the EOI_INT bit, becuase when the
> guest EOIs the interrupt, it will also deactivate it on the physical
> distributor and the hardware will then take the pending physical
> interrupt, we will handle it in the host, etc. etc.
> 
> If we had a different *shared* device than the timer which is
> edge-triggered, don't we then also need to capture the physical
> distributor's pending state along with the state of the device unless we
> assume that upon restoring the state for the device count on the device
> to have another rising/falling edge to trigger the interrupt again? (I
> assume the line would always go high for a level-triggered interrupt in
> this case).

I'd definitely assume that restoring the state of the device would make
it generate an interrupt. This has to be a property of the device,
otherwise it is not really shareable between vcpus.

Time will tell - we still have to see one of these.

>>
>>>>
>>>> So the trick we're using is to only inject the active interrupt, and
>>>> prevent anything else from being injected until we can confirm that the
>>>> active state has been cleared at the physical level.
>>>>
>>>> Does it make any sense?
>>>>
>>> Sort of, but what I don't understand now is how the guest ever sees the
>>> interrupt then.  If we always inject the virtual interrupt by setting
>>> the active state on the physical distributor, and we can't inject this
>>> as active+pending, and the guest doesn't see the state in the LR, then
>>> how does this ever raise a virtual interrupt and how does the guest see
>>> an interrupt which is only PENDING so that it can ack it etc. etc.?
>>>
>>> Maybe I don't fully understand how the HW bit works after all...
>>
>> The way the spec is written is slightly misleading. But the gist of it
>> is that we still signal the guest using the PENDING bit in the LR, and
>> switch the LR as usual. it is just that we can't use the PENDING+ACTIVE
>> state (apparently, this can lead to a double deactivation).
>>
>> Not sure the above makes sense. Beer time, I suppose.
>>
> It does make sense, I just had to sleep on it and see the code as a
> whole instead of trying to understand it by just looking at this patch
> individually.

Thanks,

	M.
Peter Maydell July 3, 2015, 9:57 a.m. UTC | #7
On 3 July 2015 at 10:50, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 02/07/15 17:23, Christoffer Dall wrote:
>> If we had a different *shared* device than the timer which is
>> edge-triggered, don't we then also need to capture the physical
>> distributor's pending state along with the state of the device unless we
>> assume that upon restoring the state for the device count on the device
>> to have another rising/falling edge to trigger the interrupt again? (I
>> assume the line would always go high for a level-triggered interrupt in
>> this case).
>
> I'd definitely assume that restoring the state of the device would make
> it generate an interrupt. This has to be a property of the device,
> otherwise it is not really shareable between vcpus.

FWIW, QEMU's modelling approach to this is to say that devices
do *not* generate interrupts on restore. If the device had
previously generated an interrupt then this should be captured
by the state of the interrupt controller (or whatever else
it is connected to) and dealt with when the GIC state is restored.

If you say that restoring the device state is supposed to
generate an interrupt, you introduce an ordering requirement
that the state of the interrupt controller is restored
first and the device second (otherwise the incoming GIC
state will overwrite the interrupt that the device just
generated), which isn't ideal (especially since QEMU
makes no guarantees about restore order between devices).

thanks
-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 78fb820..59ed7a3 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -377,7 +377,7 @@  void vgic_cpu_irq_clear(struct kvm_vcpu *vcpu, int irq)
 
 static bool vgic_can_sample_irq(struct kvm_vcpu *vcpu, int irq)
 {
-	return vgic_irq_is_edge(vcpu, irq) || !vgic_irq_is_queued(vcpu, irq);
+	return !vgic_irq_is_queued(vcpu, irq);
 }
 
 /**