
KVM device assignment and MSI-X masking

Message ID: 502A5C63.70803@siemens.com
State New, archived

Commit Message

Jan Kiszka Aug. 14, 2012, 2:10 p.m. UTC
On 2012-08-14 16:05, Alex Williamson wrote:
> On Tue, 2012-08-14 at 15:48 +0200, Jan Kiszka wrote:
>> Hi Alex,
>>
>> you once wrote this comment in device-assignment.c, msix_mmio_write:
>>
>>     if (!msix_masked(&orig) && msix_masked(entry)) {
>>         /*
>>          * Vector masked, disable it
>>          *
>>          * XXX It's not clear if we can or should actually attempt
>>          * to mask or disable the interrupt.  KVM doesn't have
>>          * support for pending bits and kvm_assign_set_msix_entry
>>          * doesn't modify the device hardware mask.  Interrupts
>>          * while masked are simply not injected to the guest, so
>>          * are lost.  Can we get away with always injecting an
>>          * interrupt on unmask?
>>          */
>>
>> I'm wondering what made you think that we won't inject if the vector is
>> masked like this (ie. in the shadow MSI-X table). Can you recall the
>> details?
>>
>> I'm trying to refactor this code so that the KVM interface encapsulates
>> the kernel interface details a bit more; I'm not fixing anything. Still,
>> I would also like to avoid introducing regressions.
> 
> Yeah, I didn't leave a very good comment there.  I'm sure it made more
> sense to me at the time.  I think I was trying to say that not only do
> we not have a way to mask the physical hardware, but if we did, we don't
> have a way to retrieve the pending bits, so any pending interrupts while
> masked would be lost.  We might be able to deal with that by posting a
> spurious interrupt on unmask, but for now we do nothing as masking is
> usually done just to update the vector.  Thanks,

Ok, thanks for the clarification.

While we are at it, do you also recall if this

--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1573,28 +1573,7 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
              */
         } else if (msix_masked(&orig) && !msix_masked(entry)) {
             /* Vector unmasked */
-            if (i >= adev->irq_entries_nr || !adev->entry[i].type) {
-                /* Previously unassigned vector, start from scratch */
-                assigned_dev_update_msix(pdev);
-                return;
-            } else {
-                /* Update an existing, previously masked vector */
-                struct kvm_irq_routing_entry orig = adev->entry[i];
-                int ret;
-
-                adev->entry[i].u.msi.address_lo = entry->addr_lo;
-                adev->entry[i].u.msi.address_hi = entry->addr_hi;
-                adev->entry[i].u.msi.data = entry->data;
-
-                ret = kvm_update_routing_entry(&orig, &adev->entry[i]);
-                if (ret) {
-                    fprintf(stderr,
-                            "Error updating irq routing entry (%d)\n", ret);
-                    return;
-                }
-
-                kvm_irqchip_commit_routes(kvm_state);
-            }
+            assigned_dev_update_msix(pdev);
         }
     }
 }

would make a relevant difference for known workloads? I'm trying to get
rid of direct routing table manipulations, but I would also like to
avoid introducing things like kvm_irqchip_update_msi_route unless really
necessary. Or could VFIO make use of that as well?

Jan

PS: Will try to have a look at your main VFIO patch later today.

Comments

Alex Williamson Aug. 14, 2012, 2:31 p.m. UTC | #1
On Tue, 2012-08-14 at 16:10 +0200, Jan Kiszka wrote:
> On 2012-08-14 16:05, Alex Williamson wrote:
> > On Tue, 2012-08-14 at 15:48 +0200, Jan Kiszka wrote:
> >> Hi Alex,
> >>
> >> you once wrote this comment in device-assignment.c, msix_mmio_write:
> >>
> >>     if (!msix_masked(&orig) && msix_masked(entry)) {
> >>         /*
> >>          * Vector masked, disable it
> >>          *
> >>          * XXX It's not clear if we can or should actually attempt
> >>          * to mask or disable the interrupt.  KVM doesn't have
> >>          * support for pending bits and kvm_assign_set_msix_entry
> >>          * doesn't modify the device hardware mask.  Interrupts
> >>          * while masked are simply not injected to the guest, so
> >>          * are lost.  Can we get away with always injecting an
> >>          * interrupt on unmask?
> >>          */
> >>
> >> I'm wondering what made you think that we won't inject if the vector is
> >> masked like this (ie. in the shadow MSI-X table). Can you recall the
> >> details?
> >>
> >> I'm trying to refactor this code to make the KVM interface a bit more
> >> encapsulating the kernel interface details, not fixing anything. Still,
> >> I would also like to avoid introducing regressions.
> > 
> > Yeah, I didn't leave a very good comment there.  I'm sure it made more
> > sense to me at the time.  I think I was trying to say that not only do
> > we not have a way to mask the physical hardware, but if we did, we don't
> > have a way to retrieve the pending bits, so any pending interrupts while
> > masked would be lost.  We might be able to deal with that by posting a
> > spurious interrupt on unmask, but for now we do nothing as masking is
> > usually done just to update the vector.  Thanks,
> 
> Ok, thanks for the clarification.
> 
> As we are at it, do you also recall if this
> 
> --- a/hw/device-assignment.c
> +++ b/hw/device-assignment.c
> @@ -1573,28 +1573,7 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
>               */
>          } else if (msix_masked(&orig) && !msix_masked(entry)) {
>              /* Vector unmasked */
> -            if (i >= adev->irq_entries_nr || !adev->entry[i].type) {
> -                /* Previously unassigned vector, start from scratch */
> -                assigned_dev_update_msix(pdev);
> -                return;
> -            } else {
> -                /* Update an existing, previously masked vector */
> -                struct kvm_irq_routing_entry orig = adev->entry[i];
> -                int ret;
> -
> -                adev->entry[i].u.msi.address_lo = entry->addr_lo;
> -                adev->entry[i].u.msi.address_hi = entry->addr_hi;
> -                adev->entry[i].u.msi.data = entry->data;
> -
> -                ret = kvm_update_routing_entry(&orig, &adev->entry[i]);
> -                if (ret) {
> -                    fprintf(stderr,
> -                            "Error updating irq routing entry (%d)\n", ret);
> -                    return;
> -                }
> -
> -                kvm_irqchip_commit_routes(kvm_state);
> -            }
> +            assigned_dev_update_msix(pdev);
>          }
>      }
>  }
> 
> would make a relevant difference for known workloads? I'm trying to get
> rid of direct routing table manipulations, but I would also like to
> avoid introducing things like kvm_irqchip_update_msi_route unless really
> necessary. Or could VFIO make use of that as well?

It makes me a little nervous, but I don't know that it won't work.
There's a lot more latency in turning off MSI-X and completely
rebuilding it than there is in updating the routing of a single vector.
You can imagine that irqbalance could be triggering this path pretty
regularly.  Increasing vectors beyond what was previously setup is more
of an init-time event, so the latency doesn't bother me as much.  We'd
probably have to send some spurious interrupts for anything we might
have missed if we take the high latency path.

VFIO is already a little more abstracted, making use of the msix vector
use and release interface, but we do still make use of the kvm_irqchip
irqfd/virq interfaces.

> PS: Will try to have a look at your main VFIO patch later today.

Thanks :)

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kiszka Aug. 14, 2012, 3:15 p.m. UTC | #2
On 2012-08-14 16:31, Alex Williamson wrote:
> On Tue, 2012-08-14 at 16:10 +0200, Jan Kiszka wrote:
>> On 2012-08-14 16:05, Alex Williamson wrote:
>>> On Tue, 2012-08-14 at 15:48 +0200, Jan Kiszka wrote:
>>>> Hi Alex,
>>>>
>>>> you once wrote this comment in device-assignment.c, msix_mmio_write:
>>>>
>>>>     if (!msix_masked(&orig) && msix_masked(entry)) {
>>>>         /*
>>>>          * Vector masked, disable it
>>>>          *
>>>>          * XXX It's not clear if we can or should actually attempt
>>>>          * to mask or disable the interrupt.  KVM doesn't have
>>>>          * support for pending bits and kvm_assign_set_msix_entry
>>>>          * doesn't modify the device hardware mask.  Interrupts
>>>>          * while masked are simply not injected to the guest, so
>>>>          * are lost.  Can we get away with always injecting an
>>>>          * interrupt on unmask?
>>>>          */
>>>>
>>>> I'm wondering what made you think that we won't inject if the vector is
>>>> masked like this (ie. in the shadow MSI-X table). Can you recall the
>>>> details?
>>>>
>>>> I'm trying to refactor this code to make the KVM interface a bit more
>>>> encapsulating the kernel interface details, not fixing anything. Still,
>>>> I would also like to avoid introducing regressions.
>>>
>>> Yeah, I didn't leave a very good comment there.  I'm sure it made more
>>> sense to me at the time.  I think I was trying to say that not only do
>>> we not have a way to mask the physical hardware, but if we did, we don't
>>> have a way to retrieve the pending bits, so any pending interrupts while
>>> masked would be lost.  We might be able to deal with that by posting a
>>> spurious interrupt on unmask, but for now we do nothing as masking is
>>> usually done just to update the vector.  Thanks,
>>
>> Ok, thanks for the clarification.
>>
>> As we are at it, do you also recall if this
>>
>> --- a/hw/device-assignment.c
>> +++ b/hw/device-assignment.c
>> @@ -1573,28 +1573,7 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
>>               */
>>          } else if (msix_masked(&orig) && !msix_masked(entry)) {
>>              /* Vector unmasked */
>> -            if (i >= adev->irq_entries_nr || !adev->entry[i].type) {
>> -                /* Previously unassigned vector, start from scratch */
>> -                assigned_dev_update_msix(pdev);
>> -                return;
>> -            } else {
>> -                /* Update an existing, previously masked vector */
>> -                struct kvm_irq_routing_entry orig = adev->entry[i];
>> -                int ret;
>> -
>> -                adev->entry[i].u.msi.address_lo = entry->addr_lo;
>> -                adev->entry[i].u.msi.address_hi = entry->addr_hi;
>> -                adev->entry[i].u.msi.data = entry->data;
>> -
>> -                ret = kvm_update_routing_entry(&orig, &adev->entry[i]);
>> -                if (ret) {
>> -                    fprintf(stderr,
>> -                            "Error updating irq routing entry (%d)\n", ret);
>> -                    return;
>> -                }
>> -
>> -                kvm_irqchip_commit_routes(kvm_state);
>> -            }
>> +            assigned_dev_update_msix(pdev);
>>          }
>>      }
>>  }
>>
>> would make a relevant difference for known workloads? I'm trying to get
>> rid of direct routing table manipulations, but I would also like to
>> avoid introducing things like kvm_irqchip_update_msi_route unless really
>> necessary. Or could VFIO make use of that as well?
> 
> It makes me a little nervous, but I don't know that it won't work.
> There's a lot more latency in turning off MSI-X and completely
> rebuilding it than there is in updating the routing of a single vector.
> You can imagine that irqbalance could be triggering this path pretty
> regularly.  Increasing vectors beyond what was previously setup is more
> of an init-time event, so the latency doesn't bother me as much.  We'd
> probably have to send some spurious interrupts for anything we might
> have missed if we take the high latency path.

Yeah, good points.

> 
> VFIO is already a little more abstracted, making use of the msix vector
> use and release interface, but we do still make use of the kvm_irqchip
> irqfd/virq interfaces.

Hmm, but due to the nature of the callbacks, we always disable/re-enable
on mask/unmask. So VFIO will be slower than current device assignment in
this regard.

BTW, how do you handle the device's PBA? Pass it through to the guest?

Jan
Alex Williamson Aug. 14, 2012, 3:34 p.m. UTC | #3
On Tue, 2012-08-14 at 17:15 +0200, Jan Kiszka wrote:
> On 2012-08-14 16:31, Alex Williamson wrote:
> > On Tue, 2012-08-14 at 16:10 +0200, Jan Kiszka wrote:
> >> On 2012-08-14 16:05, Alex Williamson wrote:
> >>> On Tue, 2012-08-14 at 15:48 +0200, Jan Kiszka wrote:
> >>>> Hi Alex,
> >>>>
> >>>> you once wrote this comment in device-assignment.c, msix_mmio_write:
> >>>>
> >>>>     if (!msix_masked(&orig) && msix_masked(entry)) {
> >>>>         /*
> >>>>          * Vector masked, disable it
> >>>>          *
> >>>>          * XXX It's not clear if we can or should actually attempt
> >>>>          * to mask or disable the interrupt.  KVM doesn't have
> >>>>          * support for pending bits and kvm_assign_set_msix_entry
> >>>>          * doesn't modify the device hardware mask.  Interrupts
> >>>>          * while masked are simply not injected to the guest, so
> >>>>          * are lost.  Can we get away with always injecting an
> >>>>          * interrupt on unmask?
> >>>>          */
> >>>>
> >>>> I'm wondering what made you think that we won't inject if the vector is
> >>>> masked like this (ie. in the shadow MSI-X table). Can you recall the
> >>>> details?
> >>>>
> >>>> I'm trying to refactor this code to make the KVM interface a bit more
> >>>> encapsulating the kernel interface details, not fixing anything. Still,
> >>>> I would also like to avoid introducing regressions.
> >>>
> >>> Yeah, I didn't leave a very good comment there.  I'm sure it made more
> >>> sense to me at the time.  I think I was trying to say that not only do
> >>> we not have a way to mask the physical hardware, but if we did, we don't
> >>> have a way to retrieve the pending bits, so any pending interrupts while
> >>> masked would be lost.  We might be able to deal with that by posting a
> >>> spurious interrupt on unmask, but for now we do nothing as masking is
> >>> usually done just to update the vector.  Thanks,
> >>
> >> Ok, thanks for the clarification.
> >>
> >> As we are at it, do you also recall if this
> >>
> >> --- a/hw/device-assignment.c
> >> +++ b/hw/device-assignment.c
> >> @@ -1573,28 +1573,7 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
> >>               */
> >>          } else if (msix_masked(&orig) && !msix_masked(entry)) {
> >>              /* Vector unmasked */
> >> -            if (i >= adev->irq_entries_nr || !adev->entry[i].type) {
> >> -                /* Previously unassigned vector, start from scratch */
> >> -                assigned_dev_update_msix(pdev);
> >> -                return;
> >> -            } else {
> >> -                /* Update an existing, previously masked vector */
> >> -                struct kvm_irq_routing_entry orig = adev->entry[i];
> >> -                int ret;
> >> -
> >> -                adev->entry[i].u.msi.address_lo = entry->addr_lo;
> >> -                adev->entry[i].u.msi.address_hi = entry->addr_hi;
> >> -                adev->entry[i].u.msi.data = entry->data;
> >> -
> >> -                ret = kvm_update_routing_entry(&orig, &adev->entry[i]);
> >> -                if (ret) {
> >> -                    fprintf(stderr,
> >> -                            "Error updating irq routing entry (%d)\n", ret);
> >> -                    return;
> >> -                }
> >> -
> >> -                kvm_irqchip_commit_routes(kvm_state);
> >> -            }
> >> +            assigned_dev_update_msix(pdev);
> >>          }
> >>      }
> >>  }
> >>
> >> would make a relevant difference for known workloads? I'm trying to get
> >> rid of direct routing table manipulations, but I would also like to
> >> avoid introducing things like kvm_irqchip_update_msi_route unless really
> >> necessary. Or could VFIO make use of that as well?
> > 
> > It makes me a little nervous, but I don't know that it won't work.
> > There's a lot more latency in turning off MSI-X and completely
> > rebuilding it than there is in updating the routing of a single vector.
> > You can imagine that irqbalance could be triggering this path pretty
> > regularly.  Increasing vectors beyond what was previously setup is more
> > of an init-time event, so the latency doesn't bother me as much.  We'd
> > probably have to send some spurious interrupts for anything we might
> > have missed if we take the high latency path.
> 
> Yeah, good points.
> 
> > 
> > VFIO is already a little more abstracted, making use of the msix vector
> > use and release interface, but we do still make use of the kvm_irqchip
> > irqfd/virq interfaces.
> 
> Hmm, but due to the nature of the callbacks, we always disable/reanable
> on mask/unmask. So VFIO will be slower than current device assignment in
> this regard.

It's a bit awkward; I'm not thrilled with those msix callbacks, but they
seem to work.  I have a similar comment in vfio_msix_vector_release()
suggesting that maybe we should just disable direct injection on mask so
that qemu-msix can do the masking and fill in the PBA.

> BTW, how do you handle the device's PBA? Pass it through to the guest?

We could but I'm trying to use qemu-msix infrastructure which handles
the PBA.  We've been working happily w/o good PBA support for so long, I
haven't bothered to work on a channel to get to the physical PBA yet.
Thanks,

Alex

Jan Kiszka Aug. 14, 2012, 3:47 p.m. UTC | #4
On 2012-08-14 17:34, Alex Williamson wrote:
> On Tue, 2012-08-14 at 17:15 +0200, Jan Kiszka wrote:
>> On 2012-08-14 16:31, Alex Williamson wrote:
>>> On Tue, 2012-08-14 at 16:10 +0200, Jan Kiszka wrote:
>>>> On 2012-08-14 16:05, Alex Williamson wrote:
>>>>> On Tue, 2012-08-14 at 15:48 +0200, Jan Kiszka wrote:
>>>>>> Hi Alex,
>>>>>>
>>>>>> you once wrote this comment in device-assignment.c, msix_mmio_write:
>>>>>>
>>>>>>     if (!msix_masked(&orig) && msix_masked(entry)) {
>>>>>>         /*
>>>>>>          * Vector masked, disable it
>>>>>>          *
>>>>>>          * XXX It's not clear if we can or should actually attempt
>>>>>>          * to mask or disable the interrupt.  KVM doesn't have
>>>>>>          * support for pending bits and kvm_assign_set_msix_entry
>>>>>>          * doesn't modify the device hardware mask.  Interrupts
>>>>>>          * while masked are simply not injected to the guest, so
>>>>>>          * are lost.  Can we get away with always injecting an
>>>>>>          * interrupt on unmask?
>>>>>>          */
>>>>>>
>>>>>> I'm wondering what made you think that we won't inject if the vector is
>>>>>> masked like this (ie. in the shadow MSI-X table). Can you recall the
>>>>>> details?
>>>>>>
>>>>>> I'm trying to refactor this code to make the KVM interface a bit more
>>>>>> encapsulating the kernel interface details, not fixing anything. Still,
>>>>>> I would also like to avoid introducing regressions.
>>>>>
>>>>> Yeah, I didn't leave a very good comment there.  I'm sure it made more
>>>>> sense to me at the time.  I think I was trying to say that not only do
>>>>> we not have a way to mask the physical hardware, but if we did, we don't
>>>>> have a way to retrieve the pending bits, so any pending interrupts while
>>>>> masked would be lost.  We might be able to deal with that by posting a
>>>>> spurious interrupt on unmask, but for now we do nothing as masking is
>>>>> usually done just to update the vector.  Thanks,
>>>>
>>>> Ok, thanks for the clarification.
>>>>
>>>> As we are at it, do you also recall if this
>>>>
>>>> --- a/hw/device-assignment.c
>>>> +++ b/hw/device-assignment.c
>>>> @@ -1573,28 +1573,7 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
>>>>               */
>>>>          } else if (msix_masked(&orig) && !msix_masked(entry)) {
>>>>              /* Vector unmasked */
>>>> -            if (i >= adev->irq_entries_nr || !adev->entry[i].type) {
>>>> -                /* Previously unassigned vector, start from scratch */
>>>> -                assigned_dev_update_msix(pdev);
>>>> -                return;
>>>> -            } else {
>>>> -                /* Update an existing, previously masked vector */
>>>> -                struct kvm_irq_routing_entry orig = adev->entry[i];
>>>> -                int ret;
>>>> -
>>>> -                adev->entry[i].u.msi.address_lo = entry->addr_lo;
>>>> -                adev->entry[i].u.msi.address_hi = entry->addr_hi;
>>>> -                adev->entry[i].u.msi.data = entry->data;
>>>> -
>>>> -                ret = kvm_update_routing_entry(&orig, &adev->entry[i]);
>>>> -                if (ret) {
>>>> -                    fprintf(stderr,
>>>> -                            "Error updating irq routing entry (%d)\n", ret);
>>>> -                    return;
>>>> -                }
>>>> -
>>>> -                kvm_irqchip_commit_routes(kvm_state);
>>>> -            }
>>>> +            assigned_dev_update_msix(pdev);
>>>>          }
>>>>      }
>>>>  }
>>>>
>>>> would make a relevant difference for known workloads? I'm trying to get
>>>> rid of direct routing table manipulations, but I would also like to
>>>> avoid introducing things like kvm_irqchip_update_msi_route unless really
>>>> necessary. Or could VFIO make use of that as well?
>>>
>>> It makes me a little nervous, but I don't know that it won't work.
>>> There's a lot more latency in turning off MSI-X and completely
>>> rebuilding it than there is in updating the routing of a single vector.
>>> You can imagine that irqbalance could be triggering this path pretty
>>> regularly.  Increasing vectors beyond what was previously setup is more
>>> of an init-time event, so the latency doesn't bother me as much.  We'd
>>> probably have to send some spurious interrupts for anything we might
>>> have missed if we take the high latency path.
>>
>> Yeah, good points.
>>
>>>
>>> VFIO is already a little more abstracted, making use of the msix vector
>>> use and release interface, but we do still make use of the kvm_irqchip
>>> irqfd/virq interfaces.
>>
>> Hmm, but due to the nature of the callbacks, we always disable/reanable
>> on mask/unmask. So VFIO will be slower than current device assignment in
>> this regard.
> 
> It's a bit awkward, I'm not thrilled with those msix callbacks but they
> seem to work.  I have a similar comment in static void
> vfio_msix_vector_release that maybe we should just disable direct
> injection on mask so that qemu-msix can do the masking and fill in the
> PBA.

That will require an enhancement of the callback mechanism. So far it
does not allow telling apart per-vector masking from general disabling.
When the latter happens, we still want to release resources, I suppose.

With such an enhancement in place, we could even consider keeping the
VIRQ and MSI route active (and providing a route-update service) to
avoid the tear-down/recreate overhead on fast mask/unmask cycles, e.g.
for IRQ migration.

> 
>> BTW, how do you handle the device's PBA? Pass it through to the guest?
> 
> We could but I'm trying to use qemu-msix infrastructure which handles
> the PBA.  We've been working happily w/o good PBA support for so long, I
> haven't bothered to work on a channel to get to the physical PBA yet.

I think bouncing should be OK performance-wise - until some strange
guest pops up that actually polls the PBA in high-load scenarios.

Jan

Patch

--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1573,28 +1573,7 @@  static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
              */
         } else if (msix_masked(&orig) && !msix_masked(entry)) {
             /* Vector unmasked */
-            if (i >= adev->irq_entries_nr || !adev->entry[i].type) {
-                /* Previously unassigned vector, start from scratch */
-                assigned_dev_update_msix(pdev);
-                return;
-            } else {
-                /* Update an existing, previously masked vector */
-                struct kvm_irq_routing_entry orig = adev->entry[i];
-                int ret;
-
-                adev->entry[i].u.msi.address_lo = entry->addr_lo;
-                adev->entry[i].u.msi.address_hi = entry->addr_hi;
-                adev->entry[i].u.msi.data = entry->data;
-
-                ret = kvm_update_routing_entry(&orig, &adev->entry[i]);
-                if (ret) {
-                    fprintf(stderr,
-                            "Error updating irq routing entry (%d)\n", ret);
-                    return;
-                }
-
-                kvm_irqchip_commit_routes(kvm_state);
-            }
+            assigned_dev_update_msix(pdev);
         }
     }
 }