Message ID | 502A5C63.70803@siemens.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, 2012-08-14 at 16:10 +0200, Jan Kiszka wrote: > On 2012-08-14 16:05, Alex Williamson wrote: > > On Tue, 2012-08-14 at 15:48 +0200, Jan Kiszka wrote: > >> Hi Alex, > >> > >> you once wrote this comment in device-assignment.c, msix_mmio_write: > >> > >> if (!msix_masked(&orig) && msix_masked(entry)) { > >> /* > >> * Vector masked, disable it > >> * > >> * XXX It's not clear if we can or should actually attempt > >> * to mask or disable the interrupt. KVM doesn't have > >> * support for pending bits and kvm_assign_set_msix_entry > >> * doesn't modify the device hardware mask. Interrupts > >> * while masked are simply not injected to the guest, so > >> * are lost. Can we get away with always injecting an > >> * interrupt on unmask? > >> */ > >> > >> I'm wondering what made you think that we won't inject if the vector is > >> masked like this (ie. in the shadow MSI-X table). Can you recall the > >> details? > >> > >> I'm trying to refactor this code to make the KVM interface a bit more > >> encapsulating the kernel interface details, not fixing anything. Still, > >> I would also like to avoid introducing regressions. > > > > Yeah, I didn't leave a very good comment there. I'm sure it made more > > sense to me at the time. I think I was trying to say that not only do > > we not have a way to mask the physical hardware, but if we did, we don't > > have a way to retrieve the pending bits, so any pending interrupts while > > masked would be lost. We might be able to deal with that by posting a > > spurious interrupt on unmask, but for now we do nothing as masking is > > usually done just to update the vector. Thanks, > > Ok, thanks for the clarification. > > As we are at it, do you also recall if this > > --- a/hw/device-assignment.c > +++ b/hw/device-assignment.c > @@ -1573,28 +1573,7 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr, > */ > } else if (msix_masked(&orig) && !msix_masked(entry)) { > /* Vector unmasked */ > - if (i >= adev->irq_entries_nr || !adev->entry[i].type) { > - /* Previously unassigned vector, start from scratch */ > - assigned_dev_update_msix(pdev); > - return; > - } else { > - /* Update an existing, previously masked vector */ > - struct kvm_irq_routing_entry orig = adev->entry[i]; > - int ret; > - > - adev->entry[i].u.msi.address_lo = entry->addr_lo; > - adev->entry[i].u.msi.address_hi = entry->addr_hi; > - adev->entry[i].u.msi.data = entry->data; > - > - ret = kvm_update_routing_entry(&orig, &adev->entry[i]); > - if (ret) { > - fprintf(stderr, > - "Error updating irq routing entry (%d)\n", ret); > - return; > - } > - > - kvm_irqchip_commit_routes(kvm_state); > - } > + assigned_dev_update_msix(pdev); > } > } > } > > would make a relevant difference for known workloads? I'm trying to get > rid of direct routing table manipulations, but I would also like to > avoid introducing things like kvm_irqchip_update_msi_route unless really > necessary. Or could VFIO make use of that as well? It makes me a little nervous, but I don't know that it won't work. There's a lot more latency in turning off MSI-X and completely rebuilding it than there is in updating the routing of a single vector. You can imagine that irqbalance could be triggering this path pretty regularly. Increasing vectors beyond what was previously setup is more of an init-time event, so the latency doesn't bother me as much. We'd probably have to send some spurious interrupts for anything we might have missed if we take the high latency path. VFIO is already a little more abstracted, making use of the msix vector use and release interface, but we do still make use of the kvm_irqchip irqfd/virq interfaces. > PS: Will try to have a look at your main VFIO patch later today. Thanks :) Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2012-08-14 16:31, Alex Williamson wrote: > On Tue, 2012-08-14 at 16:10 +0200, Jan Kiszka wrote: >> On 2012-08-14 16:05, Alex Williamson wrote: >>> On Tue, 2012-08-14 at 15:48 +0200, Jan Kiszka wrote: >>>> Hi Alex, >>>> >>>> you once wrote this comment in device-assignment.c, msix_mmio_write: >>>> >>>> if (!msix_masked(&orig) && msix_masked(entry)) { >>>> /* >>>> * Vector masked, disable it >>>> * >>>> * XXX It's not clear if we can or should actually attempt >>>> * to mask or disable the interrupt. KVM doesn't have >>>> * support for pending bits and kvm_assign_set_msix_entry >>>> * doesn't modify the device hardware mask. Interrupts >>>> * while masked are simply not injected to the guest, so >>>> * are lost. Can we get away with always injecting an >>>> * interrupt on unmask? >>>> */ >>>> >>>> I'm wondering what made you think that we won't inject if the vector is >>>> masked like this (ie. in the shadow MSI-X table). Can you recall the >>>> details? >>>> >>>> I'm trying to refactor this code to make the KVM interface a bit more >>>> encapsulating the kernel interface details, not fixing anything. Still, >>>> I would also like to avoid introducing regressions. >>> >>> Yeah, I didn't leave a very good comment there. I'm sure it made more >>> sense to me at the time. I think I was trying to say that not only do >>> we not have a way to mask the physical hardware, but if we did, we don't >>> have a way to retrieve the pending bits, so any pending interrupts while >>> masked would be lost. We might be able to deal with that by posting a >>> spurious interrupt on unmask, but for now we do nothing as masking is >>> usually done just to update the vector. Thanks, >> >> Ok, thanks for the clarification. >> >> As we are at it, do you also recall if this >> >> --- a/hw/device-assignment.c >> +++ b/hw/device-assignment.c >> @@ -1573,28 +1573,7 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr, >> */ >> } else if (msix_masked(&orig) && !msix_masked(entry)) { >> /* Vector unmasked */ >> - if (i >= adev->irq_entries_nr || !adev->entry[i].type) { >> - /* Previously unassigned vector, start from scratch */ >> - assigned_dev_update_msix(pdev); >> - return; >> - } else { >> - /* Update an existing, previously masked vector */ >> - struct kvm_irq_routing_entry orig = adev->entry[i]; >> - int ret; >> - >> - adev->entry[i].u.msi.address_lo = entry->addr_lo; >> - adev->entry[i].u.msi.address_hi = entry->addr_hi; >> - adev->entry[i].u.msi.data = entry->data; >> - >> - ret = kvm_update_routing_entry(&orig, &adev->entry[i]); >> - if (ret) { >> - fprintf(stderr, >> - "Error updating irq routing entry (%d)\n", ret); >> - return; >> - } >> - >> - kvm_irqchip_commit_routes(kvm_state); >> - } >> + assigned_dev_update_msix(pdev); >> } >> } >> } >> >> would make a relevant difference for known workloads? I'm trying to get >> rid of direct routing table manipulations, but I would also like to >> avoid introducing things like kvm_irqchip_update_msi_route unless really >> necessary. Or could VFIO make use of that as well? > > It makes me a little nervous, but I don't know that it won't work. > There's a lot more latency in turning off MSI-X and completely > rebuilding it than there is in updating the routing of a single vector. > You can imagine that irqbalance could be triggering this path pretty > regularly. Increasing vectors beyond what was previously setup is more > of an init-time event, so the latency doesn't bother me as much. We'd > probably have to send some spurious interrupts for anything we might > have missed if we take the high latency path. Yeah, good points. > > VFIO is already a little more abstracted, making use of the msix vector > use and release interface, but we do still make use of the kvm_irqchip > irqfd/virq interfaces. Hmm, but due to the nature of the callbacks, we always disable/reanable on mask/unmask. So VFIO will be slower than current device assignment in this regard. BTW, how do you handle the device's PBA? Pass it through to the guest? Jan
On Tue, 2012-08-14 at 17:15 +0200, Jan Kiszka wrote: > On 2012-08-14 16:31, Alex Williamson wrote: > > On Tue, 2012-08-14 at 16:10 +0200, Jan Kiszka wrote: > >> On 2012-08-14 16:05, Alex Williamson wrote: > >>> On Tue, 2012-08-14 at 15:48 +0200, Jan Kiszka wrote: > >>>> Hi Alex, > >>>> > >>>> you once wrote this comment in device-assignment.c, msix_mmio_write: > >>>> > >>>> if (!msix_masked(&orig) && msix_masked(entry)) { > >>>> /* > >>>> * Vector masked, disable it > >>>> * > >>>> * XXX It's not clear if we can or should actually attempt > >>>> * to mask or disable the interrupt. KVM doesn't have > >>>> * support for pending bits and kvm_assign_set_msix_entry > >>>> * doesn't modify the device hardware mask. Interrupts > >>>> * while masked are simply not injected to the guest, so > >>>> * are lost. Can we get away with always injecting an > >>>> * interrupt on unmask? > >>>> */ > >>>> > >>>> I'm wondering what made you think that we won't inject if the vector is > >>>> masked like this (ie. in the shadow MSI-X table). Can you recall the > >>>> details? > >>>> > >>>> I'm trying to refactor this code to make the KVM interface a bit more > >>>> encapsulating the kernel interface details, not fixing anything. Still, > >>>> I would also like to avoid introducing regressions. > >>> > >>> Yeah, I didn't leave a very good comment there. I'm sure it made more > >>> sense to me at the time. I think I was trying to say that not only do > >>> we not have a way to mask the physical hardware, but if we did, we don't > >>> have a way to retrieve the pending bits, so any pending interrupts while > >>> masked would be lost. We might be able to deal with that by posting a > >>> spurious interrupt on unmask, but for now we do nothing as masking is > >>> usually done just to update the vector. Thanks, > >> > >> Ok, thanks for the clarification. > >> > >> As we are at it, do you also recall if this > >> > >> --- a/hw/device-assignment.c > >> +++ b/hw/device-assignment.c > >> @@ -1573,28 +1573,7 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr, > >> */ > >> } else if (msix_masked(&orig) && !msix_masked(entry)) { > >> /* Vector unmasked */ > >> - if (i >= adev->irq_entries_nr || !adev->entry[i].type) { > >> - /* Previously unassigned vector, start from scratch */ > >> - assigned_dev_update_msix(pdev); > >> - return; > >> - } else { > >> - /* Update an existing, previously masked vector */ > >> - struct kvm_irq_routing_entry orig = adev->entry[i]; > >> - int ret; > >> - > >> - adev->entry[i].u.msi.address_lo = entry->addr_lo; > >> - adev->entry[i].u.msi.address_hi = entry->addr_hi; > >> - adev->entry[i].u.msi.data = entry->data; > >> - > >> - ret = kvm_update_routing_entry(&orig, &adev->entry[i]); > >> - if (ret) { > >> - fprintf(stderr, > >> - "Error updating irq routing entry (%d)\n", ret); > >> - return; > >> - } > >> - > >> - kvm_irqchip_commit_routes(kvm_state); > >> - } > >> + assigned_dev_update_msix(pdev); > >> } > >> } > >> } > >> > >> would make a relevant difference for known workloads? I'm trying to get > >> rid of direct routing table manipulations, but I would also like to > >> avoid introducing things like kvm_irqchip_update_msi_route unless really > >> necessary. Or could VFIO make use of that as well? > > > > It makes me a little nervous, but I don't know that it won't work. > > There's a lot more latency in turning off MSI-X and completely > > rebuilding it than there is in updating the routing of a single vector. > > You can imagine that irqbalance could be triggering this path pretty > > regularly. Increasing vectors beyond what was previously setup is more > > of an init-time event, so the latency doesn't bother me as much. We'd > > probably have to send some spurious interrupts for anything we might > > have missed if we take the high latency path. > > Yeah, good points. > > > > > VFIO is already a little more abstracted, making use of the msix vector > > use and release interface, but we do still make use of the kvm_irqchip > > irqfd/virq interfaces. > > Hmm, but due to the nature of the callbacks, we always disable/reanable > on mask/unmask. So VFIO will be slower than current device assignment in > this regard. It's a bit awkward, I'm not thrilled with those msix callbacks but they seem to work. I have a similar comment in static void vfio_msix_vector_release that maybe we should just disable direct injection on mask so that qemu-msix can do the masking and fill in the PBA. > BTW, how do you handle the device's PBA? Pass it through to the guest? We could but I'm trying to use qemu-msix infrastructure which handles the PBA. We've been working happily w/o good PBA support for so long, I haven't bothered to work on a channel to get to the physical PBA yet. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2012-08-14 17:34, Alex Williamson wrote: > On Tue, 2012-08-14 at 17:15 +0200, Jan Kiszka wrote: >> On 2012-08-14 16:31, Alex Williamson wrote: >>> On Tue, 2012-08-14 at 16:10 +0200, Jan Kiszka wrote: >>>> On 2012-08-14 16:05, Alex Williamson wrote: >>>>> On Tue, 2012-08-14 at 15:48 +0200, Jan Kiszka wrote: >>>>>> Hi Alex, >>>>>> >>>>>> you once wrote this comment in device-assignment.c, msix_mmio_write: >>>>>> >>>>>> if (!msix_masked(&orig) && msix_masked(entry)) { >>>>>> /* >>>>>> * Vector masked, disable it >>>>>> * >>>>>> * XXX It's not clear if we can or should actually attempt >>>>>> * to mask or disable the interrupt. KVM doesn't have >>>>>> * support for pending bits and kvm_assign_set_msix_entry >>>>>> * doesn't modify the device hardware mask. Interrupts >>>>>> * while masked are simply not injected to the guest, so >>>>>> * are lost. Can we get away with always injecting an >>>>>> * interrupt on unmask? >>>>>> */ >>>>>> >>>>>> I'm wondering what made you think that we won't inject if the vector is >>>>>> masked like this (ie. in the shadow MSI-X table). Can you recall the >>>>>> details? >>>>>> >>>>>> I'm trying to refactor this code to make the KVM interface a bit more >>>>>> encapsulating the kernel interface details, not fixing anything. Still, >>>>>> I would also like to avoid introducing regressions. >>>>> >>>>> Yeah, I didn't leave a very good comment there. I'm sure it made more >>>>> sense to me at the time. I think I was trying to say that not only do >>>>> we not have a way to mask the physical hardware, but if we did, we don't >>>>> have a way to retrieve the pending bits, so any pending interrupts while >>>>> masked would be lost. We might be able to deal with that by posting a >>>>> spurious interrupt on unmask, but for now we do nothing as masking is >>>>> usually done just to update the vector. Thanks, >>>> >>>> Ok, thanks for the clarification. >>>> >>>> As we are at it, do you also recall if this >>>> >>>> --- a/hw/device-assignment.c >>>> +++ b/hw/device-assignment.c >>>> @@ -1573,28 +1573,7 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr, >>>> */ >>>> } else if (msix_masked(&orig) && !msix_masked(entry)) { >>>> /* Vector unmasked */ >>>> - if (i >= adev->irq_entries_nr || !adev->entry[i].type) { >>>> - /* Previously unassigned vector, start from scratch */ >>>> - assigned_dev_update_msix(pdev); >>>> - return; >>>> - } else { >>>> - /* Update an existing, previously masked vector */ >>>> - struct kvm_irq_routing_entry orig = adev->entry[i]; >>>> - int ret; >>>> - >>>> - adev->entry[i].u.msi.address_lo = entry->addr_lo; >>>> - adev->entry[i].u.msi.address_hi = entry->addr_hi; >>>> - adev->entry[i].u.msi.data = entry->data; >>>> - >>>> - ret = kvm_update_routing_entry(&orig, &adev->entry[i]); >>>> - if (ret) { >>>> - fprintf(stderr, >>>> - "Error updating irq routing entry (%d)\n", ret); >>>> - return; >>>> - } >>>> - >>>> - kvm_irqchip_commit_routes(kvm_state); >>>> - } >>>> + assigned_dev_update_msix(pdev); >>>> } >>>> } >>>> } >>>> >>>> would make a relevant difference for known workloads? I'm trying to get >>>> rid of direct routing table manipulations, but I would also like to >>>> avoid introducing things like kvm_irqchip_update_msi_route unless really >>>> necessary. Or could VFIO make use of that as well? >>> >>> It makes me a little nervous, but I don't know that it won't work. >>> There's a lot more latency in turning off MSI-X and completely >>> rebuilding it than there is in updating the routing of a single vector. >>> You can imagine that irqbalance could be triggering this path pretty >>> regularly. Increasing vectors beyond what was previously setup is more >>> of an init-time event, so the latency doesn't bother me as much. We'd >>> probably have to send some spurious interrupts for anything we might >>> have missed if we take the high latency path. >> >> Yeah, good points. >> >>> >>> VFIO is already a little more abstracted, making use of the msix vector >>> use and release interface, but we do still make use of the kvm_irqchip >>> irqfd/virq interfaces. >> >> Hmm, but due to the nature of the callbacks, we always disable/reanable >> on mask/unmask. So VFIO will be slower than current device assignment in >> this regard. > > It's a bit awkward, I'm not thrilled with those msix callbacks but they > seem to work. I have a similar comment in static void > vfio_msix_vector_release that maybe we should just disable direct > injection on mask so that qemu-msix can do the masking and fill in the > PBA. That will require an enhancement of the callback mechanism. So far it does not allow to tell apart per-vector masking from general disabling. When the latter happens, we still want to release resources, I suppose. With such enhancement in place, we could even consider keeping the VIRQ and MSI route active (and provide a route update service) to avoid the tear-down/recreate overhead on fast mask/unmask cycles, e.g. for IRQ migration. > >> BTW, how do you handle the device's PBA? Pass it through to the guest? > > We could but I'm trying to use qemu-msix infrastructure which handles > the PBA. We've been working happily w/o good PBA support for so long, I > haven't bothered to work on a channel to get to the physical PBA yet. I think bouncing should be OK performance-wise - until some strange guest pops up that actually polls the PBA in high-load scenarios. Jan
--- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -1573,28 +1573,7 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr, */ } else if (msix_masked(&orig) && !msix_masked(entry)) { /* Vector unmasked */ - if (i >= adev->irq_entries_nr || !adev->entry[i].type) { - /* Previously unassigned vector, start from scratch */ - assigned_dev_update_msix(pdev); - return; - } else { - /* Update an existing, previously masked vector */ - struct kvm_irq_routing_entry orig = adev->entry[i]; - int ret; - - adev->entry[i].u.msi.address_lo = entry->addr_lo; - adev->entry[i].u.msi.address_hi = entry->addr_hi; - adev->entry[i].u.msi.data = entry->data; - - ret = kvm_update_routing_entry(&orig, &adev->entry[i]); - if (ret) { - fprintf(stderr, - "Error updating irq routing entry (%d)\n", ret); - return; - } - - kvm_irqchip_commit_routes(kvm_state); - } + assigned_dev_update_msix(pdev); } } }