[3/5] vfio: defer to enable msix in migration resume phase

Message ID	20210825075620.2607-4-longpeng2@huawei.com (mailing list archive)
State	New, archived
Headers	Return-Path: <SRS0=Gpas=NQ=nongnu.org=qemu-devel-bounces+qemu-devel=archiver.kernel.org@kernel.org>
	DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 19AD161373
	From: "Longpeng(Mike)" <longpeng2@huawei.com>
	To: <alex.williamson@redhat.com>, <mst@redhat.com>, <marcel.apfelbaum@gmail.com>, <pbonzini@redhat.com>
	Subject: [PATCH 3/5] vfio: defer to enable msix in migration resume phase
	Date: Wed, 25 Aug 2021 15:56:18 +0800
	Message-ID: <20210825075620.2607-4-longpeng2@huawei.com>
	In-Reply-To: <20210825075620.2607-1-longpeng2@huawei.com>
	References: <20210825075620.2607-1-longpeng2@huawei.com>
	MIME-Version: 1.0
	Content-Transfer-Encoding: 8bit
	Content-Type: text/plain
	Received-SPF: pass client-ip=45.249.212.188; envelope-from=longpeng2@huawei.com; helo=szxga02-in.huawei.com
	X-Spam_score_int: -41
	X-Spam_score: -4.2
	X-Spam_bar: ----
	X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no
	X-Spam_action: no action
	Precedence: list
	Cc: "Longpeng\(Mike\)" <longpeng2@huawei.com>, arei.gonglei@huawei.com, huangzhichao@huawei.com, qemu-devel@nongnu.org
	Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org
	Sender: "Qemu-devel" <qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org>
Series	optimize the downtime for vfio migration
	[0/5] optimize the downtime for vfio migration
	[1/5] vfio: use helper to simplfy the failure path in vfio_msi_enable
	[2/5] msix: simplfy the conditional in msix_set/unset_vector_notifiers
	[3/5] vfio: defer to enable msix in migration resume phase
	[4/5] kvm: irqchip: support defer to commit the route
	[5/5] vfio: defer to commit kvm route in migraiton resume phase

Commit Message

Longpeng(Mike) Aug. 25, 2021, 7:56 a.m. UTC

The vf's unmasked msix vectors will be enable one by one in
migraiton resume phase, VFIO_DEVICE_SET_IRQS will be called
for each vector, it's a bit expensive if the vf has more
vectors.

We can call VFIO_DEVICE_SET_IRQS once outside the loop of set
vector notifiers to reduce the cost.

The test VM has 128 vcpus and 8 VF (with 65 vectors enabled),
we mesure the cost of the vfio_msix_enable for each one, and
we can see 10% costs can be reduced.

        Origin          Apply this patch
1st     8               4
2nd     15              11
3rd     22              18
4th     24              25
5th     36              33
6th     44              40
7th     51              47
8th     58              54
Total   258ms           232ms

Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
---
 hw/vfio/pci.c | 22 ++++++++++++++++++++++
 hw/vfio/pci.h |  1 +
 2 files changed, 23 insertions(+)
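
The saving described in the commit message can be illustrated with a
minimal standalone sketch that only counts calls. It is not QEMU code:
fake_set_irqs() is a hypothetical stand-in for the VFIO_DEVICE_SET_IRQS
ioctl, and enable_per_vector() / enable_batched() are made-up names. The
sketch assumes, as the commit message states, that the existing path issues
one call per newly used vector, while the deferred path only records the
highest vector number and issues a single call after the loop.

```c
#include <assert.h>

/* Hypothetical stand-in for the VFIO_DEVICE_SET_IRQS ioctl: counts calls. */
static unsigned int set_irqs_calls;

static void fake_set_irqs(unsigned int nr_vectors)
{
    assert(nr_vectors > 0);   /* nothing to program for an empty set */
    set_irqs_calls++;
}

/* Per-vector model: each newly used vector triggers its own call. */
static unsigned int enable_per_vector(unsigned int unmasked)
{
    unsigned int nr_vectors = 0;

    set_irqs_calls = 0;
    for (unsigned int nr = 0; nr < unmasked; nr++) {
        if (nr + 1 > nr_vectors) {
            nr_vectors = nr + 1;
            fake_set_irqs(nr_vectors);
        }
    }
    return set_irqs_calls;
}

/* Deferred model: only track the highest vector, then call once. */
static unsigned int enable_batched(unsigned int unmasked)
{
    unsigned int nr_vectors = 0;

    set_irqs_calls = 0;
    for (unsigned int nr = 0; nr < unmasked; nr++) {
        if (nr + 1 > nr_vectors) {
            nr_vectors = nr + 1;
        }
    }
    if (nr_vectors) {
        fake_set_irqs(nr_vectors);
    }
    return set_irqs_calls;
}
```

With 65 unmasked vectors, the per-vector model makes 65 calls and the
batched model makes one. The real cost per call is not modelled here.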

Comments

Philippe Mathieu-Daudé Aug. 25, 2021, 9:57 a.m. UTC | #1

On 8/25/21 9:56 AM, Longpeng(Mike) wrote:
> The vf's unmasked msix vectors will be enable one by one in
> migraiton resume phase, VFIO_DEVICE_SET_IRQS will be called

Typo "migration"

> for each vector, it's a bit expensive if the vf has more
> vectors.
> 
> We can call VFIO_DEVICE_SET_IRQS once outside the loop of set
> vector notifiers to reduce the cost.
> 
> The test VM has 128 vcpus and 8 VF (with 65 vectors enabled),
> we mesure the cost of the vfio_msix_enable for each one, and

Typo "measure"

> we can see 10% costs can be reduced.
> 
>         Origin          Apply this patch
> 1st     8               4
> 2nd     15              11
> 3rd     22              18
> 4th     24              25
> 5th     36              33
> 6th     44              40
> 7th     51              47
> 8th     58              54
> Total   258ms           232ms
> 
> Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
> ---
>  hw/vfio/pci.c | 22 ++++++++++++++++++++++
>  hw/vfio/pci.h |  1 +
>  2 files changed, 23 insertions(+)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 7cc43fe..ca37fb7 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -372,6 +372,10 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
>      int ret = 0, i, argsz;
>      int32_t *fds;
>  
> +    if (!vdev->nr_vectors) {
> +        return 0;
> +    }
> +
>      argsz = sizeof(*irq_set) + (vdev->nr_vectors * sizeof(*fds));
>  
>      irq_set = g_malloc0(argsz);
> @@ -495,6 +499,11 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
>          }
>      }
>  
> +    if (vdev->defer_add_virq) {
> +        vdev->nr_vectors = MAX(vdev->nr_vectors, nr + 1);
> +        goto clear_pending;
> +    }
> +
>      /*
>       * We don't want to have the host allocate all possible MSI vectors
>       * for a device if they're not in use, so we shutdown and incrementally
> @@ -524,6 +533,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
>          }
>      }
>  
> +clear_pending:
>      /* Disable PBA emulation when nothing more is pending. */
>      clear_bit(nr, vdev->msix->pending);
>      if (find_first_bit(vdev->msix->pending,
> @@ -608,6 +618,16 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
>      if (msix_set_vector_notifiers(pdev, vfio_msix_vector_use,
>                                    vfio_msix_vector_release, NULL)) {
>          error_report("vfio: msix_set_vector_notifiers failed");
> +        return;
> +    }
> +
> +    if (!pdev->msix_function_masked && vdev->defer_add_virq) {
> +        int ret;
> +        vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
> +        ret = vfio_enable_vectors(vdev, true);
> +        if (ret) {
> +            error_report("vfio: failed to enable vectors, %d", ret);
> +        }
>      }
>  
>      trace_vfio_msix_enable(vdev->vbasedev.name);
> @@ -2456,7 +2476,9 @@ static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
>      if (msi_enabled(pdev)) {
>          vfio_msi_enable(vdev);
>      } else if (msix_enabled(pdev)) {
> +        vdev->defer_add_virq = true;
>          vfio_msix_enable(vdev);

What about passing defer_add_virq as boolean argument
to vfio_msix_enable()?

> +        vdev->defer_add_virq = false;
>      }
>  
>      return ret;
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index 6477751..4235c83 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -171,6 +171,7 @@ struct VFIOPCIDevice {
>      bool no_kvm_ioeventfd;
>      bool no_vfio_ioeventfd;
>      bool enable_ramfb;
> +    bool defer_add_virq;
>      VFIODisplay *dpy;
>      Notifier irqchip_change_notifier;
>  };
>

Longpeng(Mike) Aug. 25, 2021, 10:06 a.m. UTC | #2

On 2021/8/25 17:57, Philippe Mathieu-Daudé wrote:
> On 8/25/21 9:56 AM, Longpeng(Mike) wrote:
>> The vf's unmasked msix vectors will be enable one by one in
>> migraiton resume phase, VFIO_DEVICE_SET_IRQS will be called
> 
> Typo "migration"
> 
Ok.

>> for each vector, it's a bit expensive if the vf has more
>> vectors.
>>
>> We can call VFIO_DEVICE_SET_IRQS once outside the loop of set
>> vector notifiers to reduce the cost.
>>
>> The test VM has 128 vcpus and 8 VF (with 65 vectors enabled),
>> we mesure the cost of the vfio_msix_enable for each one, and
> 
> Typo "measure"
> 
Ok.

>> we can see 10% costs can be reduced.
>>
>>         Origin          Apply this patch
>> 1st     8               4
>> 2nd     15              11
>> 3rd     22              18
>> 4th     24              25
>> 5th     36              33
>> 6th     44              40
>> 7th     51              47
>> 8th     58              54
>> Total   258ms           232ms
>>
>> Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
>> ---
>>  hw/vfio/pci.c | 22 ++++++++++++++++++++++
>>  hw/vfio/pci.h |  1 +
>>  2 files changed, 23 insertions(+)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 7cc43fe..ca37fb7 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -372,6 +372,10 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
>>      int ret = 0, i, argsz;
>>      int32_t *fds;
>>  
>> +    if (!vdev->nr_vectors) {
>> +        return 0;
>> +    }
>> +
>>      argsz = sizeof(*irq_set) + (vdev->nr_vectors * sizeof(*fds));
>>  
>>      irq_set = g_malloc0(argsz);
>> @@ -495,6 +499,11 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
>>          }
>>      }
>>  
>> +    if (vdev->defer_add_virq) {
>> +        vdev->nr_vectors = MAX(vdev->nr_vectors, nr + 1);
>> +        goto clear_pending;
>> +    }
>> +
>>      /*
>>       * We don't want to have the host allocate all possible MSI vectors
>>       * for a device if they're not in use, so we shutdown and incrementally
>> @@ -524,6 +533,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
>>          }
>>      }
>>  
>> +clear_pending:
>>      /* Disable PBA emulation when nothing more is pending. */
>>      clear_bit(nr, vdev->msix->pending);
>>      if (find_first_bit(vdev->msix->pending,
>> @@ -608,6 +618,16 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
>>      if (msix_set_vector_notifiers(pdev, vfio_msix_vector_use,
>>                                    vfio_msix_vector_release, NULL)) {
>>          error_report("vfio: msix_set_vector_notifiers failed");
>> +        return;
>> +    }
>> +
>> +    if (!pdev->msix_function_masked && vdev->defer_add_virq) {
>> +        int ret;
>> +        vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
>> +        ret = vfio_enable_vectors(vdev, true);
>> +        if (ret) {
>> +            error_report("vfio: failed to enable vectors, %d", ret);
>> +        }
>>      }
>>  
>>      trace_vfio_msix_enable(vdev->vbasedev.name);
>> @@ -2456,7 +2476,9 @@ static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
>>      if (msi_enabled(pdev)) {
>>          vfio_msi_enable(vdev);
>>      } else if (msix_enabled(pdev)) {
>> +        vdev->defer_add_virq = true;
>>          vfio_msix_enable(vdev);
> 
> What about passing defer_add_virq as boolean argument
> to vfio_msix_enable()?
> 
defer_add_virq is used deep in the call chain, so passing it as a parameter
would require changing several more functions.
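
A rough sketch of this trade-off, using mock types rather than the real
QEMU structures. MockDev and all mock_* functions are hypothetical; they
only mirror the shape of the chain from vfio_pci_load_config() down to
vfio_msix_vector_do_use(). Since the patch registers vfio_msix_vector_use
through msix_set_vector_notifiers(), the intermediate levels are assumed to
have signatures that are awkward to extend with an extra argument.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct MockDev {
    bool defer;             /* plays the role of defer_add_virq */
    unsigned int deferred;  /* vectors whose setup was postponed */
    unsigned int immediate; /* vectors set up right away */
} MockDev;

/* Innermost level: the only place that reads the flag. */
static void mock_vector_do_use(MockDev *dev)
{
    assert(dev);
    if (dev->defer) {
        dev->deferred++;
    } else {
        dev->immediate++;
    }
}

/* Middle levels keep their existing signatures. */
static void mock_vector_use(MockDev *dev)
{
    mock_vector_do_use(dev);
}

static void mock_set_notifiers(MockDev *dev, unsigned int nr_unmasked)
{
    for (unsigned int i = 0; i < nr_unmasked; i++) {
        mock_vector_use(dev);
    }
}

/* Top level: set the flag, run the unchanged chain, clear the flag. */
static void mock_load_config(MockDev *dev, unsigned int nr_unmasked)
{
    dev->defer = true;
    mock_set_notifiers(dev, nr_unmasked);
    dev->defer = false;
}
```

Only the top and bottom of the chain change with a device field; a boolean
argument would have to be threaded through every level in between.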

>> +        vdev->defer_add_virq = false;
>>      }
>>  
>>      return ret;
>> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
>> index 6477751..4235c83 100644
>> --- a/hw/vfio/pci.h
>> +++ b/hw/vfio/pci.h
>> @@ -171,6 +171,7 @@ struct VFIOPCIDevice {
>>      bool no_kvm_ioeventfd;
>>      bool no_vfio_ioeventfd;
>>      bool enable_ramfb;
>> +    bool defer_add_virq;
>>      VFIODisplay *dpy;
>>      Notifier irqchip_change_notifier;
>>  };
>>
> 
> .
>

Alex Williamson Sept. 3, 2021, 9:56 p.m. UTC | #3

On Wed, 25 Aug 2021 15:56:18 +0800
"Longpeng(Mike)" <longpeng2@huawei.com> wrote:

> The vf's unmasked msix vectors will be enable one by one in
> migraiton resume phase, VFIO_DEVICE_SET_IRQS will be called
> for each vector, it's a bit expensive if the vf has more
> vectors.
> 
> We can call VFIO_DEVICE_SET_IRQS once outside the loop of set
> vector notifiers to reduce the cost.
> 
> The test VM has 128 vcpus and 8 VF (with 65 vectors enabled),
> we mesure the cost of the vfio_msix_enable for each one, and
> we can see 10% costs can be reduced.
> 
>         Origin          Apply this patch

Original?

> 1st     8               4
> 2nd     15              11
> 3rd     22              18
> 4th     24              25
> 5th     36              33
> 6th     44              40
> 7th     51              47
> 8th     58              54
> Total   258ms           232ms

If the values here are ms for execution of vfio_msix_enable() per VF,
why are the values increasing per VF?  Do we have 65 vectors per VF or
do we have 65 vectors total, weighted towards the higher VFs?
This doesn't make sense without the data from the last patch in the
series.

> 
> Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
> ---
>  hw/vfio/pci.c | 22 ++++++++++++++++++++++
>  hw/vfio/pci.h |  1 +
>  2 files changed, 23 insertions(+)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 7cc43fe..ca37fb7 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -372,6 +372,10 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
>      int ret = 0, i, argsz;
>      int32_t *fds;
>  
> +    if (!vdev->nr_vectors) {
> +        return 0;
> +    }

How would this occur?  Via the new call below?  But then we'd leave
vfio_msix_enable() with MSI-X DISABLED???

> +
>      argsz = sizeof(*irq_set) + (vdev->nr_vectors * sizeof(*fds));
>  
>      irq_set = g_malloc0(argsz);
> @@ -495,6 +499,11 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
>          }
>      }
>  
> +    if (vdev->defer_add_virq) {
> +        vdev->nr_vectors = MAX(vdev->nr_vectors, nr + 1);
> +        goto clear_pending;
> +    }

This is a really ugly use of 'goto' to simply jump around code you'd
like to skip rather than reformat the function with branches to
conditionalize that code.  Gotos for consolidated error paths, retries,
hard to break loops are ok, not this.
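
One possible reading of this, as a sketch with mock types rather than the
real vfio_msix_vector_do_use(): put the immediate-enable work under an else
branch so that the deferred case reaches the shared tail without a label.
MockDev, mock_vector_use() and the counters are hypothetical names used
only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct MockDev {
    bool defer;                   /* plays the role of defer_add_virq */
    unsigned int nr_vectors;      /* highest vector number seen, plus one */
    unsigned int enable_calls;    /* immediate enable operations issued */
    unsigned int pending_cleared; /* times the shared tail has run */
} MockDev;

static void mock_vector_use(MockDev *dev, unsigned int nr)
{
    assert(dev);

    if (dev->defer) {
        /* Deferred: only remember how many vectors are needed. */
        if (nr + 1 > dev->nr_vectors) {
            dev->nr_vectors = nr + 1;
        }
    } else {
        /* Immediate: grow the set if needed and enable right away. */
        if (nr + 1 > dev->nr_vectors) {
            dev->nr_vectors = nr + 1;
        }
        dev->enable_calls++;
    }

    /* Shared tail, reached from both branches without a goto. */
    dev->pending_cleared++;
}
```

Both branches fall through to the same tail, which is where the pending
bit handling would sit in the real function.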


> +
>      /*
>       * We don't want to have the host allocate all possible MSI vectors
>       * for a device if they're not in use, so we shutdown and incrementally
> @@ -524,6 +533,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
>          }
>      }
>  
> +clear_pending:
>      /* Disable PBA emulation when nothing more is pending. */
>      clear_bit(nr, vdev->msix->pending);
>      if (find_first_bit(vdev->msix->pending,
> @@ -608,6 +618,16 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
>      if (msix_set_vector_notifiers(pdev, vfio_msix_vector_use,
>                                    vfio_msix_vector_release, NULL)) {
>          error_report("vfio: msix_set_vector_notifiers failed");
> +        return;
> +    }
> +
> +    if (!pdev->msix_function_masked && vdev->defer_add_virq) {
> +        int ret;
> +        vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
> +        ret = vfio_enable_vectors(vdev, true);
> +        if (ret) {
> +            error_report("vfio: failed to enable vectors, %d", ret);
> +        }
>      }
>  
>      trace_vfio_msix_enable(vdev->vbasedev.name);
> @@ -2456,7 +2476,9 @@ static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
>      if (msi_enabled(pdev)) {
>          vfio_msi_enable(vdev);
>      } else if (msix_enabled(pdev)) {
> +        vdev->defer_add_virq = true;
>          vfio_msix_enable(vdev);
> +        vdev->defer_add_virq = false;


Ick.  Why is this a special case for vfio_msix_enable()?  Wouldn't we
prefer to always batch vector-use work while we're in the process of
enabling MSI-X?  

>      }
>  
>      return ret;
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index 6477751..4235c83 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -171,6 +171,7 @@ struct VFIOPCIDevice {
>      bool no_kvm_ioeventfd;
>      bool no_vfio_ioeventfd;
>      bool enable_ramfb;
> +    bool defer_add_virq;
>      VFIODisplay *dpy;
>      Notifier irqchip_change_notifier;
>  };

Longpeng(Mike) Sept. 7, 2021, 2:12 a.m. UTC | #4

On 2021/9/4 5:56, Alex Williamson wrote:
> On Wed, 25 Aug 2021 15:56:18 +0800
> "Longpeng(Mike)" <longpeng2@huawei.com> wrote:
> 
>> The vf's unmasked msix vectors will be enable one by one in
>> migraiton resume phase, VFIO_DEVICE_SET_IRQS will be called
>> for each vector, it's a bit expensive if the vf has more
>> vectors.
>>
>> We can call VFIO_DEVICE_SET_IRQS once outside the loop of set
>> vector notifiers to reduce the cost.
>>
>> The test VM has 128 vcpus and 8 VF (with 65 vectors enabled),
>> we mesure the cost of the vfio_msix_enable for each one, and
>> we can see 10% costs can be reduced.
>>
>>         Origin          Apply this patch
> 
> Original?
> 
>> 1st     8               4
>> 2nd     15              11
>> 3rd     22              18
>> 4th     24              25
>> 5th     36              33
>> 6th     44              40
>> 7th     51              47
>> 8th     58              54
>> Total   258ms           232ms
> 
> If the values here are ms for execution of vfio_msix_enable() per VF,

Yes.

> why are the values increasing per VF?  Do we have 65 vectors per VF or
> do we have 65 vectors total, weighted towards the higher VFs?

We have 65 vectors per VF.

KVM_SET_GSI_ROUTING scans and updates all of the assigned irqfds
unconditionally, so it takes longer the more irqfds there are.

We have 65 irqfds when processing the 1st VF, 130 when processing the 2nd VF,
195 when processing the 3rd VF, and so on, which is why the values increase
from one VF to the next.
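
The arithmetic can be written out as a small sketch. VECTORS_PER_VF comes
from the test setup; irqfds_at_vf() and total_irqfd_scans() are
hypothetical helpers, and the total assumes a simplified model in which the
assigned irqfds are scanned once per VF, which understates the real number
of routing updates.

```c
#include <assert.h>

#define VECTORS_PER_VF 65   /* from the test setup: 65 vectors per VF */

/* irqfds assigned once the vf_index-th VF (1-based) has been processed. */
static unsigned int irqfds_at_vf(unsigned int vf_index)
{
    assert(vf_index >= 1);
    return VECTORS_PER_VF * vf_index;
}

/*
 * Simplified model (an assumption): one scan of all assigned irqfds per
 * VF. Returns the total number of irqfd visits across nr_vfs VFs.
 */
static unsigned int total_irqfd_scans(unsigned int nr_vfs)
{
    unsigned int total = 0;

    for (unsigned int vf = 1; vf <= nr_vfs; vf++) {
        total += irqfds_at_vf(vf);
    }
    return total;
}
```

The 8th VF is processed with 520 irqfds assigned, eight times as many as
the 1st, so each routing update for a later VF has more entries to walk.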

> This doesn't make sense without the data from the last patch in the
> series.
> 
>>
>> Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
>> ---
>>  hw/vfio/pci.c | 22 ++++++++++++++++++++++
>>  hw/vfio/pci.h |  1 +
>>  2 files changed, 23 insertions(+)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 7cc43fe..ca37fb7 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -372,6 +372,10 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
>>      int ret = 0, i, argsz;
>>      int32_t *fds;
>>  
>> +    if (!vdev->nr_vectors) {
>> +        return 0;
>> +    }
> 
> How would this occur?  Via the new call below?  But then we'd leave
> vfio_msix_enable() with MSI-X DISABLED???
> 
>> +
>>      argsz = sizeof(*irq_set) + (vdev->nr_vectors * sizeof(*fds));
>>  
>>      irq_set = g_malloc0(argsz);
>> @@ -495,6 +499,11 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
>>          }
>>      }
>>  
>> +    if (vdev->defer_add_virq) {
>> +        vdev->nr_vectors = MAX(vdev->nr_vectors, nr + 1);
>> +        goto clear_pending;
>> +    }
> 
> This is a really ugly use of 'goto' to simply jump around code you'd
> like to skip rather than reformat the function with branches to
> conditionalize that code.  Gotos for consolidated error paths, retries,
> hard to break loops are ok, not this.
> 

Got it, thanks.

> 
>> +
>>      /*
>>       * We don't want to have the host allocate all possible MSI vectors
>>       * for a device if they're not in use, so we shutdown and incrementally
>> @@ -524,6 +533,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
>>          }
>>      }
>>  
>> +clear_pending:
>>      /* Disable PBA emulation when nothing more is pending. */
>>      clear_bit(nr, vdev->msix->pending);
>>      if (find_first_bit(vdev->msix->pending,
>> @@ -608,6 +618,16 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
>>      if (msix_set_vector_notifiers(pdev, vfio_msix_vector_use,
>>                                    vfio_msix_vector_release, NULL)) {
>>          error_report("vfio: msix_set_vector_notifiers failed");
>> +        return;
>> +    }
>> +
>> +    if (!pdev->msix_function_masked && vdev->defer_add_virq) {
>> +        int ret;
>> +        vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
>> +        ret = vfio_enable_vectors(vdev, true);
>> +        if (ret) {
>> +            error_report("vfio: failed to enable vectors, %d", ret);
>> +        }
>>      }
>>  
>>      trace_vfio_msix_enable(vdev->vbasedev.name);
>> @@ -2456,7 +2476,9 @@ static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
>>      if (msi_enabled(pdev)) {
>>          vfio_msi_enable(vdev);
>>      } else if (msix_enabled(pdev)) {
>> +        vdev->defer_add_virq = true;
>>          vfio_msix_enable(vdev);
>> +        vdev->defer_add_virq = false;
> 
> 
> Ick.  Why is this a special case for vfio_msix_enable()?  Wouldn't we
> prefer to always batch vector-use work while we're in the process of
> enabling MSI-X?  
> 

OK, will do in the next version.

In addition, I'll rename the field to 'defer_kvm_irq_routing', as you suggested
in an earlier thread.

    '''
    > -            vfio_add_kvm_msi_virq(vdev, vector, nr, true);
    > +            if (unlikely(vdev->defer_set_virq)) {

    Likewise this could be "vdev->defer_kvm_irq_routing" and we could apply
    it to all IRQ types.

    > +                vector->need_switch = true;
    > +            } else {
    '''

>>      }
>>  
>>      return ret;
>> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
>> index 6477751..4235c83 100644
>> --- a/hw/vfio/pci.h
>> +++ b/hw/vfio/pci.h
>> @@ -171,6 +171,7 @@ struct VFIOPCIDevice {
>>      bool no_kvm_ioeventfd;
>>      bool no_vfio_ioeventfd;
>>      bool enable_ramfb;
>> +    bool defer_add_virq;
>>      VFIODisplay *dpy;
>>      Notifier irqchip_change_notifier;
>>  };
> 
> .
>

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 7cc43fe..ca37fb7 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -372,6 +372,10 @@  static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
     int ret = 0, i, argsz;
     int32_t *fds;
 
+    if (!vdev->nr_vectors) {
+        return 0;
+    }
+
     argsz = sizeof(*irq_set) + (vdev->nr_vectors * sizeof(*fds));
 
     irq_set = g_malloc0(argsz);
@@ -495,6 +499,11 @@  static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
         }
     }
 
+    if (vdev->defer_add_virq) {
+        vdev->nr_vectors = MAX(vdev->nr_vectors, nr + 1);
+        goto clear_pending;
+    }
+
     /*
      * We don't want to have the host allocate all possible MSI vectors
      * for a device if they're not in use, so we shutdown and incrementally
@@ -524,6 +533,7 @@  static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
         }
     }
 
+clear_pending:
     /* Disable PBA emulation when nothing more is pending. */
     clear_bit(nr, vdev->msix->pending);
     if (find_first_bit(vdev->msix->pending,
@@ -608,6 +618,16 @@  static void vfio_msix_enable(VFIOPCIDevice *vdev)
     if (msix_set_vector_notifiers(pdev, vfio_msix_vector_use,
                                   vfio_msix_vector_release, NULL)) {
         error_report("vfio: msix_set_vector_notifiers failed");
+        return;
+    }
+
+    if (!pdev->msix_function_masked && vdev->defer_add_virq) {
+        int ret;
+        vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
+        ret = vfio_enable_vectors(vdev, true);
+        if (ret) {
+            error_report("vfio: failed to enable vectors, %d", ret);
+        }
     }
 
     trace_vfio_msix_enable(vdev->vbasedev.name);
@@ -2456,7 +2476,9 @@  static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
     if (msi_enabled(pdev)) {
         vfio_msi_enable(vdev);
     } else if (msix_enabled(pdev)) {
+        vdev->defer_add_virq = true;
         vfio_msix_enable(vdev);
+        vdev->defer_add_virq = false;
     }
 
     return ret;
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 6477751..4235c83 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -171,6 +171,7 @@  struct VFIOPCIDevice {
     bool no_kvm_ioeventfd;
     bool no_vfio_ioeventfd;
     bool enable_ramfb;
+    bool defer_add_virq;
     VFIODisplay *dpy;
     Notifier irqchip_change_notifier;
 };
