diff mbox series

[v2,1/2] vfio/pci: Fix vfio-pci sub-page MMIO BAR mmaping in live migration

Message ID 20210914015326.1494-2-jiangkunkun@huawei.com (mailing list archive)
State New, archived
Headers show
Series vfio: Some fixes about vfio-pci MMIO RAM mapping | expand

Commit Message

Kunkun Jiang Sept. 14, 2021, 1:53 a.m. UTC
We expand MemoryRegions of vfio-pci sub-page MMIO BARs to
vfio_pci_write_config to improve IO performance.
The MemoryRegions of destination VM will not be expanded
successful in live migration, because their addresses have
been updated in vmstate_load_state (vfio_pci_load_config).

So iterate BARs in vfio_pci_write_config and try to update
sub-page BARs.

Fixes: c5e2fb3ce4d (vfio: Add save and load functions for VFIO PCI devices)
Reported-by: Nianyao Tang <tangnianyao@huawei.com>
Reported-by: Qixin Gan <ganqixin@huawei.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---
 hw/vfio/pci.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

Comments

Eric Auger Oct. 21, 2021, 4:15 p.m. UTC | #1
Hi Kunkun,

On 9/14/21 3:53 AM, Kunkun Jiang wrote:
> We expand MemoryRegions of vfio-pci sub-page MMIO BARs to
> vfio_pci_write_config to improve IO performance.
s/to vfio_pci_write_config/ in vfio_pci_write_config()
> The MemoryRegions of destination VM will not be expanded
> successful in live migration, because their addresses have
s/will not be expanded successful in live migration/are not expanded
anymore after live migration (?) Is that the correct symptom?
> been updated in vmstate_load_state (vfio_pci_load_config).
>
> So iterate BARs in vfio_pci_write_config and try to update
> sub-page BARs.
>
> Fixes: c5e2fb3ce4d (vfio: Add save and load functions for VFIO PCI devices)
is it an actual fix or an optimization?
> Reported-by: Nianyao Tang <tangnianyao@huawei.com>
> Reported-by: Qixin Gan <ganqixin@huawei.com>
> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
> ---
>  hw/vfio/pci.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index e1ea1d8a23..43c7e93153 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2453,7 +2453,12 @@ static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
>  {
>      VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>      PCIDevice *pdev = &vdev->pdev;
> -    int ret;
> +    pcibus_t old_addr[PCI_NUM_REGIONS - 1];
> +    int bar, ret;
> +
> +    for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
> +        old_addr[bar] = pdev->io_regions[bar].addr;
> +    }
what are those values before the vmstate_load_state ie. can't you do the
vfio_sub_page_bar_update_mapping() unconditionnaly on old_addr[bar] !=
pdev->io_regions[bar].addr
>  
>      ret = vmstate_load_state(f, &vmstate_vfio_pci_config, vdev, 1);
>      if (ret) {
> @@ -2463,6 +2468,14 @@ static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
>      vfio_pci_write_config(pdev, PCI_COMMAND,
>                            pci_get_word(pdev->config + PCI_COMMAND), 2);
>  
> +    for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
> +        if (old_addr[bar] != pdev->io_regions[bar].addr &&
> +            vdev->bars[bar].region.size > 0 &&
> +            vdev->bars[bar].region.size < qemu_real_host_page_size) {
> +            vfio_sub_page_bar_update_mapping(pdev, bar);
> +        }
> +    }
> +
>      if (msi_enabled(pdev)) {
>          vfio_msi_enable(vdev);
>      } else if (msix_enabled(pdev)) {
Thanks

Eric
Kunkun Jiang Oct. 22, 2021, 10:01 a.m. UTC | #2
Hi Eric,

On 2021/10/22 0:15, Eric Auger wrote:
> Hi Kunkun,
>
> On 9/14/21 3:53 AM, Kunkun Jiang wrote:
>> We expand MemoryRegions of vfio-pci sub-page MMIO BARs to
>> vfio_pci_write_config to improve IO performance.
> s/to vfio_pci_write_config/ in vfio_pci_write_config()
Thank you for your review. I will correct it in v3.
>> The MemoryRegions of destination VM will not be expanded
>> successful in live migration, because their addresses have
> s/will not be expanded successful in live migration/are not expanded
> anymore after live migration (?) Is that the correct symptom?
You are right. They are not expanded anymore after live migration,
not expanded unsuccessfully. I used the wrong words.
>> been updated in vmstate_load_state (vfio_pci_load_config).
>>
>> So iterate BARs in vfio_pci_write_config and try to update
>> sub-page BARs.
>>
>> Fixes: c5e2fb3ce4d (vfio: Add save and load functions for VFIO PCI devices)
> is it an actual fix or an optimization?
I recently realized that this is actually an optimization.

The VF driver in VM use the assembly language instructions,
which can operate two registers simultaneously, like stp, ldp.
These instructions would result in non-ISV data abort, which
cannot be handled well at the moment.

If the driver doesn't use such instructions,  not expanding
only affects performance.

I will add these to the commit message in the next version.
>> Reported-by: Nianyao Tang <tangnianyao@huawei.com>
>> Reported-by: Qixin Gan <ganqixin@huawei.com>
>> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
>> ---
>>   hw/vfio/pci.c | 15 ++++++++++++++-
>>   1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index e1ea1d8a23..43c7e93153 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -2453,7 +2453,12 @@ static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
>>   {
>>       VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>>       PCIDevice *pdev = &vdev->pdev;
>> -    int ret;
>> +    pcibus_t old_addr[PCI_NUM_REGIONS - 1];
>> +    int bar, ret;
>> +
>> +    for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
>> +        old_addr[bar] = pdev->io_regions[bar].addr;
>> +    }
> what are those values before the vmstate_load_state ie. can't you do the
Are you referring to pdev->io_regions[bar].addr ? All of the bars addr is
PCI_BAR_UNMAPPED (~(pcibus_t)0) before the vmstate_load_state.
> vfio_sub_page_bar_update_mapping() unconditionnaly on old_addr[bar] !=
> pdev->io_regions[bar].addr
The size of Bar is a power of 2. The Bar, whose size is greater than host
page size, doesn't need to be expanded.

Can you explain more? May be I misunderstood you.

Thanks,
Kunkun Jiang
>>   
>>       ret = vmstate_load_state(f, &vmstate_vfio_pci_config, vdev, 1);
>>       if (ret) {
>> @@ -2463,6 +2468,14 @@ static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
>>       vfio_pci_write_config(pdev, PCI_COMMAND,
>>                             pci_get_word(pdev->config + PCI_COMMAND), 2);
>>   
>> +    for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
>> +        if (old_addr[bar] != pdev->io_regions[bar].addr &&
>> +            vdev->bars[bar].region.size > 0 &&
>> +            vdev->bars[bar].region.size < qemu_real_host_page_size) {
>> +            vfio_sub_page_bar_update_mapping(pdev, bar);
>> +        }
>> +    }
>> +
>>       if (msi_enabled(pdev)) {
>>           vfio_msi_enable(vdev);
>>       } else if (msix_enabled(pdev)) {
> Thanks
>
> Eric
>
> .
Eric Auger Oct. 23, 2021, 2:26 p.m. UTC | #3
Hi Kunkun,

On 10/22/21 12:01 PM, Kunkun Jiang wrote:
> Hi Eric,
>
> On 2021/10/22 0:15, Eric Auger wrote:
>> Hi Kunkun,
>>
>> On 9/14/21 3:53 AM, Kunkun Jiang wrote:
>>> We expand MemoryRegions of vfio-pci sub-page MMIO BARs to
>>> vfio_pci_write_config to improve IO performance.
>> s/to vfio_pci_write_config/ in vfio_pci_write_config()
> Thank you for your review. I will correct it in v3.
>>> The MemoryRegions of destination VM will not be expanded
>>> successful in live migration, because their addresses have
>> s/will not be expanded successful in live migration/are not expanded
>> anymore after live migration (?) Is that the correct symptom?
> You are right. They are not expanded anymore after live migration,
> not expanded unsuccessfully. I used the wrong words.
>>> been updated in vmstate_load_state (vfio_pci_load_config).
>>>
>>> So iterate BARs in vfio_pci_write_config and try to update
>>> sub-page BARs.
>>>
>>> Fixes: c5e2fb3ce4d (vfio: Add save and load functions for VFIO PCI
>>> devices)
>> is it an actual fix or an optimization?
> I recently realized that this is actually an optimization.
>
> The VF driver in VM use the assembly language instructions,
> which can operate two registers simultaneously, like stp, ldp.
> These instructions would result in non-ISV data abort, which
> cannot be handled well at the moment.
>
> If the driver doesn't use such instructions,  not expanding
> only affects performance.
>
> I will add these to the commit message in the next version.
>>> Reported-by: Nianyao Tang <tangnianyao@huawei.com>
>>> Reported-by: Qixin Gan <ganqixin@huawei.com>
>>> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
>>> ---
>>>   hw/vfio/pci.c | 15 ++++++++++++++-
>>>   1 file changed, 14 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>> index e1ea1d8a23..43c7e93153 100644
>>> --- a/hw/vfio/pci.c
>>> +++ b/hw/vfio/pci.c
>>> @@ -2453,7 +2453,12 @@ static int vfio_pci_load_config(VFIODevice
>>> *vbasedev, QEMUFile *f)
>>>   {
>>>       VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice,
>>> vbasedev);
>>>       PCIDevice *pdev = &vdev->pdev;
>>> -    int ret;
>>> +    pcibus_t old_addr[PCI_NUM_REGIONS - 1];
>>> +    int bar, ret;
>>> +
>>> +    for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
>>> +        old_addr[bar] = pdev->io_regions[bar].addr;
>>> +    }
>> what are those values before the vmstate_load_state ie. can't you do the
> Are you referring to pdev->io_regions[bar].addr ? All of the bars addr is
> PCI_BAR_UNMAPPED (~(pcibus_t)0) before the vmstate_load_state.
OK that was my assumption. What I meant is in that case you always have

old_addr[bar] != pdev->io_regions[bar].addr, right? In the positive this check is not needed and you don't need old_addr at all.
In the original code this was needed since one wanted to call 
vfio_sub_page_bar_update_mapping() only for the bar base address that were changed during the 
vfio_pci_write_config.

Thanks

Eric

>> vfio_sub_page_bar_update_mapping() unconditionnaly on old_addr[bar] !=
>> pdev->io_regions[bar].addr
> The size of Bar is a power of 2. The Bar, whose size is greater than host
> page size, doesn't need to be expanded.
>
> Can you explain more? May be I misunderstood you.
>
> Thanks,
> Kunkun Jiang
>>>         ret = vmstate_load_state(f, &vmstate_vfio_pci_config, vdev, 1);
>>>       if (ret) {
>>> @@ -2463,6 +2468,14 @@ static int vfio_pci_load_config(VFIODevice
>>> *vbasedev, QEMUFile *f)
>>>       vfio_pci_write_config(pdev, PCI_COMMAND,
>>>                             pci_get_word(pdev->config +
>>> PCI_COMMAND), 2);
>>>   +    for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
>>> +        if (old_addr[bar] != pdev->io_regions[bar].addr &&
>>> +            vdev->bars[bar].region.size > 0 &&
>>> +            vdev->bars[bar].region.size < qemu_real_host_page_size) {
>>> +            vfio_sub_page_bar_update_mapping(pdev, bar);
>>> +        }
>>> +    }
>>> +
>>>       if (msi_enabled(pdev)) {
>>>           vfio_msi_enable(vdev);
>>>       } else if (msix_enabled(pdev)) {
>> Thanks
>>
>> Eric
>>
>> .
>
Kunkun Jiang Oct. 24, 2021, 2:09 a.m. UTC | #4
Hi Eric,

On 2021/10/23 22:26, Eric Auger wrote:
> Hi Kunkun,
>
> On 10/22/21 12:01 PM, Kunkun Jiang wrote:
>> Hi Eric,
>>
>> On 2021/10/22 0:15, Eric Auger wrote:
>>> Hi Kunkun,
>>>
>>> On 9/14/21 3:53 AM, Kunkun Jiang wrote:
>>>> We expand MemoryRegions of vfio-pci sub-page MMIO BARs to
>>>> vfio_pci_write_config to improve IO performance.
>>> s/to vfio_pci_write_config/ in vfio_pci_write_config()
>> Thank you for your review. I will correct it in v3.
>>>> The MemoryRegions of destination VM will not be expanded
>>>> successful in live migration, because their addresses have
>>> s/will not be expanded successful in live migration/are not expanded
>>> anymore after live migration (?) Is that the correct symptom?
>> You are right. They are not expanded anymore after live migration,
>> not expanded unsuccessfully. I used the wrong words.
>>>> been updated in vmstate_load_state (vfio_pci_load_config).
>>>>
>>>> So iterate BARs in vfio_pci_write_config and try to update
>>>> sub-page BARs.
>>>>
>>>> Fixes: c5e2fb3ce4d (vfio: Add save and load functions for VFIO PCI
>>>> devices)
>>> is it an actual fix or an optimization?
>> I recently realized that this is actually an optimization.
>>
>> The VF driver in VM use the assembly language instructions,
>> which can operate two registers simultaneously, like stp, ldp.
>> These instructions would result in non-ISV data abort, which
>> cannot be handled well at the moment.
>>
>> If the driver doesn't use such instructions,  not expanding
>> only affects performance.
>>
>> I will add these to the commit message in the next version.
>>>> Reported-by: Nianyao Tang <tangnianyao@huawei.com>
>>>> Reported-by: Qixin Gan <ganqixin@huawei.com>
>>>> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
>>>> ---
>>>>    hw/vfio/pci.c | 15 ++++++++++++++-
>>>>    1 file changed, 14 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>>> index e1ea1d8a23..43c7e93153 100644
>>>> --- a/hw/vfio/pci.c
>>>> +++ b/hw/vfio/pci.c
>>>> @@ -2453,7 +2453,12 @@ static int vfio_pci_load_config(VFIODevice
>>>> *vbasedev, QEMUFile *f)
>>>>    {
>>>>        VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice,
>>>> vbasedev);
>>>>        PCIDevice *pdev = &vdev->pdev;
>>>> -    int ret;
>>>> +    pcibus_t old_addr[PCI_NUM_REGIONS - 1];
>>>> +    int bar, ret;
>>>> +
>>>> +    for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
>>>> +        old_addr[bar] = pdev->io_regions[bar].addr;
>>>> +    }
>>> what are those values before the vmstate_load_state ie. can't you do the
>> Are you referring to pdev->io_regions[bar].addr ? All of the bars addr is
>> PCI_BAR_UNMAPPED (~(pcibus_t)0) before the vmstate_load_state.
> OK that was my assumption. What I meant is in that case you always have
>
> old_addr[bar] != pdev->io_regions[bar].addr, right? In the positive this check is not needed and you don't need old_addr at all.
> In the original code this was needed since one wanted to call
> vfio_sub_page_bar_update_mapping() only for the bar base address that were changed during the
> vfio_pci_write_config.
As far as I know, there is at least one case. If the VF driver is not loaded
(insmod xxx.ko) in the VM, we will have old_addr[bar] == 
pdev->io_regions[bar].addr.
The vfio_sub_page_bar_update_mapping() will be called when 0 < bar size 
< host page size.
But vfio_sub_page_bar_update_mapping() will not change anything in this 
case.

Thanks,
Kunkun Jiang
>
> Thanks
>
> Eric
>
>>> vfio_sub_page_bar_update_mapping() unconditionnaly on old_addr[bar] !=
>>> pdev->io_regions[bar].addr
>> The size of Bar is a power of 2. The Bar, whose size is greater than host
>> page size, doesn't need to be expanded.
>>
>> Can you explain more? May be I misunderstood you.
>>
>> Thanks,
>> Kunkun Jiang
>>>>          ret = vmstate_load_state(f, &vmstate_vfio_pci_config, vdev, 1);
>>>>        if (ret) {
>>>> @@ -2463,6 +2468,14 @@ static int vfio_pci_load_config(VFIODevice
>>>> *vbasedev, QEMUFile *f)
>>>>        vfio_pci_write_config(pdev, PCI_COMMAND,
>>>>                              pci_get_word(pdev->config +
>>>> PCI_COMMAND), 2);
>>>>    +    for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
>>>> +        if (old_addr[bar] != pdev->io_regions[bar].addr &&
>>>> +            vdev->bars[bar].region.size > 0 &&
>>>> +            vdev->bars[bar].region.size < qemu_real_host_page_size) {
>>>> +            vfio_sub_page_bar_update_mapping(pdev, bar);
>>>> +        }
>>>> +    }
>>>> +
>>>>        if (msi_enabled(pdev)) {
>>>>            vfio_msi_enable(vdev);
>>>>        } else if (msix_enabled(pdev)) {
>>> Thanks
>>>
>>> Eric
>>>
>>> .
> .
diff mbox series

Patch

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index e1ea1d8a23..43c7e93153 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2453,7 +2453,12 @@  static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
 {
     VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
     PCIDevice *pdev = &vdev->pdev;
-    int ret;
+    pcibus_t old_addr[PCI_NUM_REGIONS - 1];
+    int bar, ret;
+
+    for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
+        old_addr[bar] = pdev->io_regions[bar].addr;
+    }
 
     ret = vmstate_load_state(f, &vmstate_vfio_pci_config, vdev, 1);
     if (ret) {
@@ -2463,6 +2468,14 @@  static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
     vfio_pci_write_config(pdev, PCI_COMMAND,
                           pci_get_word(pdev->config + PCI_COMMAND), 2);
 
+    for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
+        if (old_addr[bar] != pdev->io_regions[bar].addr &&
+            vdev->bars[bar].region.size > 0 &&
+            vdev->bars[bar].region.size < qemu_real_host_page_size) {
+            vfio_sub_page_bar_update_mapping(pdev, bar);
+        }
+    }
+
     if (msi_enabled(pdev)) {
         vfio_msi_enable(vdev);
     } else if (msix_enabled(pdev)) {