[v4,06/11] vpci/header: handle p2m range sets per BAR

Message ID 20211105065629.940943-7-andr2000@gmail.com (mailing list archive)
State Superseded
Series PCI devices passthrough on Arm, part 3

Commit Message

Oleksandr Andrushchenko Nov. 5, 2021, 6:56 a.m. UTC
From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Instead of handling a single range set that contains all the memory
regions of all the BARs and ROM, have one range set per BAR.

This is in preparation of making non-identity mappings in p2m for the
MMIOs/ROM.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

---
Since v3:
- re-work vpci_cancel_pending according to the per-BAR handling
- s/num_mem_ranges/map_pending and s/uint8_t/bool
- ASSERT(bar->mem) in modify_bars
- create and destroy the rangesets on add/remove
---
 xen/drivers/vpci/header.c | 178 ++++++++++++++++++++++++++------------
 xen/drivers/vpci/vpci.c   |  26 +++++-
 xen/include/xen/vpci.h    |   3 +-
 3 files changed, 150 insertions(+), 57 deletions(-)

Comments

Jan Beulich Nov. 19, 2021, 12:05 p.m. UTC | #1
On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Instead of handling a single range set that contains all the memory
> regions of all the BARs and ROM, have one range set per BAR.

Iirc Roger did indicate agreement with the splitting. May I nevertheless
ask that for posterity you say a word here about the overhead, to make
clear this was a conscious decision?

Jan
Oleksandr Andrushchenko Nov. 19, 2021, 12:13 p.m. UTC | #2
On 19.11.21 14:05, Jan Beulich wrote:
> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Instead of handling a single range set that contains all the memory
>> regions of all the BARs and ROM, have one range set per BAR.
> Iirc Roger did indicate agreement with the splitting. May I nevertheless
> ask that for posterity you say a word here about the overhead, to make
> clear this was a conscious decision?
Sure, but could you please help me word that sentence so it pleases your
eye? I mean that it was you who saw the overhead while I did not: were we
to implement similar functionality ourselves instead of using range sets,
I still think we'd end up duplicating range sets at the end of the day.
> Jan
>
Thank you in advance,
Oleksandr
Jan Beulich Nov. 19, 2021, 12:45 p.m. UTC | #3
On 19.11.2021 13:13, Oleksandr Andrushchenko wrote:
> On 19.11.21 14:05, Jan Beulich wrote:
>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>
>>> Instead of handling a single range set that contains all the memory
>>> regions of all the BARs and ROM, have one range set per BAR.
>> Iirc Roger did indicate agreement with the splitting. May I nevertheless
>> ask that for posterity you say a word here about the overhead, to make
>> clear this was a conscious decision?
> Sure, but could you please help me word that sentence so it pleases your
> eye? I mean that it was you who saw the overhead while I did not: were we
> to implement similar functionality ourselves instead of using range sets,
> I still think we'd end up duplicating range sets at the end of the day.

"Note that rangesets were chosen here despite there being only up to
<N> separate ranges in each set (typically just 1)." Albeit that's
then still lacking a justification for the choice. Ease of
implementation?

As to overhead - did you compare sizeof(struct rangeset) + N *
sizeof(struct range) with just N * sizeof(unsigned long [2])?

Jan
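[For a rough idea of the footprint comparison being asked about here: a
standalone sketch with assumed structure sizes. The byte counts are
assumptions modelled on the shape of Xen's common/rangeset.c on a 64-bit
build, not measurements.]

    #include <stdio.h>

    /*
     * struct rangeset carries list heads, an owning-domain pointer, a
     * lock, flags and a 32-byte name; struct range is a list head plus
     * the inclusive [s, e] pair.  The sizes below stand in for the
     * real sizeof() values.
     */
    #define RANGESET_BYTES 96u /* assumed sizeof(struct rangeset) */
    #define RANGE_BYTES    32u /* assumed sizeof(struct range) */

    int main(void)
    {
        unsigned int n;

        for ( n = 1; n <= 3; n++ )
            printf("%u range(s): rangeset %u bytes vs bare pairs %zu\n",
                   n, RANGESET_BYTES + n * RANGE_BYTES,
                   n * sizeof(unsigned long [2]));
        return 0;
    }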
Oleksandr Andrushchenko Nov. 19, 2021, 12:50 p.m. UTC | #4
On 19.11.21 14:45, Jan Beulich wrote:
> On 19.11.2021 13:13, Oleksandr Andrushchenko wrote:
>> On 19.11.21 14:05, Jan Beulich wrote:
>>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>
>>>> Instead of handling a single range set that contains all the memory
>>>> regions of all the BARs and ROM, have one range set per BAR.
>>> Iirc Roger did indicate agreement with the splitting. May I nevertheless
>>> ask that for posterity you say a word here about the overhead, to make
>>> clear this was a conscious decision?
>> Sure, but could you please help me word that sentence so it pleases your
>> eye? I mean that it was you who saw the overhead while I did not: were we
>> to implement similar functionality ourselves instead of using range sets,
>> I still think we'd end up duplicating range sets at the end of the day.
> "Note that rangesets were chosen here despite there being only up to
> <N> separate ranges in each set (typically just 1)." Albeit that's
> then still lacking a justification for the choice. Ease of
> implementation?
I guess yes. I'll put:

"Note that rangesets were chosen here despite there being only up to
<N> separate ranges in each set (typically just 1). But rangeset per BAR
was chosen for the ease of implementation and existing code re-usability."

>
> As to overhead - did you compare sizeof(struct rangeset) + N *
> sizeof(struct range) with just N * sizeof(unsigned long [2])?
I was not thinking about data memory sizes in the first place, but about
the new code introduced to handle that, which would also need to be
maintained.
>
> Jan
>
Thank you,
Oleksandr
Jan Beulich Nov. 19, 2021, 1:06 p.m. UTC | #5
On 19.11.2021 13:50, Oleksandr Andrushchenko wrote:
> On 19.11.21 14:45, Jan Beulich wrote:
>> On 19.11.2021 13:13, Oleksandr Andrushchenko wrote:
>>> On 19.11.21 14:05, Jan Beulich wrote:
>>>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>
>>>>> Instead of handling a single range set that contains all the memory
>>>>> regions of all the BARs and ROM, have one range set per BAR.
>>>> Iirc Roger did indicate agreement with the splitting. May I nevertheless
>>>> ask that for posterity you say a word here about the overhead, to make
>>>> clear this was a conscious decision?
>>> Sure, but could you please help me word that sentence so it pleases your
>>> eye? I mean that it was you who saw the overhead while I did not: were we
>>> to implement similar functionality ourselves instead of using range sets,
>>> I still think we'd end up duplicating range sets at the end of the day.
>> "Note that rangesets were chosen here despite there being only up to
>> <N> separate ranges in each set (typically just 1)." Albeit that's
>> then still lacking a justification for the choice. Ease of
>> implementation?
> I guess yes. I'll put:
> 
> "Note that rangesets were chosen here despite there being only up to
> <N> separate ranges in each set (typically just 1). But rangeset per BAR
> was chosen for the ease of implementation and existing code re-usability."

FTAOD please don't forget to replace the <N> - I wasn't sure if it would
be 2 or 3. Also (nit) I don't think starting the 2nd sentence with "But
..." fits with the 1st sentence.

Jan
Jan Beulich Nov. 19, 2021, 1:16 p.m. UTC | #6
On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
> @@ -95,10 +102,25 @@ int vpci_add_handlers(struct pci_dev *pdev)
>      INIT_LIST_HEAD(&pdev->vpci->handlers);
>      spin_lock_init(&pdev->vpci->lock);
>  
> +    header = &pdev->vpci->header;
> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> +    {
> +        struct vpci_bar *bar = &header->bars[i];
> +
> +        bar->mem = rangeset_new(NULL, NULL, 0);

I don't recall why an anonymous range set was chosen back at the time
when vPCI was first implemented, but I think this needs to be changed
now that DomU-s get supported. Whether you do so right here or in a
prereq patch is secondary to me. It may be desirable to exclude them
from rangeset_domain_printk() (which would likely require a new
RANGESETF_* flag), but I think such resources should be associated
with their domains.

Jan
Oleksandr Andrushchenko Nov. 19, 2021, 1:19 p.m. UTC | #7
On 19.11.21 15:06, Jan Beulich wrote:
> On 19.11.2021 13:50, Oleksandr Andrushchenko wrote:
>> On 19.11.21 14:45, Jan Beulich wrote:
>>> On 19.11.2021 13:13, Oleksandr Andrushchenko wrote:
>>>> On 19.11.21 14:05, Jan Beulich wrote:
>>>>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>>
>>>>>> Instead of handling a single range set that contains all the memory
>>>>>> regions of all the BARs and ROM, have one range set per BAR.
>>>>> Iirc Roger did indicate agreement with the splitting. May I nevertheless
>>>>> ask that for posterity you say a word here about the overhead, to make
>>>>> clear this was a conscious decision?
>>>> Sure, but could you please help me word that sentence so it pleases your
>>>> eye? I mean that it was you who saw the overhead while I did not: were we
>>>> to implement similar functionality ourselves instead of using range sets,
>>>> I still think we'd end up duplicating range sets at the end of the day.
>>> "Note that rangesets were chosen here despite there being only up to
>>> <N> separate ranges in each set (typically just 1)." Albeit that's
>>> then still lacking a justification for the choice. Ease of
>>> implementation?
>> I guess yes. I'll put:
>>
>> "Note that rangesets were chosen here despite there being only up to
>> <N> separate ranges in each set (typically just 1). But rangeset per BAR
>> was chosen for the ease of implementation and existing code re-usability."
> FTAOD please don't forget to replace the <N> - I wasn't sure if it would
> be 2 or 3.
It seems we can't put the exact number as it depends on how many MSI/MSI-X
holes there are, and that depends on arbitrary device properties.
>   Also (nit) I don't think starting the 2nd sentence with "But
> ..." fits with the 1st sentence.
Sure, I will clean it up
>
> Jan
>
Thank you,
Oleksandr
Jan Beulich Nov. 19, 2021, 1:29 p.m. UTC | #8
On 19.11.2021 14:19, Oleksandr Andrushchenko wrote:
> 
> 
> On 19.11.21 15:06, Jan Beulich wrote:
>> On 19.11.2021 13:50, Oleksandr Andrushchenko wrote:
>>> On 19.11.21 14:45, Jan Beulich wrote:
>>>> On 19.11.2021 13:13, Oleksandr Andrushchenko wrote:
>>>>> On 19.11.21 14:05, Jan Beulich wrote:
>>>>>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>>>
>>>>>>> Instead of handling a single range set that contains all the memory
>>>>>>> regions of all the BARs and ROM, have one range set per BAR.
>>>>>> Iirc Roger did indicate agreement with the splitting. May I nevertheless
>>>>>> ask that for posterity you say a word here about the overhead, to make
>>>>>> clear this was a conscious decision?
>>>>> Sure, but could you please help me word that sentence so it pleases your
>>>>> eye? I mean that it was you who saw the overhead while I did not: were we
>>>>> to implement similar functionality ourselves instead of using range sets,
>>>>> I still think we'd end up duplicating range sets at the end of the day.
>>>> "Note that rangesets were chosen here despite there being only up to
>>>> <N> separate ranges in each set (typically just 1)." Albeit that's
>>>> then still lacking a justification for the choice. Ease of
>>>> implementation?
>>> I guess yes. I'll put:
>>>
>>> "Note that rangesets were chosen here despite there being only up to
>>> <N> separate ranges in each set (typically just 1). But rangeset per BAR
>>> was chosen for the ease of implementation and existing code re-usability."
>> FTAOD please don't forget to replace the <N> - I wasn't sure if it would
>> be 2 or 3.
> It seems we can't put the exact number as it depends on how many MSI/MSI-X
> holes there are, and that depends on arbitrary device properties.

There aren't any MSI holes, and there can be at most 2 MSI-X holes iirc
(MSI-X table and PBA). What I don't recall is whether there are
constraints on these two, but istr them being fully independent. This
would make the upper bound 3 (both in one BAR, other BARs then all using
just a single range).

Jan
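[A sketch of how the upper bound of 3 arises, using made-up frame
numbers; rangeset_add_range()/rangeset_remove_range() are the same
primitives the patch already uses, and the hole placement below is
hypothetical.]

    /* BAR spanning frames 0x100-0x1ff, with the MSI-X table at frame
     * 0x140 and the PBA at frame 0x1c0, i.e. both holes in one BAR. */
    rc = rangeset_add_range(bar->mem, 0x100, 0x1ff);
    if ( !rc )
        rc = rangeset_remove_range(bar->mem, 0x140, 0x140); /* table */
    if ( !rc )
        rc = rangeset_remove_range(bar->mem, 0x1c0, 0x1c0); /* PBA */
    /*
     * bar->mem now holds [0x100, 0x13f], [0x141, 0x1bf] and
     * [0x1c1, 0x1ff]: three ranges when both holes land in one BAR,
     * while every other BAR keeps a single range.
     */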
Oleksandr Andrushchenko Nov. 19, 2021, 1:38 p.m. UTC | #9
On 19.11.21 15:29, Jan Beulich wrote:
> On 19.11.2021 14:19, Oleksandr Andrushchenko wrote:
>>
>> On 19.11.21 15:06, Jan Beulich wrote:
>>> On 19.11.2021 13:50, Oleksandr Andrushchenko wrote:
>>>> On 19.11.21 14:45, Jan Beulich wrote:
>>>>> On 19.11.2021 13:13, Oleksandr Andrushchenko wrote:
>>>>>> On 19.11.21 14:05, Jan Beulich wrote:
>>>>>>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>>>>
>>>>>>>> Instead of handling a single range set that contains all the memory
>>>>>>>> regions of all the BARs and ROM, have one range set per BAR.
>>>>>>> Iirc Roger did indicate agreement with the splitting. May I nevertheless
>>>>>>> ask that for posterity you say a word here about the overhead, to make
>>>>>>> clear this was a conscious decision?
>>>>>> Sure, but could you please help me word that sentence so it pleases your
>>>>>> eye? I mean that it was you who saw the overhead while I did not: were we
>>>>>> to implement similar functionality ourselves instead of using range sets,
>>>>>> I still think we'd end up duplicating range sets at the end of the day.
>>>>> "Note that rangesets were chosen here despite there being only up to
>>>>> <N> separate ranges in each set (typically just 1)." Albeit that's
>>>>> then still lacking a justification for the choice. Ease of
>>>>> implementation?
>>>> I guess yes. I'll put:
>>>>
>>>> "Note that rangesets were chosen here despite there being only up to
>>>> <N> separate ranges in each set (typically just 1). But rangeset per BAR
>>>> was chosen for the ease of implementation and existing code re-usability."
>>> FTAOD please don't forget to replace the <N> - I wasn't sure if it would
>>> be 2 or 3.
>> It seems we can't put the exact number as it depends on how many MSI/MSI-X
>> holes there are, and that depends on arbitrary device properties.
> There aren't any MSI holes, and there can be at most 2 MSI-X holes iirc
> (MSI-X table and PBA). What I don't recall is whether there are
> constraints on these two, but istr them being fully independent. This
> would make the upper bound 3 (both in one BAR, other BARs then all using
> just a single range).
So if they are both in a single BAR (this is what I probably saw while
running QEMU for PVH Dom0 tests), then we may have at most 3 ranges in
a BAR's set, so I will use 3 instead of <N> in the description and will
probably add a note on how we came up with N == 3.
>
> Jan
>
Thank you!!
Oleksandr
Oleksandr Andrushchenko Nov. 19, 2021, 1:41 p.m. UTC | #10
On 19.11.21 15:16, Jan Beulich wrote:
> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>> @@ -95,10 +102,25 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>       INIT_LIST_HEAD(&pdev->vpci->handlers);
>>       spin_lock_init(&pdev->vpci->lock);
>>   
>> +    header = &pdev->vpci->header;
>> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>> +    {
>> +        struct vpci_bar *bar = &header->bars[i];
>> +
>> +        bar->mem = rangeset_new(NULL, NULL, 0);
> I don't recall why an anonymous range set was chosen back at the time
> when vPCI was first implemented, but I think this needs to be changed
> now that DomU-s get supported. Whether you do so right here or in a
> prereq patch is secondary to me. It may be desirable to exclude them
> from rangeset_domain_printk() (which would likely require a new
> RANGESETF_* flag), but I think such resources should be associated
> with their domains.
What would be the proper name for such a range set then?
"vpci_bar"?
> Jan
>
Thank you,
Oleksandr
Jan Beulich Nov. 19, 2021, 1:57 p.m. UTC | #11
On 19.11.2021 14:41, Oleksandr Andrushchenko wrote:
> 
> 
> On 19.11.21 15:16, Jan Beulich wrote:
>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>>> @@ -95,10 +102,25 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>>       INIT_LIST_HEAD(&pdev->vpci->handlers);
>>>       spin_lock_init(&pdev->vpci->lock);
>>>   
>>> +    header = &pdev->vpci->header;
>>> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>>> +    {
>>> +        struct vpci_bar *bar = &header->bars[i];
>>> +
>>> +        bar->mem = rangeset_new(NULL, NULL, 0);
>> I don't recall why an anonymous range set was chosen back at the time
>> when vPCI was first implemented, but I think this needs to be changed
>> now that DomU-s get supported. Whether you do so right here or in a
>> prereq patch is secondary to me. It may be desirable to exclude them
>> from rangeset_domain_printk() (which would likely require a new
>> RANGESETF_* flag), but I think such resources should be associated
>> with their domains.
> What would be the proper name for such a range set then?
> "vpci_bar"?

E.g. bb:dd.f:BARn

Jan
Oleksandr Andrushchenko Nov. 19, 2021, 2:09 p.m. UTC | #12
On 19.11.21 15:57, Jan Beulich wrote:
> On 19.11.2021 14:41, Oleksandr Andrushchenko wrote:
>>
>> On 19.11.21 15:16, Jan Beulich wrote:
>>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>>>> @@ -95,10 +102,25 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>>>        INIT_LIST_HEAD(&pdev->vpci->handlers);
>>>>        spin_lock_init(&pdev->vpci->lock);
>>>>    
>>>> +    header = &pdev->vpci->header;
>>>> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>>>> +    {
>>>> +        struct vpci_bar *bar = &header->bars[i];
>>>> +
>>>> +        bar->mem = rangeset_new(NULL, NULL, 0);
>>> I don't recall why an anonymous range set was chosen back at the time
>>> when vPCI was first implemented, but I think this needs to be changed
>>> now that DomU-s get supported. Whether you do so right here or in a
>>> prereq patch is secondary to me. It may be desirable to exclude them
>>> from rangeset_domain_printk() (which would likely require a new
>>> RANGESETF_* flag), but I think such resources should be associated
>>> with their domains.
>> What would be the proper name for such a range set then?
>> "vpci_bar"?
> E.g. bb:dd.f:BARn
Hm, indeed
I can only see a single flag RANGESETF_prettyprint_hex which tells
*how* to print, but I can't see any limitation in *what* to print.
So, do you mean that some logic should be implemented in
rangeset_domain_printk so it knows that this entry needs to be skipped
while printing? RANGESETF_skip_print?
>
> Jan
>
Thank you,
Oleksandr
Jan Beulich Nov. 22, 2021, 8:24 a.m. UTC | #13
On 19.11.2021 15:09, Oleksandr Andrushchenko wrote:
> On 19.11.21 15:57, Jan Beulich wrote:
>> On 19.11.2021 14:41, Oleksandr Andrushchenko wrote:
>>> On 19.11.21 15:16, Jan Beulich wrote:
>>>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>>>>> @@ -95,10 +102,25 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>>>>        INIT_LIST_HEAD(&pdev->vpci->handlers);
>>>>>        spin_lock_init(&pdev->vpci->lock);
>>>>>    
>>>>> +    header = &pdev->vpci->header;
>>>>> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>>>>> +    {
>>>>> +        struct vpci_bar *bar = &header->bars[i];
>>>>> +
>>>>> +        bar->mem = rangeset_new(NULL, NULL, 0);
>>>> I don't recall why an anonymous range set was chosen back at the time
>>>> when vPCI was first implemented, but I think this needs to be changed
>>>> now that DomU-s get supported. Whether you do so right here or in a
>>>> prereq patch is secondary to me. It may be desirable to exclude them
>>>> from rangeset_domain_printk() (which would likely require a new
>>>> RANGESETF_* flag), but I think such resources should be associated
>>>> with their domains.
>>> What would be the proper name for such a range set then?
>>> "vpci_bar"?
>> E.g. bb:dd.f:BARn
> Hm, indeed
> I can only see a single flag RANGESETF_prettyprint_hex which tells
> *how* to print, but I can't see any limitation in *what* to print.
> So, do you mean that some logic should be implemented in
> rangeset_domain_printk so it knows that this entry needs to be skipped
> while printing? RANGESETF_skip_print?

Yes, albeit I'd call the flag e.g. RANGESETF_no_print.

Jan
Oleksandr Andrushchenko Nov. 22, 2021, 8:31 a.m. UTC | #14
On 22.11.21 10:24, Jan Beulich wrote:
> On 19.11.2021 15:09, Oleksandr Andrushchenko wrote:
>> On 19.11.21 15:57, Jan Beulich wrote:
>>> On 19.11.2021 14:41, Oleksandr Andrushchenko wrote:
>>>> On 19.11.21 15:16, Jan Beulich wrote:
>>>>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>>>>>> @@ -95,10 +102,25 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>>>>>         INIT_LIST_HEAD(&pdev->vpci->handlers);
>>>>>>         spin_lock_init(&pdev->vpci->lock);
>>>>>>     
>>>>>> +    header = &pdev->vpci->header;
>>>>>> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>>>>>> +    {
>>>>>> +        struct vpci_bar *bar = &header->bars[i];
>>>>>> +
>>>>>> +        bar->mem = rangeset_new(NULL, NULL, 0);
>>>>> I don't recall why an anonymous range set was chosen back at the time
>>>>> when vPCI was first implemented, but I think this needs to be changed
>>>>> now that DomU-s get supported. Whether you do so right here or in a
>>>>> prereq patch is secondary to me. It may be desirable to exclude them
>>>>> from rangeset_domain_printk() (which would likely require a new
>>>>> RANGESETF_* flag), but I think such resources should be associated
>>>>> with their domains.
>>>> What would be the proper name for such a range set then?
>>>> "vpci_bar"?
>>> E.g. bb:dd.f:BARn
>> Hm, indeed
>> I can only see a single flag RANGESETF_prettyprint_hex which tells
>> *how* to print, but I can't see any limitation in *what* to print.
>> So, do you mean that some logic should be implemented in
>> rangeset_domain_printk so it knows that this entry needs to be skipped
>> while printing? RANGESETF_skip_print?
> Yes, albeit I'd call the flag e.g. RANGESETF_no_print.
Then I see two patches here: one which introduces a generic RANGESETF_no_print
flag and a second one converting the anonymous range sets used by vPCI.
>
> Jan
>
Thank you,
Oleksandr
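[A rough sketch of what the second of those two patches might do in
vpci_add_handlers(), assuming the existing rangeset_new() signature,
Jan's suggested bb:dd.f:BARn naming and the proposed, not yet existing,
RANGESETF_no_print flag; the exact format string is illustrative only.]

    char str[32];

    snprintf(str, sizeof(str), "%02x:%02x.%u:BAR%u",
             pdev->sbdf.bus, pdev->sbdf.dev, pdev->sbdf.fn, i);
    bar->mem = rangeset_new(pdev->domain, str, RANGESETF_no_print);
    if ( !bar->mem )
    {
        rc = -ENOMEM;
        goto fail;
    }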

Patch

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 1239051ee8ff..5fc2dfbbc864 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -131,34 +131,50 @@  static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
 
 bool vpci_process_pending(struct vcpu *v)
 {
-    if ( v->vpci.mem )
+    if ( v->vpci.map_pending )
     {
         struct map_data data = {
             .d = v->domain,
             .map = v->vpci.cmd & PCI_COMMAND_MEMORY,
         };
-        int rc = rangeset_consume_ranges(v->vpci.mem, map_range, &data);
+        struct pci_dev *pdev = v->vpci.pdev;
+        struct vpci_header *header = &pdev->vpci->header;
+        unsigned int i;
 
-        if ( rc == -ERESTART )
-            return true;
+        for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+        {
+            struct vpci_bar *bar = &header->bars[i];
+            int rc;
 
-        spin_lock(&v->vpci.pdev->vpci->lock);
-        /* Disable memory decoding unconditionally on failure. */
-        modify_decoding(v->vpci.pdev,
-                        rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
-                        !rc && v->vpci.rom_only);
-        spin_unlock(&v->vpci.pdev->vpci->lock);
+            if ( rangeset_is_empty(bar->mem) )
+                continue;
 
-        vpci_cancel_pending(v->vpci.pdev);
-        if ( rc )
-            /*
-             * FIXME: in case of failure remove the device from the domain.
-             * Note that there might still be leftover mappings. While this is
-             * safe for Dom0, for DomUs the domain will likely need to be
-             * killed in order to avoid leaking stale p2m mappings on
-             * failure.
-             */
-            vpci_remove_device(v->vpci.pdev);
+            rc = rangeset_consume_ranges(bar->mem, map_range, &data);
+
+            if ( rc == -ERESTART )
+                return true;
+
+            spin_lock(&pdev->vpci->lock);
+            /* Disable memory decoding unconditionally on failure. */
+            modify_decoding(pdev,
+                            rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
+                            !rc && v->vpci.rom_only);
+            spin_unlock(&pdev->vpci->lock);
+
+            if ( rc )
+            {
+                /*
+                 * FIXME: in case of failure remove the device from the domain.
+                 * Note that there might still be leftover mappings. While this is
+                 * safe for Dom0, for DomUs the domain will likely need to be
+                 * killed in order to avoid leaking stale p2m mappings on
+                 * failure.
+                 */
+                vpci_remove_device(pdev);
+                break;
+            }
+        }
+        v->vpci.map_pending = false;
     }
 
     return false;
@@ -169,22 +185,48 @@  void vpci_cancel_pending(const struct pci_dev *pdev)
     struct vcpu *v = current;
 
     /* Cancel any pending work now. */
-    if ( v->vpci.mem && v->vpci.pdev == pdev)
+    if ( v->vpci.map_pending && v->vpci.pdev == pdev )
     {
-        rangeset_destroy(v->vpci.mem);
-        v->vpci.mem = NULL;
+        struct vpci_header *header = &pdev->vpci->header;
+        unsigned int i;
+        int rc;
+
+        for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+        {
+            struct vpci_bar *bar = &header->bars[i];
+
+            if ( rangeset_is_empty(bar->mem) )
+                continue;
+
+            rc = rangeset_remove_range(bar->mem, 0, ~0ULL);
+            if ( rc )
+                printk(XENLOG_ERR
+                       "%pd %pp failed to remove range set for BAR: %d\n",
+                       v->domain, &pdev->sbdf, rc);
+        }
+        v->vpci.map_pending = false;
     }
 }
 
 static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
-                            struct rangeset *mem, uint16_t cmd)
+                            uint16_t cmd)
 {
     struct map_data data = { .d = d, .map = true };
-    int rc;
+    struct vpci_header *header = &pdev->vpci->header;
+    int rc = 0;
+    unsigned int i;
+
+    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+    {
+        struct vpci_bar *bar = &header->bars[i];
+
+        if ( rangeset_is_empty(bar->mem) )
+            continue;
 
-    while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
-        process_pending_softirqs();
-    rangeset_destroy(mem);
+        while ( (rc = rangeset_consume_ranges(bar->mem, map_range,
+                                              &data)) == -ERESTART )
+            process_pending_softirqs();
+    }
     if ( !rc )
         modify_decoding(pdev, cmd, false);
 
@@ -192,7 +234,7 @@  static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
 }
 
 static void defer_map(struct domain *d, struct pci_dev *pdev,
-                      struct rangeset *mem, uint16_t cmd, bool rom_only)
+                      uint16_t cmd, bool rom_only)
 {
     struct vcpu *curr = current;
 
@@ -203,9 +245,9 @@  static void defer_map(struct domain *d, struct pci_dev *pdev,
      * started for the same device if the domain is not well-behaved.
      */
     curr->vpci.pdev = pdev;
-    curr->vpci.mem = mem;
     curr->vpci.cmd = cmd;
     curr->vpci.rom_only = rom_only;
+    curr->vpci.map_pending = true;
     /*
      * Raise a scheduler softirq in order to prevent the guest from resuming
      * execution with pending mapping operations, to trigger the invocation
@@ -217,42 +259,40 @@  static void defer_map(struct domain *d, struct pci_dev *pdev,
 static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
 {
     struct vpci_header *header = &pdev->vpci->header;
-    struct rangeset *mem = rangeset_new(NULL, NULL, 0);
     struct pci_dev *tmp, *dev = NULL;
     const struct vpci_msix *msix = pdev->vpci->msix;
-    unsigned int i;
+    unsigned int i, j;
     int rc;
-
-    if ( !mem )
-        return -ENOMEM;
+    bool map_pending;
 
     /*
-     * Create a rangeset that represents the current device BARs memory region
+     * Create a rangeset per BAR that represents the current device memory region
      * and compare it against all the currently active BAR memory regions. If
      * an overlap is found, subtract it from the region to be mapped/unmapped.
      *
-     * First fill the rangeset with all the BARs of this device or with the ROM
+     * First fill the rangesets with all the BARs of this device or with the ROM
      * BAR only, depending on whether the guest is toggling the memory decode
      * bit of the command register, or the enable bit of the ROM BAR register.
      */
     for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
     {
-        const struct vpci_bar *bar = &header->bars[i];
+        struct vpci_bar *bar = &header->bars[i];
         unsigned long start = PFN_DOWN(bar->addr);
         unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
 
+        ASSERT(bar->mem);
+
         if ( !MAPPABLE_BAR(bar) ||
              (rom_only ? bar->type != VPCI_BAR_ROM
                        : (bar->type == VPCI_BAR_ROM && !header->rom_enabled)) )
             continue;
 
-        rc = rangeset_add_range(mem, start, end);
+        rc = rangeset_add_range(bar->mem, start, end);
         if ( rc )
         {
             printk(XENLOG_G_WARNING "Failed to add [%lx, %lx]: %d\n",
                    start, end, rc);
-            rangeset_destroy(mem);
-            return rc;
+            goto fail;
         }
     }
 
@@ -263,14 +303,21 @@  static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
         unsigned long end = PFN_DOWN(vmsix_table_addr(pdev->vpci, i) +
                                      vmsix_table_size(pdev->vpci, i) - 1);
 
-        rc = rangeset_remove_range(mem, start, end);
-        if ( rc )
+        for ( j = 0; j < ARRAY_SIZE(header->bars); j++ )
         {
-            printk(XENLOG_G_WARNING
-                   "Failed to remove MSIX table [%lx, %lx]: %d\n",
-                   start, end, rc);
-            rangeset_destroy(mem);
-            return rc;
+            const struct vpci_bar *bar = &header->bars[j];
+
+            if ( rangeset_is_empty(bar->mem) )
+                continue;
+
+            rc = rangeset_remove_range(bar->mem, start, end);
+            if ( rc )
+            {
+                printk(XENLOG_G_WARNING
+                       "Failed to remove MSIX table [%lx, %lx]: %d\n",
+                       start, end, rc);
+                goto fail;
+            }
         }
     }
 
@@ -302,7 +349,8 @@  static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
             unsigned long start = PFN_DOWN(bar->addr);
             unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
 
-            if ( !bar->enabled || !rangeset_overlaps_range(mem, start, end) ||
+            if ( !bar->enabled ||
+                 !rangeset_overlaps_range(bar->mem, start, end) ||
                  /*
                   * If only the ROM enable bit is toggled check against other
                   * BARs in the same device for overlaps, but not against the
@@ -311,13 +359,12 @@  static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
                  (rom_only && tmp == pdev && bar->type == VPCI_BAR_ROM) )
                 continue;
 
-            rc = rangeset_remove_range(mem, start, end);
+            rc = rangeset_remove_range(bar->mem, start, end);
             if ( rc )
             {
                 printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
                        start, end, rc);
-                rangeset_destroy(mem);
-                return rc;
+                goto fail;
             }
         }
     }
@@ -335,12 +382,35 @@  static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
          * will always be to establish mappings and process all the BARs.
          */
         ASSERT((cmd & PCI_COMMAND_MEMORY) && !rom_only);
-        return apply_map(pdev->domain, pdev, mem, cmd);
+        return apply_map(pdev->domain, pdev, cmd);
     }
 
-    defer_map(dev->domain, dev, mem, cmd, rom_only);
+    /* Check if any ranges are left after MSI-X and overlap removal. */
+    map_pending = false;
+    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+        if ( !rangeset_is_empty(header->bars[i].mem) )
+        {
+            map_pending = true;
+            break;
+        }
+
+    /*
+     * There are cases when a PCI device, a root port for example, has
+     * neither memory space nor IO. In such a case the PCI command register
+     * write would be skipped, leaving the underlying device non-functional:
+     *   - if there are no regions, write the command register now
+     *   - if there are regions, defer the work and write later on
+     */
+    if ( !map_pending )
+        pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
+    else
+        defer_map(dev->domain, dev, cmd, rom_only);
 
     return 0;
+
+ fail:
+    vpci_cancel_pending(pdev);
+    return rc;
 }
 
 static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 5f086398a98c..45733300f00b 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -55,7 +55,12 @@  void vpci_remove_device_handlers(const struct pci_dev *pdev)
 
 void vpci_remove_device(struct pci_dev *pdev)
 {
+    struct vpci_header *header = &pdev->vpci->header;
+    unsigned int i;
+
     vpci_cancel_pending(pdev);
+    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+        rangeset_destroy(header->bars[i].mem);
     vpci_remove_device_handlers(pdev);
     xfree(pdev->vpci->msix);
     xfree(pdev->vpci->msi);
@@ -80,6 +85,8 @@  static int run_vpci_init(struct pci_dev *pdev)
 
 int vpci_add_handlers(struct pci_dev *pdev)
 {
+    struct vpci_header *header;
+    unsigned int i;
     int rc;
 
     if ( !has_vpci(pdev->domain) )
@@ -95,10 +102,25 @@  int vpci_add_handlers(struct pci_dev *pdev)
     INIT_LIST_HEAD(&pdev->vpci->handlers);
     spin_lock_init(&pdev->vpci->lock);
 
+    header = &pdev->vpci->header;
+    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+    {
+        struct vpci_bar *bar = &header->bars[i];
+
+        bar->mem = rangeset_new(NULL, NULL, 0);
+        if ( !bar->mem )
+        {
+            rc = -ENOMEM;
+            goto fail;
+        }
+    }
+
     rc = run_vpci_init(pdev);
-    if ( rc )
-        vpci_remove_device(pdev);
+    if ( !rc )
+        return 0;
 
+ fail:
+    vpci_remove_device(pdev);
     return rc;
 }
 
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 3e7428da822c..143f3166a730 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -75,6 +75,7 @@  struct vpci {
             /* Guest view of the BAR. */
             uint64_t guest_addr;
             uint64_t size;
+            struct rangeset *mem;
             enum {
                 VPCI_BAR_EMPTY,
                 VPCI_BAR_IO,
@@ -149,9 +150,9 @@  struct vpci {
 
 struct vpci_vcpu {
     /* Per-vcpu structure to store state while {un}mapping of PCI BARs. */
-    struct rangeset *mem;
     struct pci_dev *pdev;
     uint16_t cmd;
+    bool map_pending : 1;
     bool rom_only : 1;
 };