diff mbox series

[v11,07/17] vpci/header: implement guest BAR register handlers

Message ID 20231202012556.2012281-8-volodymyr_babchuk@epam.com (mailing list archive)
State New, archived
Headers show
Series PCI devices passthrough on Arm, part 3 | expand

Commit Message

Volodymyr Babchuk Dec. 2, 2023, 1:27 a.m. UTC
From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Add relevant vpci register handlers when assigning PCI device to a domain
and remove those when de-assigning. This allows having different
handlers for different domains, e.g. hwdom and other guests.

Emulate guest BAR register values: this allows creating a guest view
of the registers and emulates size and properties probe as it is done
during PCI device enumeration by the guest.

All empty, IO and ROM BARs for guests are emulated by returning 0 on
reads and ignoring writes: this BARs are special with this respect as
their lower bits have special meaning, so returning default ~0 on read
may confuse guest OS.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
---
In v11:
- Access guest_addr after adjusting for MEM64_HI bar in
guest_bar_write()
- guest bar handlers renamed and now  _mem_ part to denote
that they are handling only memory BARs
- refuse to update guest BAR address if BAR is enabled
In v10:
- ull -> ULL to be MISRA-compatbile
- Use PAGE_OFFSET() instead of combining with ~PAGE_MASK
- Set type of empty bars to VPCI_BAR_EMPTY
In v9:
- factored-out "fail" label introduction in init_bars()
- replaced #ifdef CONFIG_X86 with IS_ENABLED()
- do not pass bars[i] to empty_bar_read() handler
- store guest's BAR address instead of guests BAR register view
Since v6:
- unify the writing of the PCI_COMMAND register on the
  error path into a label
- do not introduce bar_ignore_access helper and open code
- s/guest_bar_ignore_read/empty_bar_read
- update error message in guest_bar_write
- only setup empty_bar_read for IO if !x86
Since v5:
- make sure that the guest set address has the same page offset
  as the physical address on the host
- remove guest_rom_{read|write} as those just implement the default
  behaviour of the registers not being handled
- adjusted comment for struct vpci.addr field
- add guest handlers for BARs which are not handled and will otherwise
  return ~0 on read and ignore writes. The BARs are special with this
  respect as their lower bits have special meaning, so returning ~0
  doesn't seem to be right
Since v4:
- updated commit message
- s/guest_addr/guest_reg
Since v3:
- squashed two patches: dynamic add/remove handlers and guest BAR
  handler implementation
- fix guest BAR read of the high part of a 64bit BAR (Roger)
- add error handling to vpci_assign_device
- s/dom%pd/%pd
- blank line before return
Since v2:
- remove unneeded ifdefs for CONFIG_HAS_VPCI_GUEST_SUPPORT as more code
  has been eliminated from being built on x86
Since v1:
 - constify struct pci_dev where possible
 - do not open code is_system_domain()
 - simplify some code3. simplify
 - use gdprintk + error code instead of gprintk
 - gate vpci_bar_{add|remove}_handlers with CONFIG_HAS_VPCI_GUEST_SUPPORT,
   so these do not get compiled for x86
 - removed unneeded is_system_domain check
 - re-work guest read/write to be much simpler and do more work on write
   than read which is expected to be called more frequently
 - removed one too obvious comment
---
 xen/drivers/vpci/header.c | 135 +++++++++++++++++++++++++++++++++-----
 xen/include/xen/vpci.h    |   3 +
 2 files changed, 122 insertions(+), 16 deletions(-)

Comments

Roger Pau Monne Dec. 21, 2023, 3:43 p.m. UTC | #1
On Sat, Dec 02, 2023 at 01:27:04AM +0000, Volodymyr Babchuk wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Add relevant vpci register handlers when assigning PCI device to a domain
> and remove those when de-assigning. This allows having different
> handlers for different domains, e.g. hwdom and other guests.
> 
> Emulate guest BAR register values: this allows creating a guest view
> of the registers and emulates size and properties probe as it is done
> during PCI device enumeration by the guest.
> 
> All empty, IO and ROM BARs for guests are emulated by returning 0 on
> reads and ignoring writes: this BARs are special with this respect as
> their lower bits have special meaning, so returning default ~0 on read
> may confuse guest OS.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>

Just a couple of nits.

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

> ---
> In v11:
> - Access guest_addr after adjusting for MEM64_HI bar in
> guest_bar_write()
> - guest bar handlers renamed and now  _mem_ part to denote
> that they are handling only memory BARs
> - refuse to update guest BAR address if BAR is enabled
> In v10:
> - ull -> ULL to be MISRA-compatbile
> - Use PAGE_OFFSET() instead of combining with ~PAGE_MASK
> - Set type of empty bars to VPCI_BAR_EMPTY
> In v9:
> - factored-out "fail" label introduction in init_bars()
> - replaced #ifdef CONFIG_X86 with IS_ENABLED()
> - do not pass bars[i] to empty_bar_read() handler
> - store guest's BAR address instead of guests BAR register view
> Since v6:
> - unify the writing of the PCI_COMMAND register on the
>   error path into a label
> - do not introduce bar_ignore_access helper and open code
> - s/guest_bar_ignore_read/empty_bar_read
> - update error message in guest_bar_write
> - only setup empty_bar_read for IO if !x86
> Since v5:
> - make sure that the guest set address has the same page offset
>   as the physical address on the host
> - remove guest_rom_{read|write} as those just implement the default
>   behaviour of the registers not being handled
> - adjusted comment for struct vpci.addr field
> - add guest handlers for BARs which are not handled and will otherwise
>   return ~0 on read and ignore writes. The BARs are special with this
>   respect as their lower bits have special meaning, so returning ~0
>   doesn't seem to be right
> Since v4:
> - updated commit message
> - s/guest_addr/guest_reg
> Since v3:
> - squashed two patches: dynamic add/remove handlers and guest BAR
>   handler implementation
> - fix guest BAR read of the high part of a 64bit BAR (Roger)
> - add error handling to vpci_assign_device
> - s/dom%pd/%pd
> - blank line before return
> Since v2:
> - remove unneeded ifdefs for CONFIG_HAS_VPCI_GUEST_SUPPORT as more code
>   has been eliminated from being built on x86
> Since v1:
>  - constify struct pci_dev where possible
>  - do not open code is_system_domain()
>  - simplify some code3. simplify
>  - use gdprintk + error code instead of gprintk
>  - gate vpci_bar_{add|remove}_handlers with CONFIG_HAS_VPCI_GUEST_SUPPORT,
>    so these do not get compiled for x86
>  - removed unneeded is_system_domain check
>  - re-work guest read/write to be much simpler and do more work on write
>    than read which is expected to be called more frequently
>  - removed one too obvious comment
> ---
>  xen/drivers/vpci/header.c | 135 +++++++++++++++++++++++++++++++++-----
>  xen/include/xen/vpci.h    |   3 +
>  2 files changed, 122 insertions(+), 16 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index e6a1d58c42..43216429d9 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -477,6 +477,75 @@ static void cf_check bar_write(
>      pci_conf_write32(pdev->sbdf, reg, val);
>  }
>  
> +static void cf_check guest_mem_bar_write(const struct pci_dev *pdev,
> +                                         unsigned int reg, uint32_t val,
> +                                         void *data)
> +{
> +    struct vpci_bar *bar = data;
> +    bool hi = false;
> +    uint64_t guest_addr;
> +
> +    if ( bar->type == VPCI_BAR_MEM64_HI )
> +    {
> +        ASSERT(reg > PCI_BASE_ADDRESS_0);
> +        bar--;
> +        hi = true;
> +    }
> +    else
> +    {
> +        val &= PCI_BASE_ADDRESS_MEM_MASK;
> +    }
> +
> +    guest_addr = bar->guest_addr;
> +    guest_addr &= ~(0xffffffffULL << (hi ? 32 : 0));
> +    guest_addr |= (uint64_t)val << (hi ? 32 : 0);
> +
> +    /* Allow guest to size BAR correctly */
> +    guest_addr &= ~(bar->size - 1);
> +
> +    /*
> +     * Xen only cares whether the BAR is mapped into the p2m, so allow BAR
> +     * writes as long as the BAR is not mapped into the p2m.
> +     */
> +    if ( bar->enabled )
> +    {
> +        /* If the value written is the current one avoid printing a warning. */
> +        if ( guest_addr != bar->guest_addr )
> +            gprintk(XENLOG_WARNING,
> +                    "%pp: ignored guest BAR %zu write while mapped\n",
> +                    &pdev->sbdf, bar - pdev->vpci->header.bars + hi);
> +        return;
> +    }
> +    bar->guest_addr = guest_addr;
> +}
> +
> +static uint32_t cf_check guest_mem_bar_read(const struct pci_dev *pdev,
> +                                            unsigned int reg, void *data)
> +{
> +    const struct vpci_bar *bar = data;
> +    uint32_t reg_val;
> +
> +    if ( bar->type == VPCI_BAR_MEM64_HI )
> +    {
> +        ASSERT(reg > PCI_BASE_ADDRESS_0);
> +        bar--;
> +        return bar->guest_addr >> 32;
> +    }
> +
> +    reg_val = bar->guest_addr;
> +    reg_val |= bar->type == VPCI_BAR_MEM32 ? PCI_BASE_ADDRESS_MEM_TYPE_32 :
> +                                             PCI_BASE_ADDRESS_MEM_TYPE_64;
> +    reg_val |= bar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH : 0;
> +
> +    return reg_val;
> +}
> +
> +static uint32_t cf_check empty_bar_read(const struct pci_dev *pdev,
> +                                        unsigned int reg, void *data)
> +{
> +    return 0;
> +}
> +
>  static void cf_check rom_write(
>      const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
>  {
> @@ -537,6 +606,7 @@ static int cf_check init_bars(struct pci_dev *pdev)
>      struct vpci_header *header = &pdev->vpci->header;
>      struct vpci_bar *bars = header->bars;
>      int rc;
> +    bool is_hwdom = is_hardware_domain(pdev->domain);
>  
>      ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>  
> @@ -578,8 +648,11 @@ static int cf_check init_bars(struct pci_dev *pdev)
>          if ( i && bars[i - 1].type == VPCI_BAR_MEM64_LO )
>          {
>              bars[i].type = VPCI_BAR_MEM64_HI;
> -            rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg,
> -                                   4, &bars[i]);
> +            rc = vpci_add_register(pdev->vpci,
> +                                   is_hwdom ? vpci_hw_read32 :
> +                                                        guest_mem_bar_read,

Nit: I would usually indent this as:

is_hwdom ? vpci_hw_read32
         : guest_mem_bar_read,

> +                                   is_hwdom ? bar_write : guest_mem_bar_write,
> +                                   reg, 4, &bars[i]);
>              if ( rc )
>                  goto fail;
>  
> @@ -590,6 +663,14 @@ static int cf_check init_bars(struct pci_dev *pdev)
>          if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO )
>          {
>              bars[i].type = VPCI_BAR_IO;
> +            if ( !IS_ENABLED(CONFIG_X86) && !is_hwdom )
> +            {
> +                rc = vpci_add_register(pdev->vpci, empty_bar_read, NULL,
> +                                       reg, 4, NULL);
> +                if ( rc )
> +                    goto fail;
> +            }
> +
>              continue;
>          }
>          if ( (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
> @@ -606,6 +687,15 @@ static int cf_check init_bars(struct pci_dev *pdev)
>          if ( size == 0 )
>          {
>              bars[i].type = VPCI_BAR_EMPTY;
> +
> +            if ( !is_hwdom )
> +            {
> +                rc = vpci_add_register(pdev->vpci, empty_bar_read, NULL,
> +                                       reg, 4, NULL);
> +                if ( rc )
> +                    goto fail;
> +            }
> +
>              continue;
>          }
>  
> @@ -613,28 +703,41 @@ static int cf_check init_bars(struct pci_dev *pdev)
>          bars[i].size = size;
>          bars[i].prefetchable = val & PCI_BASE_ADDRESS_MEM_PREFETCH;
>  
> -        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg, 4,
> -                               &bars[i]);
> +        rc = vpci_add_register(pdev->vpci,
> +                               is_hwdom ? vpci_hw_read32 : guest_mem_bar_read,
> +                               is_hwdom ? bar_write : guest_mem_bar_write,
> +                               reg, 4, &bars[i]);
>          if ( rc )
>              goto fail;
>      }
>  
> -    /* Check expansion ROM. */
> -    rc = pci_size_mem_bar(pdev->sbdf, rom_reg, &addr, &size, PCI_BAR_ROM);

Nit: I guess you could do something like:

rc = is_hwdom ? pci_size_mem_bar(pdev->sbdf, rom_reg, &addr, &size, PCI_BAR_ROM)
              : 0;

And that would avoid having to re-indent the whole block?

You could still place the domU code on an else ( !is_hwdom ) branch.

Overall I'm not sure what I prefer, as the re-indentation would be
better done in a non-functional change IMO.

Thanks, Roger.
Stewart Hildebrand Jan. 2, 2024, 9:09 p.m. UTC | #2
On 12/21/23 10:43, Roger Pau Monné wrote:
> On Sat, Dec 02, 2023 at 01:27:04AM +0000, Volodymyr Babchuk wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Add relevant vpci register handlers when assigning PCI device to a domain
>> and remove those when de-assigning. This allows having different
>> handlers for different domains, e.g. hwdom and other guests.
>>
>> Emulate guest BAR register values: this allows creating a guest view
>> of the registers and emulates size and properties probe as it is done
>> during PCI device enumeration by the guest.
>>
>> All empty, IO and ROM BARs for guests are emulated by returning 0 on
>> reads and ignoring writes: this BARs are special with this respect as
>> their lower bits have special meaning, so returning default ~0 on read
>> may confuse guest OS.
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> 
> Just a couple of nits.
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks!

> 
>> ---
>> In v11:
>> - Access guest_addr after adjusting for MEM64_HI bar in
>> guest_bar_write()
>> - guest bar handlers renamed and now  _mem_ part to denote
>> that they are handling only memory BARs
>> - refuse to update guest BAR address if BAR is enabled
>> In v10:
>> - ull -> ULL to be MISRA-compatbile
>> - Use PAGE_OFFSET() instead of combining with ~PAGE_MASK
>> - Set type of empty bars to VPCI_BAR_EMPTY
>> In v9:
>> - factored-out "fail" label introduction in init_bars()
>> - replaced #ifdef CONFIG_X86 with IS_ENABLED()
>> - do not pass bars[i] to empty_bar_read() handler
>> - store guest's BAR address instead of guests BAR register view
>> Since v6:
>> - unify the writing of the PCI_COMMAND register on the
>>   error path into a label
>> - do not introduce bar_ignore_access helper and open code
>> - s/guest_bar_ignore_read/empty_bar_read
>> - update error message in guest_bar_write
>> - only setup empty_bar_read for IO if !x86
>> Since v5:
>> - make sure that the guest set address has the same page offset
>>   as the physical address on the host
>> - remove guest_rom_{read|write} as those just implement the default
>>   behaviour of the registers not being handled
>> - adjusted comment for struct vpci.addr field
>> - add guest handlers for BARs which are not handled and will otherwise
>>   return ~0 on read and ignore writes. The BARs are special with this
>>   respect as their lower bits have special meaning, so returning ~0
>>   doesn't seem to be right
>> Since v4:
>> - updated commit message
>> - s/guest_addr/guest_reg
>> Since v3:
>> - squashed two patches: dynamic add/remove handlers and guest BAR
>>   handler implementation
>> - fix guest BAR read of the high part of a 64bit BAR (Roger)
>> - add error handling to vpci_assign_device
>> - s/dom%pd/%pd
>> - blank line before return
>> Since v2:
>> - remove unneeded ifdefs for CONFIG_HAS_VPCI_GUEST_SUPPORT as more code
>>   has been eliminated from being built on x86
>> Since v1:
>>  - constify struct pci_dev where possible
>>  - do not open code is_system_domain()
>>  - simplify some code3. simplify
>>  - use gdprintk + error code instead of gprintk
>>  - gate vpci_bar_{add|remove}_handlers with CONFIG_HAS_VPCI_GUEST_SUPPORT,
>>    so these do not get compiled for x86
>>  - removed unneeded is_system_domain check
>>  - re-work guest read/write to be much simpler and do more work on write
>>    than read which is expected to be called more frequently
>>  - removed one too obvious comment
>> ---
>>  xen/drivers/vpci/header.c | 135 +++++++++++++++++++++++++++++++++-----
>>  xen/include/xen/vpci.h    |   3 +
>>  2 files changed, 122 insertions(+), 16 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index e6a1d58c42..43216429d9 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -477,6 +477,75 @@ static void cf_check bar_write(
>>      pci_conf_write32(pdev->sbdf, reg, val);
>>  }
>>  
>> +static void cf_check guest_mem_bar_write(const struct pci_dev *pdev,
>> +                                         unsigned int reg, uint32_t val,
>> +                                         void *data)
>> +{
>> +    struct vpci_bar *bar = data;
>> +    bool hi = false;
>> +    uint64_t guest_addr;
>> +
>> +    if ( bar->type == VPCI_BAR_MEM64_HI )
>> +    {
>> +        ASSERT(reg > PCI_BASE_ADDRESS_0);
>> +        bar--;
>> +        hi = true;
>> +    }
>> +    else
>> +    {
>> +        val &= PCI_BASE_ADDRESS_MEM_MASK;
>> +    }
>> +
>> +    guest_addr = bar->guest_addr;
>> +    guest_addr &= ~(0xffffffffULL << (hi ? 32 : 0));
>> +    guest_addr |= (uint64_t)val << (hi ? 32 : 0);
>> +
>> +    /* Allow guest to size BAR correctly */
>> +    guest_addr &= ~(bar->size - 1);
>> +
>> +    /*
>> +     * Xen only cares whether the BAR is mapped into the p2m, so allow BAR
>> +     * writes as long as the BAR is not mapped into the p2m.
>> +     */
>> +    if ( bar->enabled )
>> +    {
>> +        /* If the value written is the current one avoid printing a warning. */
>> +        if ( guest_addr != bar->guest_addr )
>> +            gprintk(XENLOG_WARNING,
>> +                    "%pp: ignored guest BAR %zu write while mapped\n",
>> +                    &pdev->sbdf, bar - pdev->vpci->header.bars + hi);
>> +        return;
>> +    }
>> +    bar->guest_addr = guest_addr;
>> +}
>> +
>> +static uint32_t cf_check guest_mem_bar_read(const struct pci_dev *pdev,
>> +                                            unsigned int reg, void *data)
>> +{
>> +    const struct vpci_bar *bar = data;
>> +    uint32_t reg_val;
>> +
>> +    if ( bar->type == VPCI_BAR_MEM64_HI )
>> +    {
>> +        ASSERT(reg > PCI_BASE_ADDRESS_0);
>> +        bar--;
>> +        return bar->guest_addr >> 32;
>> +    }
>> +
>> +    reg_val = bar->guest_addr;
>> +    reg_val |= bar->type == VPCI_BAR_MEM32 ? PCI_BASE_ADDRESS_MEM_TYPE_32 :
>> +                                             PCI_BASE_ADDRESS_MEM_TYPE_64;
>> +    reg_val |= bar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH : 0;
>> +
>> +    return reg_val;
>> +}
>> +
>> +static uint32_t cf_check empty_bar_read(const struct pci_dev *pdev,
>> +                                        unsigned int reg, void *data)
>> +{
>> +    return 0;
>> +}
>> +
>>  static void cf_check rom_write(
>>      const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
>>  {
>> @@ -537,6 +606,7 @@ static int cf_check init_bars(struct pci_dev *pdev)
>>      struct vpci_header *header = &pdev->vpci->header;
>>      struct vpci_bar *bars = header->bars;
>>      int rc;
>> +    bool is_hwdom = is_hardware_domain(pdev->domain);
>>  
>>      ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>>  
>> @@ -578,8 +648,11 @@ static int cf_check init_bars(struct pci_dev *pdev)
>>          if ( i && bars[i - 1].type == VPCI_BAR_MEM64_LO )
>>          {
>>              bars[i].type = VPCI_BAR_MEM64_HI;
>> -            rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg,
>> -                                   4, &bars[i]);
>> +            rc = vpci_add_register(pdev->vpci,
>> +                                   is_hwdom ? vpci_hw_read32 :
>> +                                                        guest_mem_bar_read,
> 
> Nit: I would usually indent this as:
> 
> is_hwdom ? vpci_hw_read32
>          : guest_mem_bar_read,

Will fix

> 
>> +                                   is_hwdom ? bar_write : guest_mem_bar_write,
>> +                                   reg, 4, &bars[i]);
>>              if ( rc )
>>                  goto fail;
>>  
>> @@ -590,6 +663,14 @@ static int cf_check init_bars(struct pci_dev *pdev)
>>          if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO )
>>          {
>>              bars[i].type = VPCI_BAR_IO;
>> +            if ( !IS_ENABLED(CONFIG_X86) && !is_hwdom )
>> +            {
>> +                rc = vpci_add_register(pdev->vpci, empty_bar_read, NULL,
>> +                                       reg, 4, NULL);
>> +                if ( rc )
>> +                    goto fail;
>> +            }
>> +
>>              continue;
>>          }
>>          if ( (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
>> @@ -606,6 +687,15 @@ static int cf_check init_bars(struct pci_dev *pdev)
>>          if ( size == 0 )
>>          {
>>              bars[i].type = VPCI_BAR_EMPTY;
>> +
>> +            if ( !is_hwdom )
>> +            {
>> +                rc = vpci_add_register(pdev->vpci, empty_bar_read, NULL,
>> +                                       reg, 4, NULL);
>> +                if ( rc )
>> +                    goto fail;
>> +            }
>> +
>>              continue;
>>          }
>>  
>> @@ -613,28 +703,41 @@ static int cf_check init_bars(struct pci_dev *pdev)
>>          bars[i].size = size;
>>          bars[i].prefetchable = val & PCI_BASE_ADDRESS_MEM_PREFETCH;
>>  
>> -        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg, 4,
>> -                               &bars[i]);
>> +        rc = vpci_add_register(pdev->vpci,
>> +                               is_hwdom ? vpci_hw_read32 : guest_mem_bar_read,
>> +                               is_hwdom ? bar_write : guest_mem_bar_write,
>> +                               reg, 4, &bars[i]);
>>          if ( rc )
>>              goto fail;
>>      }
>>  
>> -    /* Check expansion ROM. */
>> -    rc = pci_size_mem_bar(pdev->sbdf, rom_reg, &addr, &size, PCI_BAR_ROM);
> 
> Nit: I guess you could do something like:
> 
> rc = is_hwdom ? pci_size_mem_bar(pdev->sbdf, rom_reg, &addr, &size, PCI_BAR_ROM)
>               : 0;
> 
> And that would avoid having to re-indent the whole block?
> 
> You could still place the domU code on an else ( !is_hwdom ) branch.
> 
> Overall I'm not sure what I prefer, as the re-indentation would be
> better done in a non-functional change IMO.

I'm in favor of a smaller diffstat for now. I'll do as you suggest with the ternary/conditional operator.

> 
> Thanks, Roger.
diff mbox series

Patch

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index e6a1d58c42..43216429d9 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -477,6 +477,75 @@  static void cf_check bar_write(
     pci_conf_write32(pdev->sbdf, reg, val);
 }
 
+static void cf_check guest_mem_bar_write(const struct pci_dev *pdev,
+                                         unsigned int reg, uint32_t val,
+                                         void *data)
+{
+    struct vpci_bar *bar = data;
+    bool hi = false;
+    uint64_t guest_addr;
+
+    if ( bar->type == VPCI_BAR_MEM64_HI )
+    {
+        ASSERT(reg > PCI_BASE_ADDRESS_0);
+        bar--;
+        hi = true;
+    }
+    else
+    {
+        val &= PCI_BASE_ADDRESS_MEM_MASK;
+    }
+
+    guest_addr = bar->guest_addr;
+    guest_addr &= ~(0xffffffffULL << (hi ? 32 : 0));
+    guest_addr |= (uint64_t)val << (hi ? 32 : 0);
+
+    /* Allow guest to size BAR correctly */
+    guest_addr &= ~(bar->size - 1);
+
+    /*
+     * Xen only cares whether the BAR is mapped into the p2m, so allow BAR
+     * writes as long as the BAR is not mapped into the p2m.
+     */
+    if ( bar->enabled )
+    {
+        /* If the value written is the current one avoid printing a warning. */
+        if ( guest_addr != bar->guest_addr )
+            gprintk(XENLOG_WARNING,
+                    "%pp: ignored guest BAR %zu write while mapped\n",
+                    &pdev->sbdf, bar - pdev->vpci->header.bars + hi);
+        return;
+    }
+    bar->guest_addr = guest_addr;
+}
+
+static uint32_t cf_check guest_mem_bar_read(const struct pci_dev *pdev,
+                                            unsigned int reg, void *data)
+{
+    const struct vpci_bar *bar = data;
+    uint32_t reg_val;
+
+    if ( bar->type == VPCI_BAR_MEM64_HI )
+    {
+        ASSERT(reg > PCI_BASE_ADDRESS_0);
+        bar--;
+        return bar->guest_addr >> 32;
+    }
+
+    reg_val = bar->guest_addr;
+    reg_val |= bar->type == VPCI_BAR_MEM32 ? PCI_BASE_ADDRESS_MEM_TYPE_32 :
+                                             PCI_BASE_ADDRESS_MEM_TYPE_64;
+    reg_val |= bar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH : 0;
+
+    return reg_val;
+}
+
+static uint32_t cf_check empty_bar_read(const struct pci_dev *pdev,
+                                        unsigned int reg, void *data)
+{
+    return 0;
+}
+
 static void cf_check rom_write(
     const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
 {
@@ -537,6 +606,7 @@  static int cf_check init_bars(struct pci_dev *pdev)
     struct vpci_header *header = &pdev->vpci->header;
     struct vpci_bar *bars = header->bars;
     int rc;
+    bool is_hwdom = is_hardware_domain(pdev->domain);
 
     ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
 
@@ -578,8 +648,11 @@  static int cf_check init_bars(struct pci_dev *pdev)
         if ( i && bars[i - 1].type == VPCI_BAR_MEM64_LO )
         {
             bars[i].type = VPCI_BAR_MEM64_HI;
-            rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg,
-                                   4, &bars[i]);
+            rc = vpci_add_register(pdev->vpci,
+                                   is_hwdom ? vpci_hw_read32 :
+                                                        guest_mem_bar_read,
+                                   is_hwdom ? bar_write : guest_mem_bar_write,
+                                   reg, 4, &bars[i]);
             if ( rc )
                 goto fail;
 
@@ -590,6 +663,14 @@  static int cf_check init_bars(struct pci_dev *pdev)
         if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO )
         {
             bars[i].type = VPCI_BAR_IO;
+            if ( !IS_ENABLED(CONFIG_X86) && !is_hwdom )
+            {
+                rc = vpci_add_register(pdev->vpci, empty_bar_read, NULL,
+                                       reg, 4, NULL);
+                if ( rc )
+                    goto fail;
+            }
+
             continue;
         }
         if ( (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
@@ -606,6 +687,15 @@  static int cf_check init_bars(struct pci_dev *pdev)
         if ( size == 0 )
         {
             bars[i].type = VPCI_BAR_EMPTY;
+
+            if ( !is_hwdom )
+            {
+                rc = vpci_add_register(pdev->vpci, empty_bar_read, NULL,
+                                       reg, 4, NULL);
+                if ( rc )
+                    goto fail;
+            }
+
             continue;
         }
 
@@ -613,28 +703,41 @@  static int cf_check init_bars(struct pci_dev *pdev)
         bars[i].size = size;
         bars[i].prefetchable = val & PCI_BASE_ADDRESS_MEM_PREFETCH;
 
-        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg, 4,
-                               &bars[i]);
+        rc = vpci_add_register(pdev->vpci,
+                               is_hwdom ? vpci_hw_read32 : guest_mem_bar_read,
+                               is_hwdom ? bar_write : guest_mem_bar_write,
+                               reg, 4, &bars[i]);
         if ( rc )
             goto fail;
     }
 
-    /* Check expansion ROM. */
-    rc = pci_size_mem_bar(pdev->sbdf, rom_reg, &addr, &size, PCI_BAR_ROM);
-    if ( rc > 0 && size )
+    if ( is_hwdom )
     {
-        struct vpci_bar *rom = &header->bars[num_bars];
+        rc = pci_size_mem_bar(pdev->sbdf, rom_reg, &addr, &size, PCI_BAR_ROM);
+        if ( rc > 0 && size )
+        {
+            struct vpci_bar *rom = &header->bars[num_bars];
 
-        rom->type = VPCI_BAR_ROM;
-        rom->size = size;
-        rom->addr = addr;
-        header->rom_enabled = pci_conf_read32(pdev->sbdf, rom_reg) &
-                              PCI_ROM_ADDRESS_ENABLE;
+            rom->type = VPCI_BAR_ROM;
+            rom->size = size;
+            rom->addr = addr;
+            header->rom_enabled = pci_conf_read32(pdev->sbdf, rom_reg) &
+                                  PCI_ROM_ADDRESS_ENABLE;
 
-        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rom_write, rom_reg,
-                               4, rom);
+            rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rom_write,
+                                   rom_reg, 4, rom);
+            if ( rc )
+                rom->type = VPCI_BAR_EMPTY;
+        }
+    }
+    else
+    {
+        /* TODO: Check expansion ROM, we do not handle ROM for guests for now */
+        header->bars[num_bars].type = VPCI_BAR_EMPTY;
+        rc = vpci_add_register(pdev->vpci, empty_bar_read, NULL,
+                               rom_reg, 4, NULL);
         if ( rc )
-            rom->type = VPCI_BAR_EMPTY;
+            goto fail;
     }
 
     return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, cmd, false) : 0;
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 75cfb532ee..2028f2151f 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -67,7 +67,10 @@  struct vpci {
     struct vpci_header {
         /* Information about the PCI BARs of this device. */
         struct vpci_bar {
+            /* Physical (host) address. */
             uint64_t addr;
+            /* Guest address. */
+            uint64_t guest_addr;
             uint64_t size;
             enum {
                 VPCI_BAR_EMPTY,