diff mbox series

[v6,06/13] vpci/header: implement guest BAR register handlers

Message ID 20220204063459.680961-7-andr2000@gmail.com (mailing list archive)
State New, archived
Headers show
Series PCI devices passthrough on Arm, part 3 | expand

Commit Message

Oleksandr Andrushchenko Feb. 4, 2022, 6:34 a.m. UTC
From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Add relevant vpci register handlers when assigning PCI device to a domain
and remove those when de-assigning. This allows having different
handlers for different domains, e.g. hwdom and other guests.

Emulate guest BAR register values: this allows creating a guest view
of the registers and emulates size and properties probe as it is done
during PCI device enumeration by the guest.

All empty, IO and ROM BARs for guests are emulated by returning 0 on
reads and ignoring writes: this BARs are special with this respect as
their lower bits have special meaning, so returning default ~0 on read
may confuse guest OS.

Memory decoding is initially disabled when used by guests in order to
prevent the BAR being placed on top of a RAM region.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
Since v5:
- make sure that the guest set address has the same page offset
  as the physical address on the host
- remove guest_rom_{read|write} as those just implement the default
  behaviour of the registers not being handled
- adjusted comment for struct vpci.addr field
- add guest handlers for BARs which are not handled and will otherwise
  return ~0 on read and ignore writes. The BARs are special with this
  respect as their lower bits have special meaning, so returning ~0
  doesn't seem to be right
Since v4:
- updated commit message
- s/guest_addr/guest_reg
Since v3:
- squashed two patches: dynamic add/remove handlers and guest BAR
  handler implementation
- fix guest BAR read of the high part of a 64bit BAR (Roger)
- add error handling to vpci_assign_device
- s/dom%pd/%pd
- blank line before return
Since v2:
- remove unneeded ifdefs for CONFIG_HAS_VPCI_GUEST_SUPPORT as more code
  has been eliminated from being built on x86
Since v1:
 - constify struct pci_dev where possible
 - do not open code is_system_domain()
 - simplify some code3. simplify
 - use gdprintk + error code instead of gprintk
 - gate vpci_bar_{add|remove}_handlers with CONFIG_HAS_VPCI_GUEST_SUPPORT,
   so these do not get compiled for x86
 - removed unneeded is_system_domain check
 - re-work guest read/write to be much simpler and do more work on write
   than read which is expected to be called more frequently
 - removed one too obvious comment
---
 xen/drivers/vpci/header.c | 131 +++++++++++++++++++++++++++++++++-----
 xen/include/xen/vpci.h    |   3 +
 2 files changed, 118 insertions(+), 16 deletions(-)

Comments

Jan Beulich Feb. 7, 2022, 5:06 p.m. UTC | #1
On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> +static uint32_t guest_bar_ignore_read(const struct pci_dev *pdev,
> +                                      unsigned int reg, void *data)
> +{
> +    return 0;
> +}
> +
> +static int bar_ignore_access(const struct pci_dev *pdev, unsigned int reg,
> +                             struct vpci_bar *bar)
> +{
> +    if ( is_hardware_domain(pdev->domain) )
> +        return 0;
> +
> +    return vpci_add_register(pdev->vpci, guest_bar_ignore_read, NULL,
> +                             reg, 4, bar);
> +}

For these two functions: I'm not sure "ignore" is an appropriate
term here. unused_bar_read() and unused_bar() maybe? Or,
considering we already have VPCI_BAR_EMPTY, s/unused/empty/ ? I'm
also not sure we really need the is_hardware_domain() check here:
Returning 0 for Dom0 is going to be fine as well; there's no need
to fetch the value from actual hardware. The one exception might
be for devices with buggy BAR behavior ...

> @@ -516,6 +594,11 @@ static int init_bars(struct pci_dev *pdev)
>          if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO )
>          {
>              bars[i].type = VPCI_BAR_IO;
> +
> +            rc = bar_ignore_access(pdev, reg, &bars[i]);
> +            if ( rc )
> +                return rc;

Elsewhere the command register is restored on error paths.

Jan
Oleksandr Andrushchenko Feb. 8, 2022, 8:06 a.m. UTC | #2
On 07.02.22 19:06, Jan Beulich wrote:
> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>> +static uint32_t guest_bar_ignore_read(const struct pci_dev *pdev,
>> +                                      unsigned int reg, void *data)
>> +{
>> +    return 0;
>> +}
>> +
>> +static int bar_ignore_access(const struct pci_dev *pdev, unsigned int reg,
>> +                             struct vpci_bar *bar)
>> +{
>> +    if ( is_hardware_domain(pdev->domain) )
>> +        return 0;
>> +
>> +    return vpci_add_register(pdev->vpci, guest_bar_ignore_read, NULL,
>> +                             reg, 4, bar);
>> +}
> For these two functions: I'm not sure "ignore" is an appropriate
> term here. unused_bar_read() and unused_bar() maybe? Or,
> considering we already have VPCI_BAR_EMPTY, s/unused/empty/ ? I'm
> also not sure we really need the is_hardware_domain() check here:
> Returning 0 for Dom0 is going to be fine as well; there's no need
> to fetch the value from actual hardware. The one exception might
> be for devices with buggy BAR behavior ...
Well, I think this should be ok, so then
- s/guest_bar_ignore_read/empty_bar_read
- s/bar_ignore_access/empty_bar
- no is_hardware_domain check
>
>> @@ -516,6 +594,11 @@ static int init_bars(struct pci_dev *pdev)
>>           if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO )
>>           {
>>               bars[i].type = VPCI_BAR_IO;
>> +
>> +            rc = bar_ignore_access(pdev, reg, &bars[i]);
>> +            if ( rc )
>> +                return rc;
> Elsewhere the command register is restored on error paths.
Ok, I will restore
>
> Jan
>
Thank you,
Oleksandr
Jan Beulich Feb. 8, 2022, 9:16 a.m. UTC | #3
On 08.02.2022 09:06, Oleksandr Andrushchenko wrote:
> 
> 
> On 07.02.22 19:06, Jan Beulich wrote:
>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>>> +static uint32_t guest_bar_ignore_read(const struct pci_dev *pdev,
>>> +                                      unsigned int reg, void *data)
>>> +{
>>> +    return 0;
>>> +}
>>> +
>>> +static int bar_ignore_access(const struct pci_dev *pdev, unsigned int reg,
>>> +                             struct vpci_bar *bar)
>>> +{
>>> +    if ( is_hardware_domain(pdev->domain) )
>>> +        return 0;
>>> +
>>> +    return vpci_add_register(pdev->vpci, guest_bar_ignore_read, NULL,
>>> +                             reg, 4, bar);
>>> +}
>> For these two functions: I'm not sure "ignore" is an appropriate
>> term here. unused_bar_read() and unused_bar() maybe? Or,
>> considering we already have VPCI_BAR_EMPTY, s/unused/empty/ ? I'm
>> also not sure we really need the is_hardware_domain() check here:
>> Returning 0 for Dom0 is going to be fine as well; there's no need
>> to fetch the value from actual hardware. The one exception might
>> be for devices with buggy BAR behavior ...
> Well, I think this should be ok, so then
> - s/guest_bar_ignore_read/empty_bar_read
> - s/bar_ignore_access/empty_bar

Hmm, seeing it, I don't think empty_bar() is a good function name.
setup_empty_bar() or empty_bar_setup() would make more clear what
the function's purpose is.

> - no is_hardware_domain check

Please wait a little to see whether Roger has any input on this aspect.

Jan
Roger Pau Monné Feb. 8, 2022, 9:25 a.m. UTC | #4
On Fri, Feb 04, 2022 at 08:34:52AM +0200, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Add relevant vpci register handlers when assigning PCI device to a domain
> and remove those when de-assigning. This allows having different
> handlers for different domains, e.g. hwdom and other guests.
> 
> Emulate guest BAR register values: this allows creating a guest view
> of the registers and emulates size and properties probe as it is done
> during PCI device enumeration by the guest.
> 
> All empty, IO and ROM BARs for guests are emulated by returning 0 on
> reads and ignoring writes: this BARs are special with this respect as
> their lower bits have special meaning, so returning default ~0 on read
> may confuse guest OS.
> 
> Memory decoding is initially disabled when used by guests in order to
> prevent the BAR being placed on top of a RAM region.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
> Since v5:
> - make sure that the guest set address has the same page offset
>   as the physical address on the host
> - remove guest_rom_{read|write} as those just implement the default
>   behaviour of the registers not being handled
> - adjusted comment for struct vpci.addr field
> - add guest handlers for BARs which are not handled and will otherwise
>   return ~0 on read and ignore writes. The BARs are special with this
>   respect as their lower bits have special meaning, so returning ~0
>   doesn't seem to be right
> Since v4:
> - updated commit message
> - s/guest_addr/guest_reg
> Since v3:
> - squashed two patches: dynamic add/remove handlers and guest BAR
>   handler implementation
> - fix guest BAR read of the high part of a 64bit BAR (Roger)
> - add error handling to vpci_assign_device
> - s/dom%pd/%pd
> - blank line before return
> Since v2:
> - remove unneeded ifdefs for CONFIG_HAS_VPCI_GUEST_SUPPORT as more code
>   has been eliminated from being built on x86
> Since v1:
>  - constify struct pci_dev where possible
>  - do not open code is_system_domain()
>  - simplify some code3. simplify
>  - use gdprintk + error code instead of gprintk
>  - gate vpci_bar_{add|remove}_handlers with CONFIG_HAS_VPCI_GUEST_SUPPORT,
>    so these do not get compiled for x86
>  - removed unneeded is_system_domain check
>  - re-work guest read/write to be much simpler and do more work on write
>    than read which is expected to be called more frequently
>  - removed one too obvious comment
> ---
>  xen/drivers/vpci/header.c | 131 +++++++++++++++++++++++++++++++++-----
>  xen/include/xen/vpci.h    |   3 +
>  2 files changed, 118 insertions(+), 16 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index bd23c0274d48..2620a95ff35b 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -406,6 +406,81 @@ static void bar_write(const struct pci_dev *pdev, unsigned int reg,
>      pci_conf_write32(pdev->sbdf, reg, val);
>  }
>  
> +static void guest_bar_write(const struct pci_dev *pdev, unsigned int reg,
> +                            uint32_t val, void *data)
> +{
> +    struct vpci_bar *bar = data;
> +    bool hi = false;
> +    uint64_t guest_reg = bar->guest_reg;
> +
> +    if ( bar->type == VPCI_BAR_MEM64_HI )
> +    {
> +        ASSERT(reg > PCI_BASE_ADDRESS_0);
> +        bar--;
> +        hi = true;
> +    }
> +    else
> +    {
> +        val &= PCI_BASE_ADDRESS_MEM_MASK;
> +        val |= bar->type == VPCI_BAR_MEM32 ? PCI_BASE_ADDRESS_MEM_TYPE_32
> +                                           : PCI_BASE_ADDRESS_MEM_TYPE_64;
> +        val |= bar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH : 0;
> +    }
> +
> +    guest_reg &= ~(0xffffffffull << (hi ? 32 : 0));
> +    guest_reg |= (uint64_t)val << (hi ? 32 : 0);
> +
> +    guest_reg &= ~(bar->size - 1) | ~PCI_BASE_ADDRESS_MEM_MASK;
> +
> +    /*
> +     * Make sure that the guest set address has the same page offset
> +     * as the physical address on the host or otherwise things won't work as
> +     * expected.
> +     */
> +    if ( (guest_reg & (~PAGE_MASK & PCI_BASE_ADDRESS_MEM_MASK)) !=
> +         (bar->addr & ~PAGE_MASK) )

This is only required when !hi, but I'm fine with doing it
unconditionally as it's clearer.

> +    {
> +        gprintk(XENLOG_WARNING,
> +                "%pp: ignored BAR %zu write with wrong page offset\n",

"%pp: ignored BAR %zu write attempting to change page offset\n"

> +                &pdev->sbdf, bar - pdev->vpci->header.bars + hi);
> +        return;
> +    }
> +
> +    bar->guest_reg = guest_reg;
> +}
> +
> +static uint32_t guest_bar_read(const struct pci_dev *pdev, unsigned int reg,
> +                               void *data)
> +{
> +    const struct vpci_bar *bar = data;
> +    bool hi = false;
> +
> +    if ( bar->type == VPCI_BAR_MEM64_HI )
> +    {
> +        ASSERT(reg > PCI_BASE_ADDRESS_0);
> +        bar--;
> +        hi = true;
> +    }
> +
> +    return bar->guest_reg >> (hi ? 32 : 0);
> +}
> +
> +static uint32_t guest_bar_ignore_read(const struct pci_dev *pdev,
> +                                      unsigned int reg, void *data)
> +{
> +    return 0;
> +}
> +
> +static int bar_ignore_access(const struct pci_dev *pdev, unsigned int reg,
> +                             struct vpci_bar *bar)
> +{
> +    if ( is_hardware_domain(pdev->domain) )
> +        return 0;
> +
> +    return vpci_add_register(pdev->vpci, guest_bar_ignore_read, NULL,
> +                             reg, 4, bar);
> +}
> +
>  static void rom_write(const struct pci_dev *pdev, unsigned int reg,
>                        uint32_t val, void *data)
>  {
> @@ -462,6 +537,7 @@ static int init_bars(struct pci_dev *pdev)
>      struct vpci_header *header = &pdev->vpci->header;
>      struct vpci_bar *bars = header->bars;
>      int rc;
> +    bool is_hwdom = is_hardware_domain(pdev->domain);
>  
>      switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
>      {
> @@ -501,8 +577,10 @@ static int init_bars(struct pci_dev *pdev)
>          if ( i && bars[i - 1].type == VPCI_BAR_MEM64_LO )
>          {
>              bars[i].type = VPCI_BAR_MEM64_HI;
> -            rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg,
> -                                   4, &bars[i]);
> +            rc = vpci_add_register(pdev->vpci,
> +                                   is_hwdom ? vpci_hw_read32 : guest_bar_read,
> +                                   is_hwdom ? bar_write : guest_bar_write,
> +                                   reg, 4, &bars[i]);
>              if ( rc )
>              {
>                  pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
> @@ -516,6 +594,11 @@ static int init_bars(struct pci_dev *pdev)
>          if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO )
>          {
>              bars[i].type = VPCI_BAR_IO;
> +
> +            rc = bar_ignore_access(pdev, reg, &bars[i]);

This is wrong: you only want to ignore access to IO BARs for Arm, for
x86 we should keep the previous behavior. Even more if you go with
Jan's suggestions to make bar_ignore_access also applicable to dom0.

> +            if ( rc )
> +                return rc;
> +
>              continue;
>          }
>          if ( (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
> @@ -535,6 +618,11 @@ static int init_bars(struct pci_dev *pdev)
>          if ( size == 0 )
>          {
>              bars[i].type = VPCI_BAR_EMPTY;
> +
> +            rc = bar_ignore_access(pdev, reg, &bars[i]);
> +            if ( rc )
> +                return rc;

I would be fine to just call vpci_add_register here, ie;

if ( !is_hwdom )
{
    rc = vpci_add_register(pdev->vpci, guest_bar_ignore_read, NULL,
                           reg, 4, &bars[i]);
     if ( rc )
     {
         ...
     }
}

Feel free to unify the writing of the PCI_COMMAND register on the
error path into a label, as then the error case would simply be a
`goto error;`

Thanks, Roger.
Roger Pau Monné Feb. 8, 2022, 9:29 a.m. UTC | #5
On Tue, Feb 08, 2022 at 10:16:59AM +0100, Jan Beulich wrote:
> On 08.02.2022 09:06, Oleksandr Andrushchenko wrote:
> > 
> > 
> > On 07.02.22 19:06, Jan Beulich wrote:
> >> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> >>> +static uint32_t guest_bar_ignore_read(const struct pci_dev *pdev,
> >>> +                                      unsigned int reg, void *data)
> >>> +{
> >>> +    return 0;
> >>> +}
> >>> +
> >>> +static int bar_ignore_access(const struct pci_dev *pdev, unsigned int reg,
> >>> +                             struct vpci_bar *bar)
> >>> +{
> >>> +    if ( is_hardware_domain(pdev->domain) )
> >>> +        return 0;
> >>> +
> >>> +    return vpci_add_register(pdev->vpci, guest_bar_ignore_read, NULL,
> >>> +                             reg, 4, bar);
> >>> +}
> >> For these two functions: I'm not sure "ignore" is an appropriate
> >> term here. unused_bar_read() and unused_bar() maybe? Or,
> >> considering we already have VPCI_BAR_EMPTY, s/unused/empty/ ? I'm
> >> also not sure we really need the is_hardware_domain() check here:
> >> Returning 0 for Dom0 is going to be fine as well; there's no need
> >> to fetch the value from actual hardware. The one exception might
> >> be for devices with buggy BAR behavior ...
> > Well, I think this should be ok, so then
> > - s/guest_bar_ignore_read/empty_bar_read
> > - s/bar_ignore_access/empty_bar
> 
> Hmm, seeing it, I don't think empty_bar() is a good function name.
> setup_empty_bar() or empty_bar_setup() would make more clear what
> the function's purpose is.

I don't think you require an empty_bar_setup helper, the code there is
trivial can be open coded in init_bars directly IMO.

> 
> > - no is_hardware_domain check
> 
> Please wait a little to see whether Roger has any input on this aspect.

I think for the hw domain we should allow access to the BAR even if Xen
has found it empty. Adding the ignore handlers for dom0 shouldn't make
any difference, but we never know whether some quirky hardware could
make use of that.

Thanks, Roger.
Oleksandr Andrushchenko Feb. 8, 2022, 9:31 a.m. UTC | #6
On 08.02.22 11:25, Roger Pau Monné wrote:
> On Fri, Feb 04, 2022 at 08:34:52AM +0200, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Add relevant vpci register handlers when assigning PCI device to a domain
>> and remove those when de-assigning. This allows having different
>> handlers for different domains, e.g. hwdom and other guests.
>>
>> Emulate guest BAR register values: this allows creating a guest view
>> of the registers and emulates size and properties probe as it is done
>> during PCI device enumeration by the guest.
>>
>> All empty, IO and ROM BARs for guests are emulated by returning 0 on
>> reads and ignoring writes: this BARs are special with this respect as
>> their lower bits have special meaning, so returning default ~0 on read
>> may confuse guest OS.
>>
>> Memory decoding is initially disabled when used by guests in order to
>> prevent the BAR being placed on top of a RAM region.
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> ---
>> Since v5:
>> - make sure that the guest set address has the same page offset
>>    as the physical address on the host
>> - remove guest_rom_{read|write} as those just implement the default
>>    behaviour of the registers not being handled
>> - adjusted comment for struct vpci.addr field
>> - add guest handlers for BARs which are not handled and will otherwise
>>    return ~0 on read and ignore writes. The BARs are special with this
>>    respect as their lower bits have special meaning, so returning ~0
>>    doesn't seem to be right
>> Since v4:
>> - updated commit message
>> - s/guest_addr/guest_reg
>> Since v3:
>> - squashed two patches: dynamic add/remove handlers and guest BAR
>>    handler implementation
>> - fix guest BAR read of the high part of a 64bit BAR (Roger)
>> - add error handling to vpci_assign_device
>> - s/dom%pd/%pd
>> - blank line before return
>> Since v2:
>> - remove unneeded ifdefs for CONFIG_HAS_VPCI_GUEST_SUPPORT as more code
>>    has been eliminated from being built on x86
>> Since v1:
>>   - constify struct pci_dev where possible
>>   - do not open code is_system_domain()
>>   - simplify some code3. simplify
>>   - use gdprintk + error code instead of gprintk
>>   - gate vpci_bar_{add|remove}_handlers with CONFIG_HAS_VPCI_GUEST_SUPPORT,
>>     so these do not get compiled for x86
>>   - removed unneeded is_system_domain check
>>   - re-work guest read/write to be much simpler and do more work on write
>>     than read which is expected to be called more frequently
>>   - removed one too obvious comment
>> ---
>>   xen/drivers/vpci/header.c | 131 +++++++++++++++++++++++++++++++++-----
>>   xen/include/xen/vpci.h    |   3 +
>>   2 files changed, 118 insertions(+), 16 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index bd23c0274d48..2620a95ff35b 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -406,6 +406,81 @@ static void bar_write(const struct pci_dev *pdev, unsigned int reg,
>>       pci_conf_write32(pdev->sbdf, reg, val);
>>   }
>>   
>> +static void guest_bar_write(const struct pci_dev *pdev, unsigned int reg,
>> +                            uint32_t val, void *data)
>> +{
>> +    struct vpci_bar *bar = data;
>> +    bool hi = false;
>> +    uint64_t guest_reg = bar->guest_reg;
>> +
>> +    if ( bar->type == VPCI_BAR_MEM64_HI )
>> +    {
>> +        ASSERT(reg > PCI_BASE_ADDRESS_0);
>> +        bar--;
>> +        hi = true;
>> +    }
>> +    else
>> +    {
>> +        val &= PCI_BASE_ADDRESS_MEM_MASK;
>> +        val |= bar->type == VPCI_BAR_MEM32 ? PCI_BASE_ADDRESS_MEM_TYPE_32
>> +                                           : PCI_BASE_ADDRESS_MEM_TYPE_64;
>> +        val |= bar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH : 0;
>> +    }
>> +
>> +    guest_reg &= ~(0xffffffffull << (hi ? 32 : 0));
>> +    guest_reg |= (uint64_t)val << (hi ? 32 : 0);
>> +
>> +    guest_reg &= ~(bar->size - 1) | ~PCI_BASE_ADDRESS_MEM_MASK;
>> +
>> +    /*
>> +     * Make sure that the guest set address has the same page offset
>> +     * as the physical address on the host or otherwise things won't work as
>> +     * expected.
>> +     */
>> +    if ( (guest_reg & (~PAGE_MASK & PCI_BASE_ADDRESS_MEM_MASK)) !=
>> +         (bar->addr & ~PAGE_MASK) )
> This is only required when !hi, but I'm fine with doing it
> unconditionally as it's clearer.
This is correct wrt hi
>
>> +    {
>> +        gprintk(XENLOG_WARNING,
>> +                "%pp: ignored BAR %zu write with wrong page offset\n",
> "%pp: ignored BAR %zu write attempting to change page offset\n"
Ok
>
>> +                &pdev->sbdf, bar - pdev->vpci->header.bars + hi);
>> +        return;
>> +    }
>> +
>> +    bar->guest_reg = guest_reg;
>> +}
>> +
>> +static uint32_t guest_bar_read(const struct pci_dev *pdev, unsigned int reg,
>> +                               void *data)
>> +{
>> +    const struct vpci_bar *bar = data;
>> +    bool hi = false;
>> +
>> +    if ( bar->type == VPCI_BAR_MEM64_HI )
>> +    {
>> +        ASSERT(reg > PCI_BASE_ADDRESS_0);
>> +        bar--;
>> +        hi = true;
>> +    }
>> +
>> +    return bar->guest_reg >> (hi ? 32 : 0);
>> +}
>> +
>> +static uint32_t guest_bar_ignore_read(const struct pci_dev *pdev,
>> +                                      unsigned int reg, void *data)
>> +{
>> +    return 0;
>> +}
>> +
>> +static int bar_ignore_access(const struct pci_dev *pdev, unsigned int reg,
>> +                             struct vpci_bar *bar)
>> +{
>> +    if ( is_hardware_domain(pdev->domain) )
>> +        return 0;
>> +
>> +    return vpci_add_register(pdev->vpci, guest_bar_ignore_read, NULL,
>> +                             reg, 4, bar);
>> +}
>> +
>>   static void rom_write(const struct pci_dev *pdev, unsigned int reg,
>>                         uint32_t val, void *data)
>>   {
>> @@ -462,6 +537,7 @@ static int init_bars(struct pci_dev *pdev)
>>       struct vpci_header *header = &pdev->vpci->header;
>>       struct vpci_bar *bars = header->bars;
>>       int rc;
>> +    bool is_hwdom = is_hardware_domain(pdev->domain);
>>   
>>       switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
>>       {
>> @@ -501,8 +577,10 @@ static int init_bars(struct pci_dev *pdev)
>>           if ( i && bars[i - 1].type == VPCI_BAR_MEM64_LO )
>>           {
>>               bars[i].type = VPCI_BAR_MEM64_HI;
>> -            rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg,
>> -                                   4, &bars[i]);
>> +            rc = vpci_add_register(pdev->vpci,
>> +                                   is_hwdom ? vpci_hw_read32 : guest_bar_read,
>> +                                   is_hwdom ? bar_write : guest_bar_write,
>> +                                   reg, 4, &bars[i]);
>>               if ( rc )
>>               {
>>                   pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
>> @@ -516,6 +594,11 @@ static int init_bars(struct pci_dev *pdev)
>>           if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO )
>>           {
>>               bars[i].type = VPCI_BAR_IO;
>> +
>> +            rc = bar_ignore_access(pdev, reg, &bars[i]);
> This is wrong: you only want to ignore access to IO BARs for Arm, for
> x86 we should keep the previous behavior. Even more if you go with
> Jan's suggestions to make bar_ignore_access also applicable to dom0.
How do we want this?
#ifdef CONFIG_ARM?
>
>> +            if ( rc )
>> +                return rc;
>> +
>>               continue;
>>           }
>>           if ( (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
>> @@ -535,6 +618,11 @@ static int init_bars(struct pci_dev *pdev)
>>           if ( size == 0 )
>>           {
>>               bars[i].type = VPCI_BAR_EMPTY;
>> +
>> +            rc = bar_ignore_access(pdev, reg, &bars[i]);
>> +            if ( rc )
>> +                return rc;
> I would be fine to just call vpci_add_register here, ie;
>
> if ( !is_hwdom )
> {
>      rc = vpci_add_register(pdev->vpci, guest_bar_ignore_read, NULL,
>                             reg, 4, &bars[i]);
>       if ( rc )
>       {
>           ...
>       }
> }
But we have 3 places where we do the same and also handle errors
the same way. I was thinking having a helper will make the code
clearer. Do you want to open code all the uses?
> Feel free to unify the writing of the PCI_COMMAND register on the
> error path into a label, as then the error case would simply be a
> `goto error;`
I was thinking about it. Will it be ok to make this change in this patch
or you want a dedicated one for that?
> Thanks, Roger.
Thank you,
Oleksandr
Jan Beulich Feb. 8, 2022, 9:48 a.m. UTC | #7
On 08.02.2022 10:31, Oleksandr Andrushchenko wrote:
> On 08.02.22 11:25, Roger Pau Monné wrote:
>> On Fri, Feb 04, 2022 at 08:34:52AM +0200, Oleksandr Andrushchenko wrote:
>>> @@ -516,6 +594,11 @@ static int init_bars(struct pci_dev *pdev)
>>>           if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO )
>>>           {
>>>               bars[i].type = VPCI_BAR_IO;
>>> +
>>> +            rc = bar_ignore_access(pdev, reg, &bars[i]);
>> This is wrong: you only want to ignore access to IO BARs for Arm, for
>> x86 we should keep the previous behavior. Even more if you go with
>> Jan's suggestions to make bar_ignore_access also applicable to dom0.
> How do we want this?
> #ifdef CONFIG_ARM?

Afaic better via a new, dedicated CONFIG_HAVE_* setting, which x86 selects
but Arm doesn't. Unless we have one already, of course ...

Jan
Oleksandr Andrushchenko Feb. 8, 2022, 9:57 a.m. UTC | #8
On 08.02.22 11:48, Jan Beulich wrote:
> On 08.02.2022 10:31, Oleksandr Andrushchenko wrote:
>> On 08.02.22 11:25, Roger Pau Monné wrote:
>>> On Fri, Feb 04, 2022 at 08:34:52AM +0200, Oleksandr Andrushchenko wrote:
>>>> @@ -516,6 +594,11 @@ static int init_bars(struct pci_dev *pdev)
>>>>            if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO )
>>>>            {
>>>>                bars[i].type = VPCI_BAR_IO;
>>>> +
>>>> +            rc = bar_ignore_access(pdev, reg, &bars[i]);
>>> This is wrong: you only want to ignore access to IO BARs for Arm, for
>>> x86 we should keep the previous behavior. Even more if you go with
>>> Jan's suggestions to make bar_ignore_access also applicable to dom0.
>> How do we want this?
>> #ifdef CONFIG_ARM?
> Afaic better via a new, dedicated CONFIG_HAVE_* setting, which x86 selects
> but Arm doesn't. Unless we have one already, of course ...
Could you please be more specific on the name you see appropriate?
And do you realize that this is going to be a single user of such a
setting?
> Jan
>
Thank you,
Oleksandr
Jan Beulich Feb. 8, 2022, 10:15 a.m. UTC | #9
On 08.02.2022 10:57, Oleksandr Andrushchenko wrote:
> On 08.02.22 11:48, Jan Beulich wrote:
>> On 08.02.2022 10:31, Oleksandr Andrushchenko wrote:
>>> On 08.02.22 11:25, Roger Pau Monné wrote:
>>>> On Fri, Feb 04, 2022 at 08:34:52AM +0200, Oleksandr Andrushchenko wrote:
>>>>> @@ -516,6 +594,11 @@ static int init_bars(struct pci_dev *pdev)
>>>>>            if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO )
>>>>>            {
>>>>>                bars[i].type = VPCI_BAR_IO;
>>>>> +
>>>>> +            rc = bar_ignore_access(pdev, reg, &bars[i]);
>>>> This is wrong: you only want to ignore access to IO BARs for Arm, for
>>>> x86 we should keep the previous behavior. Even more if you go with
>>>> Jan's suggestions to make bar_ignore_access also applicable to dom0.
>>> How do we want this?
>>> #ifdef CONFIG_ARM?
>> Afaic better via a new, dedicated CONFIG_HAVE_* setting, which x86 selects
>> but Arm doesn't. Unless we have one already, of course ...
> Could you please be more specific on the name you see appropriate?

I'm pretty sure Linux has something similar, so I'd like to ask that
you go look there. I'm sorry to say this a little bluntly, but I'm
really in need of doing something beyond answering your mails (and
in part re-stating the same thing again and again).

> And do you realize that this is going to be a single user of such a
> setting?

Yes, but I'm not sure this is going to remain just a single use.
Furthermore every CONFIG_<arch> is problematic as soon as a new port
is being worked on. If we wanted to go with a CONFIG_<arch> here, imo
it ought to be CONFIG_X86, not CONFIG_ARM, as I/O ports are really an
x86-specific thing (which has propagated into other architectures in
more or less strange ways, but never as truly I/O ports).

Jan
Oleksandr Andrushchenko Feb. 8, 2022, 10:29 a.m. UTC | #10
On 08.02.22 12:15, Jan Beulich wrote:
> On 08.02.2022 10:57, Oleksandr Andrushchenko wrote:
>> On 08.02.22 11:48, Jan Beulich wrote:
>>> On 08.02.2022 10:31, Oleksandr Andrushchenko wrote:
>>>> On 08.02.22 11:25, Roger Pau Monné wrote:
>>>>> On Fri, Feb 04, 2022 at 08:34:52AM +0200, Oleksandr Andrushchenko wrote:
>>>>>> @@ -516,6 +594,11 @@ static int init_bars(struct pci_dev *pdev)
>>>>>>             if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO )
>>>>>>             {
>>>>>>                 bars[i].type = VPCI_BAR_IO;
>>>>>> +
>>>>>> +            rc = bar_ignore_access(pdev, reg, &bars[i]);
>>>>> This is wrong: you only want to ignore access to IO BARs for Arm, for
>>>>> x86 we should keep the previous behavior. Even more if you go with
>>>>> Jan's suggestions to make bar_ignore_access also applicable to dom0.
>>>> How do we want this?
>>>> #ifdef CONFIG_ARM?
>>> Afaic better via a new, dedicated CONFIG_HAVE_* setting, which x86 selects
>>> but Arm doesn't. Unless we have one already, of course ...
>> Could you please be more specific on the name you see appropriate?
> I'm pretty sure Linux has something similar, so I'd like to ask that
> you go look there.
Not sure, but I can have a look
>   I'm sorry to say this a little bluntly, but I'm
> really in need of doing something beyond answering your mails
Well, if answers were to be a bit more specific and not so general
some time, this could definitely be helpful and save a lot of time trying
to guess what other party has in their mind.
>   (and
> in part re-stating the same thing again and again).
I have no comments on this.
>
>> And do you realize that this is going to be a single user of such a
>> setting?
> Yes, but I'm not sure this is going to remain just a single use.
> Furthermore every CONFIG_<arch> is problematic as soon as a new port
> is being worked on. If we wanted to go with a CONFIG_<arch> here, imo
> it ought to be CONFIG_X86, not CONFIG_ARM, as I/O ports are really an
> x86-specific thing (which has propagated into other architectures in
> more or less strange ways, but never as truly I/O ports).
I am fine using CONFIG_X86
@Roger, are you ok with that?
>
> Jan
>
Thank you,
Oleksandr
Roger Pau Monné Feb. 8, 2022, 1:58 p.m. UTC | #11
On Tue, Feb 08, 2022 at 10:29:22AM +0000, Oleksandr Andrushchenko wrote:
> On 08.02.22 12:15, Jan Beulich wrote:
> > Yes, but I'm not sure this is going to remain just a single use.
> > Furthermore every CONFIG_<arch> is problematic as soon as a new port
> > is being worked on. If we wanted to go with a CONFIG_<arch> here, imo
> > it ought to be CONFIG_X86, not CONFIG_ARM, as I/O ports are really an
> > x86-specific thing (which has propagated into other architectures in
> > more or less strange ways, but never as truly I/O ports).
> I am fine using CONFIG_X86
> @Roger, are you ok with that?

I guess if that's the only instance of having diverging behavior
because of the lack of IO ports I'm fine with using CONFIG_X86.

Thanks, Roger.
diff mbox series

Patch

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index bd23c0274d48..2620a95ff35b 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -406,6 +406,81 @@  static void bar_write(const struct pci_dev *pdev, unsigned int reg,
     pci_conf_write32(pdev->sbdf, reg, val);
 }
 
+static void guest_bar_write(const struct pci_dev *pdev, unsigned int reg,
+                            uint32_t val, void *data)
+{
+    struct vpci_bar *bar = data;
+    bool hi = false;
+    uint64_t guest_reg = bar->guest_reg;
+
+    if ( bar->type == VPCI_BAR_MEM64_HI )
+    {
+        ASSERT(reg > PCI_BASE_ADDRESS_0);
+        bar--;
+        hi = true;
+    }
+    else
+    {
+        val &= PCI_BASE_ADDRESS_MEM_MASK;
+        val |= bar->type == VPCI_BAR_MEM32 ? PCI_BASE_ADDRESS_MEM_TYPE_32
+                                           : PCI_BASE_ADDRESS_MEM_TYPE_64;
+        val |= bar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH : 0;
+    }
+
+    guest_reg &= ~(0xffffffffull << (hi ? 32 : 0));
+    guest_reg |= (uint64_t)val << (hi ? 32 : 0);
+
+    guest_reg &= ~(bar->size - 1) | ~PCI_BASE_ADDRESS_MEM_MASK;
+
+    /*
+     * Make sure that the guest set address has the same page offset
+     * as the physical address on the host or otherwise things won't work as
+     * expected.
+     */
+    if ( (guest_reg & (~PAGE_MASK & PCI_BASE_ADDRESS_MEM_MASK)) !=
+         (bar->addr & ~PAGE_MASK) )
+    {
+        gprintk(XENLOG_WARNING,
+                "%pp: ignored BAR %zu write with wrong page offset\n",
+                &pdev->sbdf, bar - pdev->vpci->header.bars + hi);
+        return;
+    }
+
+    bar->guest_reg = guest_reg;
+}
+
+static uint32_t guest_bar_read(const struct pci_dev *pdev, unsigned int reg,
+                               void *data)
+{
+    const struct vpci_bar *bar = data;
+    bool hi = false;
+
+    if ( bar->type == VPCI_BAR_MEM64_HI )
+    {
+        ASSERT(reg > PCI_BASE_ADDRESS_0);
+        bar--;
+        hi = true;
+    }
+
+    return bar->guest_reg >> (hi ? 32 : 0);
+}
+
+static uint32_t guest_bar_ignore_read(const struct pci_dev *pdev,
+                                      unsigned int reg, void *data)
+{
+    return 0;
+}
+
+static int bar_ignore_access(const struct pci_dev *pdev, unsigned int reg,
+                             struct vpci_bar *bar)
+{
+    if ( is_hardware_domain(pdev->domain) )
+        return 0;
+
+    return vpci_add_register(pdev->vpci, guest_bar_ignore_read, NULL,
+                             reg, 4, bar);
+}
+
 static void rom_write(const struct pci_dev *pdev, unsigned int reg,
                       uint32_t val, void *data)
 {
@@ -462,6 +537,7 @@  static int init_bars(struct pci_dev *pdev)
     struct vpci_header *header = &pdev->vpci->header;
     struct vpci_bar *bars = header->bars;
     int rc;
+    bool is_hwdom = is_hardware_domain(pdev->domain);
 
     switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
     {
@@ -501,8 +577,10 @@  static int init_bars(struct pci_dev *pdev)
         if ( i && bars[i - 1].type == VPCI_BAR_MEM64_LO )
         {
             bars[i].type = VPCI_BAR_MEM64_HI;
-            rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg,
-                                   4, &bars[i]);
+            rc = vpci_add_register(pdev->vpci,
+                                   is_hwdom ? vpci_hw_read32 : guest_bar_read,
+                                   is_hwdom ? bar_write : guest_bar_write,
+                                   reg, 4, &bars[i]);
             if ( rc )
             {
                 pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
@@ -516,6 +594,11 @@  static int init_bars(struct pci_dev *pdev)
         if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO )
         {
             bars[i].type = VPCI_BAR_IO;
+
+            rc = bar_ignore_access(pdev, reg, &bars[i]);
+            if ( rc )
+                return rc;
+
             continue;
         }
         if ( (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
@@ -535,6 +618,11 @@  static int init_bars(struct pci_dev *pdev)
         if ( size == 0 )
         {
             bars[i].type = VPCI_BAR_EMPTY;
+
+            rc = bar_ignore_access(pdev, reg, &bars[i]);
+            if ( rc )
+                return rc;
+
             continue;
         }
 
@@ -542,8 +630,10 @@  static int init_bars(struct pci_dev *pdev)
         bars[i].size = size;
         bars[i].prefetchable = val & PCI_BASE_ADDRESS_MEM_PREFETCH;
 
-        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg, 4,
-                               &bars[i]);
+        rc = vpci_add_register(pdev->vpci,
+                               is_hwdom ? vpci_hw_read32 : guest_bar_read,
+                               is_hwdom ? bar_write : guest_bar_write,
+                               reg, 4, &bars[i]);
         if ( rc )
         {
             pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
@@ -551,22 +641,31 @@  static int init_bars(struct pci_dev *pdev)
         }
     }
 
-    /* Check expansion ROM. */
-    rc = pci_size_mem_bar(pdev->sbdf, rom_reg, &addr, &size, PCI_BAR_ROM);
-    if ( rc > 0 && size )
+    /* Check expansion ROM: we do not handle ROM for guests. */
+    if ( is_hwdom )
     {
-        struct vpci_bar *rom = &header->bars[num_bars];
+        rc = pci_size_mem_bar(pdev->sbdf, rom_reg, &addr, &size, PCI_BAR_ROM);
+        if ( rc > 0 && size )
+        {
+            struct vpci_bar *rom = &header->bars[num_bars];
 
-        rom->type = VPCI_BAR_ROM;
-        rom->size = size;
-        rom->addr = addr;
-        header->rom_enabled = pci_conf_read32(pdev->sbdf, rom_reg) &
-                              PCI_ROM_ADDRESS_ENABLE;
+            rom->type = VPCI_BAR_ROM;
+            rom->size = size;
+            rom->addr = addr;
+            header->rom_enabled = pci_conf_read32(pdev->sbdf, rom_reg) &
+                                  PCI_ROM_ADDRESS_ENABLE;
 
-        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rom_write, rom_reg,
-                               4, rom);
+            rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rom_write,
+                                   rom_reg, 4, rom);
+            if ( rc )
+                rom->type = VPCI_BAR_EMPTY;
+        }
+    }
+    else
+    {
+        rc = bar_ignore_access(pdev, rom_reg, &header->bars[num_bars]);
         if ( rc )
-            rom->type = VPCI_BAR_EMPTY;
+            return rc;
     }
 
     return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, cmd, false) : 0;
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 246307e6f5d5..270d22b85653 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -66,7 +66,10 @@  struct vpci {
     struct vpci_header {
         /* Information about the PCI BARs of this device. */
         struct vpci_bar {
+            /* Physical (host) address. */
             uint64_t addr;
+            /* Guest view of the BAR: address and lower bits. */
+            uint64_t guest_reg;
             uint64_t size;
             enum {
                 VPCI_BAR_EMPTY,