diff mbox series

[v7,1/2] xen/vpci: header: status register handler

Message ID 20230913143550.14565-2-stewart.hildebrand@amd.com (mailing list archive)
State New, archived
Headers show
Series vPCI capabilities filtering | expand

Commit Message

Stewart Hildebrand Sept. 13, 2023, 2:35 p.m. UTC
Introduce a handler for the PCI status register, with ability to mask the
capabilities bit. The status register contains RsvdZ bits, read-only bits, and
write-1-to-clear bits, so introduce bitmasks to handle these in vPCI. If a bit
in the bitmask is set, then the special meaning applies:

  rsvdz_mask: read as zero, guest write ignore (write zero to hardware)
  ro_mask: read normal, guest write ignore (preserve on write to hardware)
  rw1c_mask: read normal, write 1 to clear

The RsvdZ naming was borrowed from the PCI Express Base 4.0 specification.

Xen preserves the value of read-only bits on write to hardware, discarding the
guests write value.

The mask_cap_list flag will be set in a follow-on patch.

Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
---
v6->v7:
* re-work args passed to vpci_add_register_mask() (called in init_bars())
* also check for overlap of (rsvdz_mask & ro_mask) in add_register()
* slightly adjust masking operation in vpci_write_helper()

v5->v6:
* remove duplicate PCI_STATUS_CAP_LIST in constant definition
* style fixup in constant definitions
* s/res_mask/rsvdz_mask/
* combine a new masking operation into single line
* preserve r/o bits on write
* get rid of status_read. Instead, use rsvdz_mask for conditionally masking
  PCI_STATUS_CAP_LIST bit
* add comment about PCI_STATUS_CAP_LIST and rsvdp behavior
* add sanity checks in add_register
* move mask_cap_list from struct vpci_header to local variable

v4->v5:
* add support for res_mask
* add support for ro_mask (squash ro_mask patch)
* add constants for reserved, read-only, and rw1c masks

v3->v4:
* move mask_cap_list setting to the capabilities patch
* single pci_conf_read16 in status_read
* align mask_cap_list bitfield in struct vpci_header
* change to rw1c bit mask instead of treating whole register as rw1c
* drop subsystem prefix on renamed add_register function

v2->v3:
* new patch
---
 xen/drivers/vpci/header.c  | 16 +++++++++++
 xen/drivers/vpci/vpci.c    | 55 +++++++++++++++++++++++++++++---------
 xen/include/xen/pci_regs.h |  9 +++++++
 xen/include/xen/vpci.h     |  8 ++++++
 4 files changed, 76 insertions(+), 12 deletions(-)

Comments

Jan Beulich Sept. 14, 2023, 11:12 a.m. UTC | #1
On 13.09.2023 16:35, Stewart Hildebrand wrote:
> Introduce a handler for the PCI status register, with ability to mask the
> capabilities bit. The status register contains RsvdZ bits, read-only bits, and
> write-1-to-clear bits, so introduce bitmasks to handle these in vPCI. If a bit
> in the bitmask is set, then the special meaning applies:
> 
>   rsvdz_mask: read as zero, guest write ignore (write zero to hardware)
>   ro_mask: read normal, guest write ignore (preserve on write to hardware)
>   rw1c_mask: read normal, write 1 to clear
> 
> The RsvdZ naming was borrowed from the PCI Express Base 4.0 specification.
> 
> Xen preserves the value of read-only bits on write to hardware, discarding the
> guests write value.
> 
> The mask_cap_list flag will be set in a follow-on patch.
> 
> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné Nov. 17, 2023, 12:40 p.m. UTC | #2
On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
> Introduce a handler for the PCI status register, with ability to mask the
> capabilities bit. The status register contains RsvdZ bits, read-only bits, and
> write-1-to-clear bits, so introduce bitmasks to handle these in vPCI. If a bit
> in the bitmask is set, then the special meaning applies:
> 
>   rsvdz_mask: read as zero, guest write ignore (write zero to hardware)
>   ro_mask: read normal, guest write ignore (preserve on write to hardware)
>   rw1c_mask: read normal, write 1 to clear
> 
> The RsvdZ naming was borrowed from the PCI Express Base 4.0 specification.
> 
> Xen preserves the value of read-only bits on write to hardware, discarding the
> guests write value.
> 
> The mask_cap_list flag will be set in a follow-on patch.
> 
> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
> ---
> v6->v7:
> * re-work args passed to vpci_add_register_mask() (called in init_bars())
> * also check for overlap of (rsvdz_mask & ro_mask) in add_register()
> * slightly adjust masking operation in vpci_write_helper()
> 
> v5->v6:
> * remove duplicate PCI_STATUS_CAP_LIST in constant definition
> * style fixup in constant definitions
> * s/res_mask/rsvdz_mask/
> * combine a new masking operation into single line
> * preserve r/o bits on write
> * get rid of status_read. Instead, use rsvdz_mask for conditionally masking
>   PCI_STATUS_CAP_LIST bit
> * add comment about PCI_STATUS_CAP_LIST and rsvdp behavior
> * add sanity checks in add_register
> * move mask_cap_list from struct vpci_header to local variable
> 
> v4->v5:
> * add support for res_mask
> * add support for ro_mask (squash ro_mask patch)
> * add constants for reserved, read-only, and rw1c masks
> 
> v3->v4:
> * move mask_cap_list setting to the capabilities patch
> * single pci_conf_read16 in status_read
> * align mask_cap_list bitfield in struct vpci_header
> * change to rw1c bit mask instead of treating whole register as rw1c
> * drop subsystem prefix on renamed add_register function
> 
> v2->v3:
> * new patch
> ---
>  xen/drivers/vpci/header.c  | 16 +++++++++++
>  xen/drivers/vpci/vpci.c    | 55 +++++++++++++++++++++++++++++---------
>  xen/include/xen/pci_regs.h |  9 +++++++
>  xen/include/xen/vpci.h     |  8 ++++++
>  4 files changed, 76 insertions(+), 12 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index 767c1ba718d7..af267b75ac31 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -521,6 +521,7 @@ static int cf_check init_bars(struct pci_dev *pdev)
>      struct vpci_header *header = &pdev->vpci->header;
>      struct vpci_bar *bars = header->bars;
>      int rc;
> +    bool mask_cap_list = false;
>  
>      switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
>      {
> @@ -544,6 +545,21 @@ static int cf_check init_bars(struct pci_dev *pdev)
>      if ( rc )
>          return rc;
>  
> +    /*
> +     * Utilize rsvdz_mask to hide PCI_STATUS_CAP_LIST from the guest for now. If
> +     * support for rsvdp (reserved & preserved) is added in the future, the
> +     * rsvdp mask should be used instead.
> +     */
> +    rc = vpci_add_register_mask(pdev->vpci, vpci_hw_read16, vpci_hw_write16,
> +                                PCI_STATUS, 2, NULL,
> +                                PCI_STATUS_RSVDZ_MASK |
> +                                    (mask_cap_list ? PCI_STATUS_CAP_LIST : 0),
> +                                PCI_STATUS_RO_MASK &
> +                                    ~(mask_cap_list ? PCI_STATUS_CAP_LIST : 0),
> +                                PCI_STATUS_RW1C_MASK);
> +    if ( rc )
> +        return rc;
> +
>      if ( pdev->ignore_bars )
>          return 0;
>  
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index 3bec9a4153da..b4cde7db1b3f 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -29,6 +29,9 @@ struct vpci_register {
>      unsigned int offset;
>      void *private;
>      struct list_head node;
> +    uint32_t rsvdz_mask;
> +    uint32_t ro_mask;
> +    uint32_t rw1c_mask;

I understand that we need the rw1c_mask in order to properly merge
values when doing partial writes, but the other fields I'm not sure we
do need them.  RO bits don't care about what's written to them, and
RsvdZ are always read as 0 and written as 0, so the mask shouldn't
affect the merging.

>  };
>  
>  #ifdef __XEN__
> @@ -145,9 +148,16 @@ uint32_t cf_check vpci_hw_read32(
>      return pci_conf_read32(pdev->sbdf, reg);
>  }
>  
> -int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
> -                      vpci_write_t *write_handler, unsigned int offset,
> -                      unsigned int size, void *data)
> +void cf_check vpci_hw_write16(
> +    const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
> +{
> +    pci_conf_write16(pdev->sbdf, reg, val);
> +}
> +
> +static int add_register(struct vpci *vpci, vpci_read_t *read_handler,
> +                        vpci_write_t *write_handler, unsigned int offset,
> +                        unsigned int size, void *data, uint32_t rsvdz_mask,
> +                        uint32_t ro_mask, uint32_t rw1c_mask)
>  {
>      struct list_head *prev;
>      struct vpci_register *r;
> @@ -155,7 +165,8 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>      /* Some sanity checks. */
>      if ( (size != 1 && size != 2 && size != 4) ||
>           offset >= PCI_CFG_SPACE_EXP_SIZE || (offset & (size - 1)) ||
> -         (!read_handler && !write_handler) )
> +         (!read_handler && !write_handler) || (rsvdz_mask & ro_mask) ||
> +         (rsvdz_mask & rw1c_mask) || (ro_mask & rw1c_mask) )
>          return -EINVAL;
>  
>      r = xmalloc(struct vpci_register);
> @@ -167,6 +178,9 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>      r->size = size;
>      r->offset = offset;
>      r->private = data;
> +    r->rsvdz_mask = rsvdz_mask & (0xffffffffU >> (32 - 8 * size));
> +    r->ro_mask = ro_mask & (0xffffffffU >> (32 - 8 * size));
> +    r->rw1c_mask = rw1c_mask & (0xffffffffU >> (32 - 8 * size));
>  
>      spin_lock(&vpci->lock);
>  
> @@ -193,6 +207,23 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>      return 0;
>  }
>  
> +int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
> +                      vpci_write_t *write_handler, unsigned int offset,
> +                      unsigned int size, void *data)
> +{
> +    return add_register(vpci, read_handler, write_handler, offset, size, data,
> +                        0, 0, 0);
> +}
> +
> +int vpci_add_register_mask(struct vpci *vpci, vpci_read_t *read_handler,
> +                           vpci_write_t *write_handler, unsigned int offset,
> +                           unsigned int size, void *data, uint32_t rsvdz_mask,
> +                           uint32_t ro_mask, uint32_t rw1c_mask)
> +{
> +    return add_register(vpci, read_handler, write_handler, offset, size, data,
> +                        rsvdz_mask, ro_mask, rw1c_mask);
> +}
> +
>  int vpci_remove_register(struct vpci *vpci, unsigned int offset,
>                           unsigned int size)
>  {
> @@ -376,6 +407,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>          }
>  
>          val = r->read(pdev, r->offset, r->private);
> +        val &= ~r->rsvdz_mask;
>  
>          /* Check if the read is in the middle of a register. */
>          if ( r->offset < emu.offset )
> @@ -407,26 +439,25 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>  
>  /*
>   * Perform a maybe partial write to a register.
> - *
> - * Note that this will only work for simple registers, if Xen needs to
> - * trap accesses to rw1c registers (like the status PCI header register)
> - * the logic in vpci_write will have to be expanded in order to correctly
> - * deal with them.
>   */
>  static void vpci_write_helper(const struct pci_dev *pdev,
>                                const struct vpci_register *r, unsigned int size,
>                                unsigned int offset, uint32_t data)
>  {
> +    uint32_t val = 0;
> +
>      ASSERT(size <= r->size);
>  
> -    if ( size != r->size )
> +    if ( (size != r->size) || r->ro_mask )

Hm, I'm not sure this specific handling for read-only bits is
required, software is allowed to write either 0 or 1 to read-only
bits, but the write will be ignored.

>      {
> -        uint32_t val;
> -
>          val = r->read(pdev, r->offset, r->private);
> +        val &= ~r->rw1c_mask;
>          data = merge_result(val, data, size, offset);
>      }
>  
> +    data &= ~(r->rsvdz_mask | r->ro_mask);
> +    data |= val & r->ro_mask;

You cannot apply the register masks directly into the final value, you
need to offset and mask them as necessary, likewise for val, see
what's done in merge_result().

Regardless of the offset issue, I think the usage of val with the
ro_mask is bogus here, see above.

> +
>      r->write(pdev, r->offset, data & (0xffffffffU >> (32 - 8 * r->size)),
>               r->private);
>  }
> diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h
> index 84b18736a85d..b72131729db6 100644
> --- a/xen/include/xen/pci_regs.h
> +++ b/xen/include/xen/pci_regs.h
> @@ -66,6 +66,15 @@
>  #define  PCI_STATUS_REC_MASTER_ABORT	0x2000 /* Set on master abort */
>  #define  PCI_STATUS_SIG_SYSTEM_ERROR	0x4000 /* Set when we drive SERR */
>  #define  PCI_STATUS_DETECTED_PARITY	0x8000 /* Set on parity error */
> +#define  PCI_STATUS_RSVDZ_MASK		0x0006

In my copy of the PCIe 6 spec bit 6 is also RsvdZ, so the mask should
be 0x46.

But I'm unsure we really need this mask.

Thanks, Roger.
Jan Beulich Nov. 17, 2023, 1:23 p.m. UTC | #3
On 17.11.2023 13:40, Roger Pau Monné wrote:
> On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -29,6 +29,9 @@ struct vpci_register {
>>      unsigned int offset;
>>      void *private;
>>      struct list_head node;
>> +    uint32_t rsvdz_mask;
>> +    uint32_t ro_mask;
>> +    uint32_t rw1c_mask;
> 
> I understand that we need the rw1c_mask in order to properly merge
> values when doing partial writes, but the other fields I'm not sure we
> do need them.  RO bits don't care about what's written to them, and
> RsvdZ are always read as 0 and written as 0, so the mask shouldn't
> affect the merging.

What some version of the spec says is r/o or reserved may be different
in another. Also iirc devices may (wrongly?) implement r/o bits as r/w.
When presenting a virtual view of devices to guests, in this regard I
think we want (or even need) to enforce our view of the world.

Jan
Roger Pau Monné Nov. 17, 2023, 1:33 p.m. UTC | #4
On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
> +int vpci_add_register_mask(struct vpci *vpci, vpci_read_t *read_handler,
> +                           vpci_write_t *write_handler, unsigned int offset,
> +                           unsigned int size, void *data, uint32_t rsvdz_mask,
> +                           uint32_t ro_mask, uint32_t rw1c_mask)

Forgot to mention, it would be good if you can add some tests in
tools/tests/vpci that ensure the masks work as expected.

Thanks, Roger.
Roger Pau Monné Nov. 17, 2023, 1:45 p.m. UTC | #5
On Fri, Nov 17, 2023 at 02:23:42PM +0100, Jan Beulich wrote:
> On 17.11.2023 13:40, Roger Pau Monné wrote:
> > On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
> >> --- a/xen/drivers/vpci/vpci.c
> >> +++ b/xen/drivers/vpci/vpci.c
> >> @@ -29,6 +29,9 @@ struct vpci_register {
> >>      unsigned int offset;
> >>      void *private;
> >>      struct list_head node;
> >> +    uint32_t rsvdz_mask;
> >> +    uint32_t ro_mask;
> >> +    uint32_t rw1c_mask;
> > 
> > I understand that we need the rw1c_mask in order to properly merge
> > values when doing partial writes, but the other fields I'm not sure we
> > do need them.  RO bits don't care about what's written to them, and
> > RsvdZ are always read as 0 and written as 0, so the mask shouldn't
> > affect the merging.
> 
> What some version of the spec says is r/o or reserved may be different
> in another. Also iirc devices may (wrongly?) implement r/o bits as r/w.
> When presenting a virtual view of devices to guests, in this regard I
> think we want (or even need) to enforce our view of the world.

That needs to be part of the commit message then.

Ideally we would also want to do a swept of all registers we currently
implement, in order to check for ro or rsvdz bits and properly enforce
them.

Thanks, Roger.
Roger Pau Monné Nov. 17, 2023, 4:45 p.m. UTC | #6
On Fri, Nov 17, 2023 at 01:40:37PM +0100, Roger Pau Monné wrote:
> On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
> >      {
> > -        uint32_t val;
> > -
> >          val = r->read(pdev, r->offset, r->private);
> > +        val &= ~r->rw1c_mask;
> >          data = merge_result(val, data, size, offset);
> >      }
> >  
> > +    data &= ~(r->rsvdz_mask | r->ro_mask);
> > +    data |= val & r->ro_mask;
> 
> You cannot apply the register masks directly into the final value, you
> need to offset and mask them as necessary, likewise for val, see
> what's done in merge_result().

Never mind, I was wrong, there's no need to offset anything here.

Roger.
Roger Pau Monné Nov. 21, 2023, 2:45 p.m. UTC | #7
On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
> @@ -407,26 +439,25 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>  
>  /*
>   * Perform a maybe partial write to a register.
> - *
> - * Note that this will only work for simple registers, if Xen needs to
> - * trap accesses to rw1c registers (like the status PCI header register)
> - * the logic in vpci_write will have to be expanded in order to correctly
> - * deal with them.
>   */
>  static void vpci_write_helper(const struct pci_dev *pdev,
>                                const struct vpci_register *r, unsigned int size,
>                                unsigned int offset, uint32_t data)
>  {
> +    uint32_t val = 0;
> +
>      ASSERT(size <= r->size);
>  
> -    if ( size != r->size )
> +    if ( (size != r->size) || r->ro_mask )
>      {
> -        uint32_t val;
> -
>          val = r->read(pdev, r->offset, r->private);
> +        val &= ~r->rw1c_mask;
>          data = merge_result(val, data, size, offset);
>      }
>  
> +    data &= ~(r->rsvdz_mask | r->ro_mask);
> +    data |= val & r->ro_mask;

I've been thinking about this, and the way the ro_mask is implemented
(and the way we want to handle ro bits) is the same behavior as RsvdP.
I would suggest to rename the ro_mask to rsvdp_mask and note
that for resilience reasons we will handle RO bits as RsvdP.

Thanks, Roger.
Stewart Hildebrand Nov. 21, 2023, 3:03 p.m. UTC | #8
On 11/21/23 09:45, Roger Pau Monné wrote:
> On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
>> @@ -407,26 +439,25 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>  
>>  /*
>>   * Perform a maybe partial write to a register.
>> - *
>> - * Note that this will only work for simple registers, if Xen needs to
>> - * trap accesses to rw1c registers (like the status PCI header register)
>> - * the logic in vpci_write will have to be expanded in order to correctly
>> - * deal with them.
>>   */
>>  static void vpci_write_helper(const struct pci_dev *pdev,
>>                                const struct vpci_register *r, unsigned int size,
>>                                unsigned int offset, uint32_t data)
>>  {
>> +    uint32_t val = 0;
>> +
>>      ASSERT(size <= r->size);
>>  
>> -    if ( size != r->size )
>> +    if ( (size != r->size) || r->ro_mask )
>>      {
>> -        uint32_t val;
>> -
>>          val = r->read(pdev, r->offset, r->private);
>> +        val &= ~r->rw1c_mask;
>>          data = merge_result(val, data, size, offset);
>>      }
>>  
>> +    data &= ~(r->rsvdz_mask | r->ro_mask);
>> +    data |= val & r->ro_mask;
> 
> I've been thinking about this, and the way the ro_mask is implemented
> (and the way we want to handle ro bits) is the same behavior as RsvdP.
> I would suggest to rename the ro_mask to rsvdp_mask and note
> that for resilience reasons we will handle RO bits as RsvdP.

But the reads behave differently. RO should return the value, RsvdP should return 0 when read (according to the PCIe Base Spec 4.0).
Roger Pau Monné Nov. 21, 2023, 3:18 p.m. UTC | #9
On Tue, Nov 21, 2023 at 10:03:01AM -0500, Stewart Hildebrand wrote:
> On 11/21/23 09:45, Roger Pau Monné wrote:
> > On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
> >> @@ -407,26 +439,25 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
> >>  
> >>  /*
> >>   * Perform a maybe partial write to a register.
> >> - *
> >> - * Note that this will only work for simple registers, if Xen needs to
> >> - * trap accesses to rw1c registers (like the status PCI header register)
> >> - * the logic in vpci_write will have to be expanded in order to correctly
> >> - * deal with them.
> >>   */
> >>  static void vpci_write_helper(const struct pci_dev *pdev,
> >>                                const struct vpci_register *r, unsigned int size,
> >>                                unsigned int offset, uint32_t data)
> >>  {
> >> +    uint32_t val = 0;
> >> +
> >>      ASSERT(size <= r->size);
> >>  
> >> -    if ( size != r->size )
> >> +    if ( (size != r->size) || r->ro_mask )
> >>      {
> >> -        uint32_t val;
> >> -
> >>          val = r->read(pdev, r->offset, r->private);
> >> +        val &= ~r->rw1c_mask;
> >>          data = merge_result(val, data, size, offset);
> >>      }
> >>  
> >> +    data &= ~(r->rsvdz_mask | r->ro_mask);
> >> +    data |= val & r->ro_mask;
> > 
> > I've been thinking about this, and the way the ro_mask is implemented
> > (and the way we want to handle ro bits) is the same behavior as RsvdP.
> > I would suggest to rename the ro_mask to rsvdp_mask and note
> > that for resilience reasons we will handle RO bits as RsvdP.
> 
> But the reads behave differently. RO should return the value, RsvdP should return 0 when read (according to the PCIe Base Spec 4.0).

Hm, right, sorry for the wrong suggestion.  We should force bits to 0
for guests reads, but make sure that for writes the value on the
hardware is preserved.

So we need the separate mask for RsvdP, which will be handled like
ro_mask in vpci_write_helper() and like rsvdz_mask in vpci_read().

Roger.
Stewart Hildebrand Nov. 21, 2023, 4:27 p.m. UTC | #10
On 11/21/23 10:18, Roger Pau Monné wrote:
> On Tue, Nov 21, 2023 at 10:03:01AM -0500, Stewart Hildebrand wrote:
>> On 11/21/23 09:45, Roger Pau Monné wrote:
>>> On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
>>>> @@ -407,26 +439,25 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>>>  
>>>>  /*
>>>>   * Perform a maybe partial write to a register.
>>>> - *
>>>> - * Note that this will only work for simple registers, if Xen needs to
>>>> - * trap accesses to rw1c registers (like the status PCI header register)
>>>> - * the logic in vpci_write will have to be expanded in order to correctly
>>>> - * deal with them.
>>>>   */
>>>>  static void vpci_write_helper(const struct pci_dev *pdev,
>>>>                                const struct vpci_register *r, unsigned int size,
>>>>                                unsigned int offset, uint32_t data)
>>>>  {
>>>> +    uint32_t val = 0;
>>>> +
>>>>      ASSERT(size <= r->size);
>>>>  
>>>> -    if ( size != r->size )
>>>> +    if ( (size != r->size) || r->ro_mask )
>>>>      {
>>>> -        uint32_t val;
>>>> -
>>>>          val = r->read(pdev, r->offset, r->private);
>>>> +        val &= ~r->rw1c_mask;
>>>>          data = merge_result(val, data, size, offset);
>>>>      }
>>>>  
>>>> +    data &= ~(r->rsvdz_mask | r->ro_mask);
>>>> +    data |= val & r->ro_mask;
>>>
>>> I've been thinking about this, and the way the ro_mask is implemented
>>> (and the way we want to handle ro bits) is the same behavior as RsvdP.
>>> I would suggest to rename the ro_mask to rsvdp_mask and note
>>> that for resilience reasons we will handle RO bits as RsvdP.
>>
>> But the reads behave differently. RO should return the value, RsvdP should return 0 when read (according to the PCIe Base Spec 4.0).
> 
> Hm, right, sorry for the wrong suggestion.  We should force bits to 0
> for guests reads, but make sure that for writes the value on the
> hardware is preserved.
> 
> So we need the separate mask for RsvdP, which will be handled like
> ro_mask in vpci_write_helper() and like rsvdz_mask in vpci_read().

Agreed. The only reason I didn't add RsvdP support in this patch was that it wasn't needed for the status register... Since RsvdP will be needed for the command register, I think I may as well implement it as part of this series, perhaps in a separate patch following this one. Stay tuned for v8.
Stewart Hildebrand Nov. 21, 2023, 4:46 p.m. UTC | #11
On 11/21/23 11:27, Stewart Hildebrand wrote:
> On 11/21/23 10:18, Roger Pau Monné wrote:
>> On Tue, Nov 21, 2023 at 10:03:01AM -0500, Stewart Hildebrand wrote:
>>> On 11/21/23 09:45, Roger Pau Monné wrote:
>>>> On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
>>>>> @@ -407,26 +439,25 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>>>>  
>>>>>  /*
>>>>>   * Perform a maybe partial write to a register.
>>>>> - *
>>>>> - * Note that this will only work for simple registers, if Xen needs to
>>>>> - * trap accesses to rw1c registers (like the status PCI header register)
>>>>> - * the logic in vpci_write will have to be expanded in order to correctly
>>>>> - * deal with them.
>>>>>   */
>>>>>  static void vpci_write_helper(const struct pci_dev *pdev,
>>>>>                                const struct vpci_register *r, unsigned int size,
>>>>>                                unsigned int offset, uint32_t data)
>>>>>  {
>>>>> +    uint32_t val = 0;
>>>>> +
>>>>>      ASSERT(size <= r->size);
>>>>>  
>>>>> -    if ( size != r->size )
>>>>> +    if ( (size != r->size) || r->ro_mask )
>>>>>      {
>>>>> -        uint32_t val;
>>>>> -
>>>>>          val = r->read(pdev, r->offset, r->private);
>>>>> +        val &= ~r->rw1c_mask;
>>>>>          data = merge_result(val, data, size, offset);
>>>>>      }
>>>>>  
>>>>> +    data &= ~(r->rsvdz_mask | r->ro_mask);
>>>>> +    data |= val & r->ro_mask;
>>>>
>>>> I've been thinking about this, and the way the ro_mask is implemented
>>>> (and the way we want to handle ro bits) is the same behavior as RsvdP.
>>>> I would suggest to rename the ro_mask to rsvdp_mask and note
>>>> that for resilience reasons we will handle RO bits as RsvdP.
>>>
>>> But the reads behave differently. RO should return the value, RsvdP should return 0 when read (according to the PCIe Base Spec 4.0).
>>
>> Hm, right, sorry for the wrong suggestion.  We should force bits to 0
>> for guests reads, but make sure that for writes the value on the
>> hardware is preserved.
>>
>> So we need the separate mask for RsvdP, which will be handled like
>> ro_mask in vpci_write_helper() and like rsvdz_mask in vpci_read().
> 
> Agreed. The only reason I didn't add RsvdP support in this patch was that it wasn't needed for the status register... Since RsvdP will be needed for the command register, I think I may as well implement it as part of this series, perhaps in a separate patch following this one. Stay tuned for v8.

I just remembered that RsvdP would actually be useful for the PCI_STATUS_CAP_LIST bit in the status register. So I'll implement it directly in this patch afterall, not a separate one.
Stewart Hildebrand Nov. 22, 2023, 8:16 p.m. UTC | #12
On 11/17/23 07:40, Roger Pau Monné wrote:
> On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
>> Introduce a handler for the PCI status register, with ability to mask the
>> capabilities bit. The status register contains RsvdZ bits, read-only bits, and
>> write-1-to-clear bits, so introduce bitmasks to handle these in vPCI. If a bit
>> in the bitmask is set, then the special meaning applies:
>>
>>   rsvdz_mask: read as zero, guest write ignore (write zero to hardware)
>>   ro_mask: read normal, guest write ignore (preserve on write to hardware)
>>   rw1c_mask: read normal, write 1 to clear
>>
>> The RsvdZ naming was borrowed from the PCI Express Base 4.0 specification.
>>
>> Xen preserves the value of read-only bits on write to hardware, discarding the
>> guests write value.
>>
>> The mask_cap_list flag will be set in a follow-on patch.
>>
>> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
>> ---
>> v6->v7:
>> * re-work args passed to vpci_add_register_mask() (called in init_bars())
>> * also check for overlap of (rsvdz_mask & ro_mask) in add_register()
>> * slightly adjust masking operation in vpci_write_helper()
>>
>> v5->v6:
>> * remove duplicate PCI_STATUS_CAP_LIST in constant definition
>> * style fixup in constant definitions
>> * s/res_mask/rsvdz_mask/
>> * combine a new masking operation into single line
>> * preserve r/o bits on write
>> * get rid of status_read. Instead, use rsvdz_mask for conditionally masking
>>   PCI_STATUS_CAP_LIST bit
>> * add comment about PCI_STATUS_CAP_LIST and rsvdp behavior
>> * add sanity checks in add_register
>> * move mask_cap_list from struct vpci_header to local variable
>>
>> v4->v5:
>> * add support for res_mask
>> * add support for ro_mask (squash ro_mask patch)
>> * add constants for reserved, read-only, and rw1c masks
>>
>> v3->v4:
>> * move mask_cap_list setting to the capabilities patch
>> * single pci_conf_read16 in status_read
>> * align mask_cap_list bitfield in struct vpci_header
>> * change to rw1c bit mask instead of treating whole register as rw1c
>> * drop subsystem prefix on renamed add_register function
>>
>> v2->v3:
>> * new patch
>> ---
>>  xen/drivers/vpci/header.c  | 16 +++++++++++
>>  xen/drivers/vpci/vpci.c    | 55 +++++++++++++++++++++++++++++---------
>>  xen/include/xen/pci_regs.h |  9 +++++++
>>  xen/include/xen/vpci.h     |  8 ++++++
>>  4 files changed, 76 insertions(+), 12 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index 767c1ba718d7..af267b75ac31 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -521,6 +521,7 @@ static int cf_check init_bars(struct pci_dev *pdev)
>>      struct vpci_header *header = &pdev->vpci->header;
>>      struct vpci_bar *bars = header->bars;
>>      int rc;
>> +    bool mask_cap_list = false;
>>  
>>      switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
>>      {
>> @@ -544,6 +545,21 @@ static int cf_check init_bars(struct pci_dev *pdev)
>>      if ( rc )
>>          return rc;
>>  
>> +    /*
>> +     * Utilize rsvdz_mask to hide PCI_STATUS_CAP_LIST from the guest for now. If
>> +     * support for rsvdp (reserved & preserved) is added in the future, the
>> +     * rsvdp mask should be used instead.
>> +     */
>> +    rc = vpci_add_register_mask(pdev->vpci, vpci_hw_read16, vpci_hw_write16,
>> +                                PCI_STATUS, 2, NULL,
>> +                                PCI_STATUS_RSVDZ_MASK |
>> +                                    (mask_cap_list ? PCI_STATUS_CAP_LIST : 0),
>> +                                PCI_STATUS_RO_MASK &
>> +                                    ~(mask_cap_list ? PCI_STATUS_CAP_LIST : 0),
>> +                                PCI_STATUS_RW1C_MASK);
>> +    if ( rc )
>> +        return rc;
>> +
>>      if ( pdev->ignore_bars )
>>          return 0;
>>  
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index 3bec9a4153da..b4cde7db1b3f 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -29,6 +29,9 @@ struct vpci_register {
>>      unsigned int offset;
>>      void *private;
>>      struct list_head node;
>> +    uint32_t rsvdz_mask;
>> +    uint32_t ro_mask;
>> +    uint32_t rw1c_mask;
> 
> I understand that we need the rw1c_mask in order to properly merge
> values when doing partial writes, but the other fields I'm not sure we
> do need them.  RO bits don't care about what's written to them, and
> RsvdZ are always read as 0 and written as 0, so the mask shouldn't
> affect the merging.

See discussions at [1] and [2]. I'll keep ro_mask/rsvdz_mask, add rsvdp_mask, and expand the commit message. For another data point, qemu (hw/xen/xen_pt_config_init.c:xen_pt_emu_reg_header0) also has a similar way of masking bits.

[1] https://lists.xenproject.org/archives/html/xen-devel/2023-11/msg01398.html
[2] https://lists.xenproject.org/archives/html/xen-devel/2023-11/msg01693.html

> 
>>  };
>>  
>>  #ifdef __XEN__
>> @@ -145,9 +148,16 @@ uint32_t cf_check vpci_hw_read32(
>>      return pci_conf_read32(pdev->sbdf, reg);
>>  }
>>  
>> -int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>> -                      vpci_write_t *write_handler, unsigned int offset,
>> -                      unsigned int size, void *data)
>> +void cf_check vpci_hw_write16(
>> +    const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
>> +{
>> +    pci_conf_write16(pdev->sbdf, reg, val);
>> +}
>> +
>> +static int add_register(struct vpci *vpci, vpci_read_t *read_handler,
>> +                        vpci_write_t *write_handler, unsigned int offset,
>> +                        unsigned int size, void *data, uint32_t rsvdz_mask,
>> +                        uint32_t ro_mask, uint32_t rw1c_mask)
>>  {
>>      struct list_head *prev;
>>      struct vpci_register *r;
>> @@ -155,7 +165,8 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>>      /* Some sanity checks. */
>>      if ( (size != 1 && size != 2 && size != 4) ||
>>           offset >= PCI_CFG_SPACE_EXP_SIZE || (offset & (size - 1)) ||
>> -         (!read_handler && !write_handler) )
>> +         (!read_handler && !write_handler) || (rsvdz_mask & ro_mask) ||
>> +         (rsvdz_mask & rw1c_mask) || (ro_mask & rw1c_mask) )
>>          return -EINVAL;
>>  
>>      r = xmalloc(struct vpci_register);
>> @@ -167,6 +178,9 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>>      r->size = size;
>>      r->offset = offset;
>>      r->private = data;
>> +    r->rsvdz_mask = rsvdz_mask & (0xffffffffU >> (32 - 8 * size));
>> +    r->ro_mask = ro_mask & (0xffffffffU >> (32 - 8 * size));
>> +    r->rw1c_mask = rw1c_mask & (0xffffffffU >> (32 - 8 * size));
>>  
>>      spin_lock(&vpci->lock);
>>  
>> @@ -193,6 +207,23 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>>      return 0;
>>  }
>>  
>> +int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>> +                      vpci_write_t *write_handler, unsigned int offset,
>> +                      unsigned int size, void *data)
>> +{
>> +    return add_register(vpci, read_handler, write_handler, offset, size, data,
>> +                        0, 0, 0);
>> +}
>> +
>> +int vpci_add_register_mask(struct vpci *vpci, vpci_read_t *read_handler,
>> +                           vpci_write_t *write_handler, unsigned int offset,
>> +                           unsigned int size, void *data, uint32_t rsvdz_mask,
>> +                           uint32_t ro_mask, uint32_t rw1c_mask)
>> +{
>> +    return add_register(vpci, read_handler, write_handler, offset, size, data,
>> +                        rsvdz_mask, ro_mask, rw1c_mask);
>> +}
>> +
>>  int vpci_remove_register(struct vpci *vpci, unsigned int offset,
>>                           unsigned int size)
>>  {
>> @@ -376,6 +407,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>          }
>>  
>>          val = r->read(pdev, r->offset, r->private);
>> +        val &= ~r->rsvdz_mask;
>>  
>>          /* Check if the read is in the middle of a register. */
>>          if ( r->offset < emu.offset )
>> @@ -407,26 +439,25 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>  
>>  /*
>>   * Perform a maybe partial write to a register.
>> - *
>> - * Note that this will only work for simple registers, if Xen needs to
>> - * trap accesses to rw1c registers (like the status PCI header register)
>> - * the logic in vpci_write will have to be expanded in order to correctly
>> - * deal with them.
>>   */
>>  static void vpci_write_helper(const struct pci_dev *pdev,
>>                                const struct vpci_register *r, unsigned int size,
>>                                unsigned int offset, uint32_t data)
>>  {
>> +    uint32_t val = 0;
>> +
>>      ASSERT(size <= r->size);
>>  
>> -    if ( size != r->size )
>> +    if ( (size != r->size) || r->ro_mask )
> 
> Hm, I'm not sure this specific handling for read-only bits is
> required, software is allowed to write either 0 or 1 to read-only
> bits, but the write will be ignored.

See [1].

> 
>>      {
>> -        uint32_t val;
>> -
>>          val = r->read(pdev, r->offset, r->private);
>> +        val &= ~r->rw1c_mask;
>>          data = merge_result(val, data, size, offset);
>>      }
>>  
>> +    data &= ~(r->rsvdz_mask | r->ro_mask);
>> +    data |= val & r->ro_mask;
> 
> You cannot apply the register masks directly into the final value, you
> need to offset and mask them as necessary, likewise for val, see
> what's done in merge_result().

See [3].

[3] https://lists.xenproject.org/archives/html/xen-devel/2023-11/msg01412.html

> 
> Regardless of the offset issue, I think the usage of val with the
> ro_mask is bogus here, see above.

See [1].

> 
>> +
>>      r->write(pdev, r->offset, data & (0xffffffffU >> (32 - 8 * r->size)),
>>               r->private);
>>  }
>> diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h
>> index 84b18736a85d..b72131729db6 100644
>> --- a/xen/include/xen/pci_regs.h
>> +++ b/xen/include/xen/pci_regs.h
>> @@ -66,6 +66,15 @@
>>  #define  PCI_STATUS_REC_MASTER_ABORT	0x2000 /* Set on master abort */
>>  #define  PCI_STATUS_SIG_SYSTEM_ERROR	0x4000 /* Set when we drive SERR */
>>  #define  PCI_STATUS_DETECTED_PARITY	0x8000 /* Set on parity error */
>> +#define  PCI_STATUS_RSVDZ_MASK		0x0006
> 
> In my copy of the PCIe 6 spec bit 6 is also RsvdZ, so the mask should
> be 0x46.

Right, mine too. It's probably safer to follow the newer version of the spec, so I'll make the change. For completeness / archaeology purposes, I just want to document some relevant data points here.

In PCIe 4 spec, it says this about bit 6:
"These bits were used in previous versions of the programming model. Careful consideration should be given to any attempt to repurpose them."

Going further back, PCI (old school PCI, not Express) spec 3.0 says this about bit 6:
"This bit is reserved. *"
"* In Revision 2.1 of this specification, this bit was used to indicate whether or not a device supported User Definable Features."

Just above in our pci_regs.h (and equally in Linux include/uapi/linux/pci_regs.h) we have this definition for bit 6:
#define  PCI_STATUS_UDF         0x40    /* Support User Definable Features [obsolete] */

Qemu hw/xen/xen_pt_config_init.c treats bit 6 as RO:
        .ro_mask    = 0x06F8,

> 
> But I'm unsure we really need this mask.
> 
> Thanks, Roger.
Stewart Hildebrand Nov. 22, 2023, 10:04 p.m. UTC | #13
On 11/17/23 08:33, Roger Pau Monné wrote:
> On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
>> +int vpci_add_register_mask(struct vpci *vpci, vpci_read_t *read_handler,
>> +                           vpci_write_t *write_handler, unsigned int offset,
>> +                           unsigned int size, void *data, uint32_t rsvdz_mask,
>> +                           uint32_t ro_mask, uint32_t rw1c_mask)
> 
> Forgot to mention, it would be good if you can add some tests in
> tools/tests/vpci that ensure the masks work as expected.

Will do

> 
> Thanks, Roger.
Roger Pau Monné Nov. 23, 2023, 8:14 a.m. UTC | #14
On Wed, Nov 22, 2023 at 03:16:29PM -0500, Stewart Hildebrand wrote:
> On 11/17/23 07:40, Roger Pau Monné wrote:
> > On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
> >>      r->write(pdev, r->offset, data & (0xffffffffU >> (32 - 8 * r->size)),
> >>               r->private);
> >>  }
> >> diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h
> >> index 84b18736a85d..b72131729db6 100644
> >> --- a/xen/include/xen/pci_regs.h
> >> +++ b/xen/include/xen/pci_regs.h
> >> @@ -66,6 +66,15 @@
> >>  #define  PCI_STATUS_REC_MASTER_ABORT	0x2000 /* Set on master abort */
> >>  #define  PCI_STATUS_SIG_SYSTEM_ERROR	0x4000 /* Set when we drive SERR */
> >>  #define  PCI_STATUS_DETECTED_PARITY	0x8000 /* Set on parity error */
> >> +#define  PCI_STATUS_RSVDZ_MASK		0x0006
> > 
> > In my copy of the PCIe 6 spec bit 6 is also RsvdZ, so the mask should
> > be 0x46.
> 
> Right, mine too. It's probably safer to follow the newer version of the spec, so I'll make the change. For completeness / archaeology purposes, I just want to document some relevant data points here.
> 
> In PCIe 4 spec, it says this about bit 6:
> "These bits were used in previous versions of the programming model. Careful consideration should be given to any attempt to repurpose them."
> 
> Going further back, PCI (old school PCI, not Express) spec 3.0 says this about bit 6:
> "This bit is reserved. *"
> "* In Revision 2.1 of this specification, this bit was used to indicate whether or not a device supported User Definable Features."
> 
> Just above in our pci_regs.h (and equally in Linux include/uapi/linux/pci_regs.h) we have this definition for bit 6:
> #define  PCI_STATUS_UDF         0x40    /* Support User Definable Features [obsolete] */
> 
> Qemu hw/xen/xen_pt_config_init.c treats bit 6 as RO:
>         .ro_mask    = 0x06F8,

Right, given the implementation of ro_mask that would likely be fine.
Reading unconditionally as 0 while preserving the value on writes
seems the safest option.

Thanks, Roger.
Stewart Hildebrand Nov. 23, 2023, 12:57 p.m. UTC | #15
On 11/23/23 03:14, Roger Pau Monné wrote:
> On Wed, Nov 22, 2023 at 03:16:29PM -0500, Stewart Hildebrand wrote:
>> On 11/17/23 07:40, Roger Pau Monné wrote:
>>> On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
>>>>      r->write(pdev, r->offset, data & (0xffffffffU >> (32 - 8 * r->size)),
>>>>               r->private);
>>>>  }
>>>> diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h
>>>> index 84b18736a85d..b72131729db6 100644
>>>> --- a/xen/include/xen/pci_regs.h
>>>> +++ b/xen/include/xen/pci_regs.h
>>>> @@ -66,6 +66,15 @@
>>>>  #define  PCI_STATUS_REC_MASTER_ABORT	0x2000 /* Set on master abort */
>>>>  #define  PCI_STATUS_SIG_SYSTEM_ERROR	0x4000 /* Set when we drive SERR */
>>>>  #define  PCI_STATUS_DETECTED_PARITY	0x8000 /* Set on parity error */
>>>> +#define  PCI_STATUS_RSVDZ_MASK		0x0006
>>>
>>> In my copy of the PCIe 6 spec bit 6 is also RsvdZ, so the mask should
>>> be 0x46.
>>
>> Right, mine too. It's probably safer to follow the newer version of the spec, so I'll make the change. For completeness / archaeology purposes, I just want to document some relevant data points here.
>>
>> In PCIe 4 spec, it says this about bit 6:
>> "These bits were used in previous versions of the programming model. Careful consideration should be given to any attempt to repurpose them."
>>
>> Going further back, PCI (old school PCI, not Express) spec 3.0 says this about bit 6:
>> "This bit is reserved. *"
>> "* In Revision 2.1 of this specification, this bit was used to indicate whether or not a device supported User Definable Features."
>>
>> Just above in our pci_regs.h (and equally in Linux include/uapi/linux/pci_regs.h) we have this definition for bit 6:
>> #define  PCI_STATUS_UDF         0x40    /* Support User Definable Features [obsolete] */
>>
>> Qemu hw/xen/xen_pt_config_init.c treats bit 6 as RO:
>>         .ro_mask    = 0x06F8,
> 
> Right, given the implementation of ro_mask that would likely be fine.
> Reading unconditionally as 0 while preserving the value on writes
> seems the safest option.

That would mean treating bit 6 as RsvdP, even though the PCIe 6 spec says RsvdZ. I just want to confirm this is indeed the intent since we both said RsvdZ just a moment ago? If so, I would add a comment since it's deviating from spec. I would personally still vote in favor of following PCIe 6 spec (RsvdZ).
Roger Pau Monné Nov. 23, 2023, 2:21 p.m. UTC | #16
On Thu, Nov 23, 2023 at 07:57:21AM -0500, Stewart Hildebrand wrote:
> On 11/23/23 03:14, Roger Pau Monné wrote:
> > On Wed, Nov 22, 2023 at 03:16:29PM -0500, Stewart Hildebrand wrote:
> >> On 11/17/23 07:40, Roger Pau Monné wrote:
> >>> On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
> >>>>      r->write(pdev, r->offset, data & (0xffffffffU >> (32 - 8 * r->size)),
> >>>>               r->private);
> >>>>  }
> >>>> diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h
> >>>> index 84b18736a85d..b72131729db6 100644
> >>>> --- a/xen/include/xen/pci_regs.h
> >>>> +++ b/xen/include/xen/pci_regs.h
> >>>> @@ -66,6 +66,15 @@
> >>>>  #define  PCI_STATUS_REC_MASTER_ABORT	0x2000 /* Set on master abort */
> >>>>  #define  PCI_STATUS_SIG_SYSTEM_ERROR	0x4000 /* Set when we drive SERR */
> >>>>  #define  PCI_STATUS_DETECTED_PARITY	0x8000 /* Set on parity error */
> >>>> +#define  PCI_STATUS_RSVDZ_MASK		0x0006
> >>>
> >>> In my copy of the PCIe 6 spec bit 6 is also RsvdZ, so the mask should
> >>> be 0x46.
> >>
> >> Right, mine too. It's probably safer to follow the newer version of the spec, so I'll make the change. For completeness / archaeology purposes, I just want to document some relevant data points here.
> >>
> >> In PCIe 4 spec, it says this about bit 6:
> >> "These bits were used in previous versions of the programming model. Careful consideration should be given to any attempt to repurpose them."
> >>
> >> Going further back, PCI (old school PCI, not Express) spec 3.0 says this about bit 6:
> >> "This bit is reserved. *"
> >> "* In Revision 2.1 of this specification, this bit was used to indicate whether or not a device supported User Definable Features."
> >>
> >> Just above in our pci_regs.h (and equally in Linux include/uapi/linux/pci_regs.h) we have this definition for bit 6:
> >> #define  PCI_STATUS_UDF         0x40    /* Support User Definable Features [obsolete] */
> >>
> >> Qemu hw/xen/xen_pt_config_init.c treats bit 6 as RO:
> >>         .ro_mask    = 0x06F8,
> > 
> > Right, given the implementation of ro_mask that would likely be fine.
> > Reading unconditionally as 0 while preserving the value on writes
> > seems the safest option.
> 
> That would mean treating bit 6 as RsvdP, even though the PCIe 6 spec
> says RsvdZ. I just want to confirm this is indeed the intent since
> we both said RsvdZ just a moment ago? If so, I would add a comment
> since it's deviating from spec. I would personally still vote in
> favor of following PCIe 6 spec (RsvdZ).

Hm, yes, lets go with the spec and use RsdvZ.  I very much doubt we
passthrough any device that actually uses the UDF bit.

Thanks, Roger.
diff mbox series

Patch

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 767c1ba718d7..af267b75ac31 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -521,6 +521,7 @@  static int cf_check init_bars(struct pci_dev *pdev)
     struct vpci_header *header = &pdev->vpci->header;
     struct vpci_bar *bars = header->bars;
     int rc;
+    bool mask_cap_list = false;
 
     switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
     {
@@ -544,6 +545,21 @@  static int cf_check init_bars(struct pci_dev *pdev)
     if ( rc )
         return rc;
 
+    /*
+     * Utilize rsvdz_mask to hide PCI_STATUS_CAP_LIST from the guest for now. If
+     * support for rsvdp (reserved & preserved) is added in the future, the
+     * rsvdp mask should be used instead.
+     */
+    rc = vpci_add_register_mask(pdev->vpci, vpci_hw_read16, vpci_hw_write16,
+                                PCI_STATUS, 2, NULL,
+                                PCI_STATUS_RSVDZ_MASK |
+                                    (mask_cap_list ? PCI_STATUS_CAP_LIST : 0),
+                                PCI_STATUS_RO_MASK &
+                                    ~(mask_cap_list ? PCI_STATUS_CAP_LIST : 0),
+                                PCI_STATUS_RW1C_MASK);
+    if ( rc )
+        return rc;
+
     if ( pdev->ignore_bars )
         return 0;
 
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 3bec9a4153da..b4cde7db1b3f 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -29,6 +29,9 @@  struct vpci_register {
     unsigned int offset;
     void *private;
     struct list_head node;
+    uint32_t rsvdz_mask;
+    uint32_t ro_mask;
+    uint32_t rw1c_mask;
 };
 
 #ifdef __XEN__
@@ -145,9 +148,16 @@  uint32_t cf_check vpci_hw_read32(
     return pci_conf_read32(pdev->sbdf, reg);
 }
 
-int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
-                      vpci_write_t *write_handler, unsigned int offset,
-                      unsigned int size, void *data)
+void cf_check vpci_hw_write16(
+    const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
+{
+    pci_conf_write16(pdev->sbdf, reg, val);
+}
+
+static int add_register(struct vpci *vpci, vpci_read_t *read_handler,
+                        vpci_write_t *write_handler, unsigned int offset,
+                        unsigned int size, void *data, uint32_t rsvdz_mask,
+                        uint32_t ro_mask, uint32_t rw1c_mask)
 {
     struct list_head *prev;
     struct vpci_register *r;
@@ -155,7 +165,8 @@  int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
     /* Some sanity checks. */
     if ( (size != 1 && size != 2 && size != 4) ||
          offset >= PCI_CFG_SPACE_EXP_SIZE || (offset & (size - 1)) ||
-         (!read_handler && !write_handler) )
+         (!read_handler && !write_handler) || (rsvdz_mask & ro_mask) ||
+         (rsvdz_mask & rw1c_mask) || (ro_mask & rw1c_mask) )
         return -EINVAL;
 
     r = xmalloc(struct vpci_register);
@@ -167,6 +178,9 @@  int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
     r->size = size;
     r->offset = offset;
     r->private = data;
+    r->rsvdz_mask = rsvdz_mask & (0xffffffffU >> (32 - 8 * size));
+    r->ro_mask = ro_mask & (0xffffffffU >> (32 - 8 * size));
+    r->rw1c_mask = rw1c_mask & (0xffffffffU >> (32 - 8 * size));
 
     spin_lock(&vpci->lock);
 
@@ -193,6 +207,23 @@  int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
     return 0;
 }
 
+int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
+                      vpci_write_t *write_handler, unsigned int offset,
+                      unsigned int size, void *data)
+{
+    return add_register(vpci, read_handler, write_handler, offset, size, data,
+                        0, 0, 0);
+}
+
+int vpci_add_register_mask(struct vpci *vpci, vpci_read_t *read_handler,
+                           vpci_write_t *write_handler, unsigned int offset,
+                           unsigned int size, void *data, uint32_t rsvdz_mask,
+                           uint32_t ro_mask, uint32_t rw1c_mask)
+{
+    return add_register(vpci, read_handler, write_handler, offset, size, data,
+                        rsvdz_mask, ro_mask, rw1c_mask);
+}
+
 int vpci_remove_register(struct vpci *vpci, unsigned int offset,
                          unsigned int size)
 {
@@ -376,6 +407,7 @@  uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
         }
 
         val = r->read(pdev, r->offset, r->private);
+        val &= ~r->rsvdz_mask;
 
         /* Check if the read is in the middle of a register. */
         if ( r->offset < emu.offset )
@@ -407,26 +439,25 @@  uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
 
 /*
  * Perform a maybe partial write to a register.
- *
- * Note that this will only work for simple registers, if Xen needs to
- * trap accesses to rw1c registers (like the status PCI header register)
- * the logic in vpci_write will have to be expanded in order to correctly
- * deal with them.
  */
 static void vpci_write_helper(const struct pci_dev *pdev,
                               const struct vpci_register *r, unsigned int size,
                               unsigned int offset, uint32_t data)
 {
+    uint32_t val = 0;
+
     ASSERT(size <= r->size);
 
-    if ( size != r->size )
+    if ( (size != r->size) || r->ro_mask )
     {
-        uint32_t val;
-
         val = r->read(pdev, r->offset, r->private);
+        val &= ~r->rw1c_mask;
         data = merge_result(val, data, size, offset);
     }
 
+    data &= ~(r->rsvdz_mask | r->ro_mask);
+    data |= val & r->ro_mask;
+
     r->write(pdev, r->offset, data & (0xffffffffU >> (32 - 8 * r->size)),
              r->private);
 }
diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h
index 84b18736a85d..b72131729db6 100644
--- a/xen/include/xen/pci_regs.h
+++ b/xen/include/xen/pci_regs.h
@@ -66,6 +66,15 @@ 
 #define  PCI_STATUS_REC_MASTER_ABORT	0x2000 /* Set on master abort */
 #define  PCI_STATUS_SIG_SYSTEM_ERROR	0x4000 /* Set when we drive SERR */
 #define  PCI_STATUS_DETECTED_PARITY	0x8000 /* Set on parity error */
+#define  PCI_STATUS_RSVDZ_MASK		0x0006
+
+#define  PCI_STATUS_RO_MASK (PCI_STATUS_IMM_READY | PCI_STATUS_INTERRUPT | \
+    PCI_STATUS_CAP_LIST | PCI_STATUS_66MHZ | PCI_STATUS_UDF | \
+    PCI_STATUS_FAST_BACK | PCI_STATUS_DEVSEL_MASK)
+#define  PCI_STATUS_RW1C_MASK (PCI_STATUS_PARITY | \
+    PCI_STATUS_SIG_TARGET_ABORT | PCI_STATUS_REC_TARGET_ABORT | \
+    PCI_STATUS_REC_MASTER_ABORT | PCI_STATUS_SIG_SYSTEM_ERROR | \
+    PCI_STATUS_DETECTED_PARITY)
 
 #define PCI_CLASS_REVISION	0x08	/* High 24 bits are class, low 8 revision */
 #define PCI_REVISION_ID		0x08	/* Revision ID */
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 0b8a2a3c745b..7a5cca29e54c 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -37,6 +37,12 @@  int __must_check vpci_add_register(struct vpci *vpci,
                                    vpci_write_t *write_handler,
                                    unsigned int offset, unsigned int size,
                                    void *data);
+int __must_check vpci_add_register_mask(struct vpci *vpci,
+                                        vpci_read_t *read_handler,
+                                        vpci_write_t *write_handler,
+                                        unsigned int offset, unsigned int size,
+                                        void *data, uint32_t rsvdz_mask,
+                                        uint32_t ro_mask, uint32_t rw1c_mask);
 int __must_check vpci_remove_register(struct vpci *vpci, unsigned int offset,
                                       unsigned int size);
 
@@ -50,6 +56,8 @@  uint32_t cf_check vpci_hw_read16(
     const struct pci_dev *pdev, unsigned int reg, void *data);
 uint32_t cf_check vpci_hw_read32(
     const struct pci_dev *pdev, unsigned int reg, void *data);
+void cf_check vpci_hw_write16(
+    const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data);
 
 /*
  * Check for pending vPCI operations on this vcpu. Returns true if the vcpu