diff mbox series

[v2,1/2] xen/vpci: header: filter PCI capabilities

Message ID 20230816185035.82994-2-stewart.hildebrand@amd.com (mailing list archive)
State Superseded
Headers show
Series vPCI capabilities filtering | expand

Commit Message

Stewart Hildebrand Aug. 16, 2023, 6:50 p.m. UTC
Currently, Xen vPCI only supports virtualizing the MSI and MSI-X capabilities.
Hide all other PCI capabilities (including extended capabilities) from domUs for
now, even though there may be certain devices/drivers that depend on being able
to discover certain capabilities.

We parse the physical PCI capabilities linked list and add vPCI register
handlers for the next elements, inserting our own next value, thus presenting a
modified linked list to the domU.

Introduce helper functions vpci_hw_read8 and vpci_read_val. The vpci_read_val
helper function returns a fixed value, which may be used for RAZ registers, or
registers whose value doesn't change.

Adapt pci_find_next_cap() to suit our needs, which has no existing callers.

Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
---
If there are no capabilities to be exposed to the guest, a future status
register handler likely would want to mask the PCI_STATUS_CAP_LIST bit. See [1]
for a suggestion of how this might be tracked in struct vpci_header.

RFC: I'm not a fan of the (void *)(uintptr_t) cast for passing a value to
     vpci_read_val, but without the cast, the only alternative I could think of
     would be to introduce a new memory allocation. See the next patch for an
     example of what this might look like.

v1->v2:
* change type of ttl to int
* use switch statement instead of if/else
* adapt existing pci_find_next_cap helper instead of rolling our own
* pass ttl as in/out
* "pass through" the lower 2 bits of the next pointer
* squash helper functions into this patch to avoid transient dead code situation
* extended capabilities RAZ/WI

[1] https://lists.xenproject.org/archives/html/xen-devel/2023-08/msg01173.html
---
 xen/drivers/pci/pci.c     | 18 +++++------
 xen/drivers/vpci/header.c | 66 +++++++++++++++++++++++++++++++++++++++
 xen/drivers/vpci/vpci.c   | 12 +++++++
 xen/include/xen/pci.h     |  3 +-
 xen/include/xen/vpci.h    |  5 +++
 5 files changed, 94 insertions(+), 10 deletions(-)

Comments

Jan Beulich Aug. 17, 2023, 12:57 p.m. UTC | #1
On 16.08.2023 20:50, Stewart Hildebrand wrote:
> If there are no capabilities to be exposed to the guest, a future status
> register handler likely would want to mask the PCI_STATUS_CAP_LIST bit. See [1]
> for a suggestion of how this might be tracked in struct vpci_header.

Can we actually get away without doing this right away? I'm not sure
consumers are required to range check what they read from PCI_CAPABILITY_LIST.

> --- a/xen/drivers/pci/pci.c
> +++ b/xen/drivers/pci/pci.c
> @@ -39,27 +39,27 @@ int pci_find_cap_offset(u16 seg, u8 bus, u8 dev, u8 func, u8 cap)
>      return 0;
>  }
>  
> -int pci_find_next_cap(u16 seg, u8 bus, unsigned int devfn, u8 pos, int cap)
> +int pci_find_next_cap(pci_sbdf_t sbdf, uint8_t pos, bool (*is_match)(uint8_t),
> +                      int *ttl)

Why plain int? When values can't go negative, respective variables generally
want to be of unsigned types.

>  {
> -    u8 id;
> -    int ttl = 48;
> +    uint8_t id;
>  
> -    while ( ttl-- )
> +    while ( (*ttl)-- > 0 )

I don't see why you add "> 0" here.

>      {
> -        pos = pci_conf_read8(PCI_SBDF(seg, bus, devfn), pos);
> +        pos = pci_conf_read8(sbdf, pos);
>          if ( pos < 0x40 )
>              break;
>  
> -        pos &= ~3;
> -        id = pci_conf_read8(PCI_SBDF(seg, bus, devfn), pos + PCI_CAP_LIST_ID);
> +        id = pci_conf_read8(sbdf, (pos & ~3) + PCI_CAP_LIST_ID);
>  
>          if ( id == 0xff )
>              break;
> -        if ( id == cap )
> +        if ( is_match(id) )
>              return pos;
>  
> -        pos += PCI_CAP_LIST_NEXT;
> +        pos = (pos & ~3) + PCI_CAP_LIST_NEXT;
>      }
> +
>      return 0;
>  }

As said in context of v1, this function should remain usable for its
original purpose. That, to me, includes the caller not needing to care about
ttl. I could see you convert the original function the way you do, but under
a new name, and then implement the original one simply in terms of this more
general purpose function.

Also, while I appreciate the sbdf conversion, that wants to be a separate
patch, which would then want to take care of the sibling functions as well.

> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -513,6 +513,18 @@ static void cf_check rom_write(
>          rom->addr = val & PCI_ROM_ADDRESS_MASK;
>  }
>  
> +static bool cf_check vpci_cap_supported(uint8_t id)
> +{
> +    switch ( id )
> +    {
> +    case PCI_CAP_ID_MSI:
> +    case PCI_CAP_ID_MSIX:
> +        return true;
> +    default:
> +        return false;
> +    }
> +}
> +
>  static int cf_check init_bars(struct pci_dev *pdev)
>  {
>      uint16_t cmd;
> @@ -544,6 +556,60 @@ static int cf_check init_bars(struct pci_dev *pdev)
>      if ( rc )
>          return rc;
>  
> +    if ( !is_hardware_domain(pdev->domain) )
> +    {
> +        if ( (pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST)
> +             == 0 )

This fits on a single line when written this more commonly used way:

        if ( !(pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST) )

Otherwise it needs wrapping differently - binary operators at a wrapping
point belong on the earlier line in our style.

> +        {
> +            /* RAZ/WI */
> +            rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
> +                                   PCI_CAPABILITY_LIST, 1, NULL);

This last NULL is likely misleading to readers: It does not obviously
represent the value 0 cast to void *. (Same then for the extended cap
handler at the end.)

> +            if ( rc )
> +                return rc;
> +        }
> +        else
> +        {
> +            /* Only expose capabilities to the guest that vPCI can handle. */
> +            uint8_t next;
> +            int ttl = 48;
> +
> +            next = pci_find_next_cap(pdev->sbdf, PCI_CAPABILITY_LIST,
> +                                     vpci_cap_supported, &ttl);
> +
> +            rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
> +                                   PCI_CAPABILITY_LIST, 1,
> +                                   (void *)(uintptr_t)next);
> +            if ( rc )
> +                return rc;
> +
> +            while ( next && (ttl > 0) )

Don't you need to mask off the low two bits first (rather than [only] ...

> +            {
> +                uint8_t pos = next;
> +
> +                next = pci_find_next_cap(pdev->sbdf, pos + PCI_CAP_LIST_NEXT,
> +                                         vpci_cap_supported, &ttl);
> +
> +                rc = vpci_add_register(pdev->vpci, vpci_hw_read8, NULL,
> +                                       pos + PCI_CAP_LIST_ID, 1, NULL);
> +                if ( rc )
> +                    return rc;
> +
> +                rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
> +                                       pos + PCI_CAP_LIST_NEXT, 1,
> +                                       (void *)(uintptr_t)next);
> +                if ( rc )
> +                    return rc;
> +
> +                next &= ~3;

... here)?

Jan
Stewart Hildebrand Aug. 21, 2023, 1:28 a.m. UTC | #2
On 8/17/23 08:57, Jan Beulich wrote:
> On 16.08.2023 20:50, Stewart Hildebrand wrote:
>> If there are no capabilities to be exposed to the guest, a future status
>> register handler likely would want to mask the PCI_STATUS_CAP_LIST bit. See [1]
>> for a suggestion of how this might be tracked in struct vpci_header.
> 
> Can we actually get away without doing this right away? I'm not sure
> consumers are required to range check what they read from PCI_CAPABILITY_LIST.

I think you're right, this should be done right away. I'll add a status register handler with PCI_STATUS_CAP_LIST bit masking in the next version of the series. This will include a modification to vpci_write_helper() be able to handle rw1c registers properly.

>> --- a/xen/drivers/pci/pci.c
>> +++ b/xen/drivers/pci/pci.c
>> @@ -39,27 +39,27 @@ int pci_find_cap_offset(u16 seg, u8 bus, u8 dev, u8 func, u8 cap)
>>      return 0;
>>  }
>>
>> -int pci_find_next_cap(u16 seg, u8 bus, unsigned int devfn, u8 pos, int cap)
>> +int pci_find_next_cap(pci_sbdf_t sbdf, uint8_t pos, bool (*is_match)(uint8_t),
>> +                      int *ttl)
> 
> Why plain int? When values can't go negative, respective variables generally
> want to be of unsigned types.

I'll change to unsigned int.

>>  {
>> -    u8 id;
>> -    int ttl = 48;
>> +    uint8_t id;
>>
>> -    while ( ttl-- )
>> +    while ( (*ttl)-- > 0 )
> 
> I don't see why you add "> 0" here.

I'll remove it.

>>      {
>> -        pos = pci_conf_read8(PCI_SBDF(seg, bus, devfn), pos);
>> +        pos = pci_conf_read8(sbdf, pos);
>>          if ( pos < 0x40 )
>>              break;
>>
>> -        pos &= ~3;
>> -        id = pci_conf_read8(PCI_SBDF(seg, bus, devfn), pos + PCI_CAP_LIST_ID);
>> +        id = pci_conf_read8(sbdf, (pos & ~3) + PCI_CAP_LIST_ID);
>>
>>          if ( id == 0xff )
>>              break;
>> -        if ( id == cap )
>> +        if ( is_match(id) )
>>              return pos;
>>
>> -        pos += PCI_CAP_LIST_NEXT;
>> +        pos = (pos & ~3) + PCI_CAP_LIST_NEXT;
>>      }
>> +
>>      return 0;
>>  }
> 
> As said in context of v1, this function should remain usable for its
> original purpose. That, to me, includes the caller not needing to care about
> ttl. I could see you convert the original function the way you do, but under
> a new name, and then implement the original one simply in terms of this more
> general purpose function.

Will do.

> Also, while I appreciate the sbdf conversion, that wants to be a separate
> patch, which would then want to take care of the sibling functions as well.

OK, will do. To be clear, this means converting the following 4 functions to use pci_sbdf_t, along with all the call sites:
pci_find_cap_offset
pci_find_next_cap
pci_find_ext_capability
pci_find_next_ext_capability

>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -513,6 +513,18 @@ static void cf_check rom_write(
>>          rom->addr = val & PCI_ROM_ADDRESS_MASK;
>>  }
>>
>> +static bool cf_check vpci_cap_supported(uint8_t id)
>> +{
>> +    switch ( id )
>> +    {
>> +    case PCI_CAP_ID_MSI:
>> +    case PCI_CAP_ID_MSIX:
>> +        return true;
>> +    default:
>> +        return false;
>> +    }
>> +}
>> +
>>  static int cf_check init_bars(struct pci_dev *pdev)
>>  {
>>      uint16_t cmd;
>> @@ -544,6 +556,60 @@ static int cf_check init_bars(struct pci_dev *pdev)
>>      if ( rc )
>>          return rc;
>>
>> +    if ( !is_hardware_domain(pdev->domain) )
>> +    {
>> +        if ( (pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST)
>> +             == 0 )
> 
> This fits on a single line when written this more commonly used way:
> 
>         if ( !(pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST) )

I'll do this.

> Otherwise it needs wrapping differently - binary operators at a wrapping
> point belong on the earlier line in our style.
> 
>> +        {
>> +            /* RAZ/WI */
>> +            rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
>> +                                   PCI_CAPABILITY_LIST, 1, NULL);
> 
> This last NULL is likely misleading to readers: It does not obviously
> represent the value 0 cast to void *. (Same then for the extended cap
> handler at the end.)

I'll change to (void *)0

>> +            if ( rc )
>> +                return rc;
>> +        }
>> +        else
>> +        {
>> +            /* Only expose capabilities to the guest that vPCI can handle. */
>> +            uint8_t next;
>> +            int ttl = 48;
>> +
>> +            next = pci_find_next_cap(pdev->sbdf, PCI_CAPABILITY_LIST,
>> +                                     vpci_cap_supported, &ttl);
>> +
>> +            rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
>> +                                   PCI_CAPABILITY_LIST, 1,
>> +                                   (void *)(uintptr_t)next);
>> +            if ( rc )
>> +                return rc;
>> +
>> +            while ( next && (ttl > 0) )
> 
> Don't you need to mask off the low two bits first (rather than [only] ...

Yes, I'll fix this.

> 
>> +            {
>> +                uint8_t pos = next;
>> +
>> +                next = pci_find_next_cap(pdev->sbdf, pos + PCI_CAP_LIST_NEXT,
>> +                                         vpci_cap_supported, &ttl);
>> +
>> +                rc = vpci_add_register(pdev->vpci, vpci_hw_read8, NULL,
>> +                                       pos + PCI_CAP_LIST_ID, 1, NULL);
>> +                if ( rc )
>> +                    return rc;
>> +
>> +                rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
>> +                                       pos + PCI_CAP_LIST_NEXT, 1,
>> +                                       (void *)(uintptr_t)next);
>> +                if ( rc )
>> +                    return rc;
>> +
>> +                next &= ~3;
> 
> ... here)?
> 
> Jan
diff mbox series

Patch

diff --git a/xen/drivers/pci/pci.c b/xen/drivers/pci/pci.c
index e411876a1518..fbebbe4a872a 100644
--- a/xen/drivers/pci/pci.c
+++ b/xen/drivers/pci/pci.c
@@ -39,27 +39,27 @@  int pci_find_cap_offset(u16 seg, u8 bus, u8 dev, u8 func, u8 cap)
     return 0;
 }
 
-int pci_find_next_cap(u16 seg, u8 bus, unsigned int devfn, u8 pos, int cap)
+int pci_find_next_cap(pci_sbdf_t sbdf, uint8_t pos, bool (*is_match)(uint8_t),
+                      int *ttl)
 {
-    u8 id;
-    int ttl = 48;
+    uint8_t id;
 
-    while ( ttl-- )
+    while ( (*ttl)-- > 0 )
     {
-        pos = pci_conf_read8(PCI_SBDF(seg, bus, devfn), pos);
+        pos = pci_conf_read8(sbdf, pos);
         if ( pos < 0x40 )
             break;
 
-        pos &= ~3;
-        id = pci_conf_read8(PCI_SBDF(seg, bus, devfn), pos + PCI_CAP_LIST_ID);
+        id = pci_conf_read8(sbdf, (pos & ~3) + PCI_CAP_LIST_ID);
 
         if ( id == 0xff )
             break;
-        if ( id == cap )
+        if ( is_match(id) )
             return pos;
 
-        pos += PCI_CAP_LIST_NEXT;
+        pos = (pos & ~3) + PCI_CAP_LIST_NEXT;
     }
+
     return 0;
 }
 
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 60f7049e3498..ec5947300198 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -513,6 +513,18 @@  static void cf_check rom_write(
         rom->addr = val & PCI_ROM_ADDRESS_MASK;
 }
 
+static bool cf_check vpci_cap_supported(uint8_t id)
+{
+    switch ( id )
+    {
+    case PCI_CAP_ID_MSI:
+    case PCI_CAP_ID_MSIX:
+        return true;
+    default:
+        return false;
+    }
+}
+
 static int cf_check init_bars(struct pci_dev *pdev)
 {
     uint16_t cmd;
@@ -544,6 +556,60 @@  static int cf_check init_bars(struct pci_dev *pdev)
     if ( rc )
         return rc;
 
+    if ( !is_hardware_domain(pdev->domain) )
+    {
+        if ( (pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST)
+             == 0 )
+        {
+            /* RAZ/WI */
+            rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
+                                   PCI_CAPABILITY_LIST, 1, NULL);
+            if ( rc )
+                return rc;
+        }
+        else
+        {
+            /* Only expose capabilities to the guest that vPCI can handle. */
+            uint8_t next;
+            int ttl = 48;
+
+            next = pci_find_next_cap(pdev->sbdf, PCI_CAPABILITY_LIST,
+                                     vpci_cap_supported, &ttl);
+
+            rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
+                                   PCI_CAPABILITY_LIST, 1,
+                                   (void *)(uintptr_t)next);
+            if ( rc )
+                return rc;
+
+            while ( next && (ttl > 0) )
+            {
+                uint8_t pos = next;
+
+                next = pci_find_next_cap(pdev->sbdf, pos + PCI_CAP_LIST_NEXT,
+                                         vpci_cap_supported, &ttl);
+
+                rc = vpci_add_register(pdev->vpci, vpci_hw_read8, NULL,
+                                       pos + PCI_CAP_LIST_ID, 1, NULL);
+                if ( rc )
+                    return rc;
+
+                rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
+                                       pos + PCI_CAP_LIST_NEXT, 1,
+                                       (void *)(uintptr_t)next);
+                if ( rc )
+                    return rc;
+
+                next &= ~3;
+            }
+        }
+
+        /* Extended capabilities RAZ/WI */
+        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL, 0x100, 4, NULL);
+        if ( rc )
+            return rc;
+    }
+
     if ( pdev->ignore_bars )
         return 0;
 
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index d73fa7630237..4a96aa50494d 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -133,6 +133,18 @@  static void cf_check vpci_ignored_write(
 {
 }
 
+uint32_t cf_check vpci_read_val(
+    const struct pci_dev *pdev, unsigned int reg, void *data)
+{
+    return (uintptr_t)data;
+}
+
+uint32_t cf_check vpci_hw_read8(
+    const struct pci_dev *pdev, unsigned int reg, void *data)
+{
+    return pci_conf_read8(pdev->sbdf, reg);
+}
+
 uint32_t cf_check vpci_hw_read16(
     const struct pci_dev *pdev, unsigned int reg, void *data)
 {
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 5975ca2f3032..57792282108b 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -194,7 +194,8 @@  int pci_mmcfg_read(unsigned int seg, unsigned int bus,
 int pci_mmcfg_write(unsigned int seg, unsigned int bus,
                     unsigned int devfn, int reg, int len, u32 value);
 int pci_find_cap_offset(u16 seg, u8 bus, u8 dev, u8 func, u8 cap);
-int pci_find_next_cap(u16 seg, u8 bus, unsigned int devfn, u8 pos, int cap);
+int pci_find_next_cap(pci_sbdf_t sbdf, uint8_t pos, bool (*is_match)(uint8_t),
+                      int *ttl);
 int pci_find_ext_capability(int seg, int bus, int devfn, int cap);
 int pci_find_next_ext_capability(int seg, int bus, int devfn, int pos, int cap);
 const char *parse_pci(const char *, unsigned int *seg, unsigned int *bus,
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 0b8a2a3c745b..17fd252746ec 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -45,7 +45,12 @@  uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size);
 void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
                 uint32_t data);
 
+uint32_t cf_check vpci_read_val(
+    const struct pci_dev *pdev, unsigned int reg, void *data);
+
 /* Passthrough handlers. */
+uint32_t cf_check vpci_hw_read8(
+    const struct pci_dev *pdev, unsigned int reg, void *data);
 uint32_t cf_check vpci_hw_read16(
     const struct pci_dev *pdev, unsigned int reg, void *data);
 uint32_t cf_check vpci_hw_read32(