diff mbox series

vfio/pci: Properly hide first-in-list PCIe extended capability

Message ID 20241120143826.17856-1-avihaih@nvidia.com (mailing list archive)
State New
Headers show
Series vfio/pci: Properly hide first-in-list PCIe extended capability | expand

Commit Message

Avihai Horon Nov. 20, 2024, 2:38 p.m. UTC
There are cases where a PCIe extended capability should be hidden from
the user. For example, an unknown capability (i.e., capability with ID
greater than PCI_EXT_CAP_ID_MAX) or a capability that is intentionally
chosen to be hidden from the user.

Hiding a capability is done by virtualizing and modifying the 'Next
Capability Offset' field of the previous capability so it points to the
capability after the one that should be hidden.

The special case where the first capability in the list should be hidden
is handled differently because there is no previous capability that can
be modified. In this case, the capability ID and version are zeroed
while leaving the next pointer intact. This hides the capability and
leaves an anchor for the rest of the capability list.

However, today, hiding the first capability in the list is not done
properly, as struct vfio_pci_core_device->pci_config_map is still set to
the capability ID. If the first capability in the list is unknown, the
following warning [1] is triggered and an out-of-bounds access to
ecap_perms array occurs when vfio_config_do_rw() later uses
pci_config_map to pick the right permissions.

Fix it by defining a new special capability PCI_CAP_ID_FIRST_HIDDEN,
that represents a hidden extended capability that is located first in
the extended capability list, and set pci_config_map to it in the above
case.

[1]

WARNING: CPU: 118 PID: 5329 at drivers/vfio/pci/vfio_pci_config.c:1900 vfio_pci_config_rw+0x395/0x430 [vfio_pci_core]
CPU: 118 UID: 0 PID: 5329 Comm: simx-qemu-syste Not tainted 6.12.0+ #1
(snip)
Call Trace:
 <TASK>
 ? show_regs+0x69/0x80
 ? __warn+0x8d/0x140
 ? vfio_pci_config_rw+0x395/0x430 [vfio_pci_core]
 ? report_bug+0x18f/0x1a0
 ? handle_bug+0x63/0xa0
 ? exc_invalid_op+0x19/0x70
 ? asm_exc_invalid_op+0x1b/0x20
 ? vfio_pci_config_rw+0x395/0x430 [vfio_pci_core]
 ? vfio_pci_config_rw+0x244/0x430 [vfio_pci_core]
 vfio_pci_rw+0x101/0x1b0 [vfio_pci_core]
 vfio_pci_core_read+0x1d/0x30 [vfio_pci_core]
 vfio_device_fops_read+0x27/0x40 [vfio]
 vfs_read+0xbd/0x340
 ? vfio_device_fops_unl_ioctl+0xbb/0x740 [vfio]
 ? __rseq_handle_notify_resume+0xa4/0x4b0
 __x64_sys_pread64+0x96/0xc0
 x64_sys_call+0x1c3d/0x20d0
 do_syscall_64+0x4d/0x120
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 drivers/vfio/pci/vfio_pci_priv.h   |  1 +
 drivers/vfio/pci/vfio_pci_config.c | 18 +++++++++++++-----
 2 files changed, 14 insertions(+), 5 deletions(-)

Comments

Alex Williamson Nov. 20, 2024, 11:42 p.m. UTC | #1
On Wed, 20 Nov 2024 16:38:26 +0200
Avihai Horon <avihaih@nvidia.com> wrote:

> There are cases where a PCIe extended capability should be hidden from
> the user. For example, an unknown capability (i.e., capability with ID
> greater than PCI_EXT_CAP_ID_MAX) or a capability that is intentionally
> chosen to be hidden from the user.
> 
> Hiding a capability is done by virtualizing and modifying the 'Next
> Capability Offset' field of the previous capability so it points to the
> capability after the one that should be hidden.
> 
> The special case where the first capability in the list should be hidden
> is handled differently because there is no previous capability that can
> be modified. In this case, the capability ID and version are zeroed
> while leaving the next pointer intact. This hides the capability and
> leaves an anchor for the rest of the capability list.
> 
> However, today, hiding the first capability in the list is not done
> properly, as struct vfio_pci_core_device->pci_config_map is still set to
> the capability ID. If the first capability in the list is unknown, the
> following warning [1] is triggered and an out-of-bounds access to
> ecap_perms array occurs when vfio_config_do_rw() later uses
> pci_config_map to pick the right permissions.
> 
> Fix it by defining a new special capability PCI_CAP_ID_FIRST_HIDDEN,
> that represents a hidden extended capability that is located first in
> the extended capability list, and set pci_config_map to it in the above
> case.
> 
> [1]
> 
> WARNING: CPU: 118 PID: 5329 at drivers/vfio/pci/vfio_pci_config.c:1900 vfio_pci_config_rw+0x395/0x430 [vfio_pci_core]
> CPU: 118 UID: 0 PID: 5329 Comm: simx-qemu-syste Not tainted 6.12.0+ #1
> (snip)
> Call Trace:
>  <TASK>
>  ? show_regs+0x69/0x80
>  ? __warn+0x8d/0x140
>  ? vfio_pci_config_rw+0x395/0x430 [vfio_pci_core]
>  ? report_bug+0x18f/0x1a0
>  ? handle_bug+0x63/0xa0
>  ? exc_invalid_op+0x19/0x70
>  ? asm_exc_invalid_op+0x1b/0x20
>  ? vfio_pci_config_rw+0x395/0x430 [vfio_pci_core]
>  ? vfio_pci_config_rw+0x244/0x430 [vfio_pci_core]
>  vfio_pci_rw+0x101/0x1b0 [vfio_pci_core]
>  vfio_pci_core_read+0x1d/0x30 [vfio_pci_core]
>  vfio_device_fops_read+0x27/0x40 [vfio]
>  vfs_read+0xbd/0x340
>  ? vfio_device_fops_unl_ioctl+0xbb/0x740 [vfio]
>  ? __rseq_handle_notify_resume+0xa4/0x4b0
>  __x64_sys_pread64+0x96/0xc0
>  x64_sys_call+0x1c3d/0x20d0
>  do_syscall_64+0x4d/0x120
>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
> Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver")
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> ---
>  drivers/vfio/pci/vfio_pci_priv.h   |  1 +
>  drivers/vfio/pci/vfio_pci_config.c | 18 +++++++++++++-----
>  2 files changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
> index 5e4fa69aee16..4728b8069c52 100644
> --- a/drivers/vfio/pci/vfio_pci_priv.h
> +++ b/drivers/vfio/pci/vfio_pci_priv.h
> @@ -7,6 +7,7 @@
>  /* Special capability IDs predefined access */
>  #define PCI_CAP_ID_INVALID		0xFF	/* default raw access */
>  #define PCI_CAP_ID_INVALID_VIRT		0xFE	/* default virt access */
> +#define PCI_CAP_ID_FIRST_HIDDEN		0xFD	/* default direct access */

Thanks for catching this!  I wonder if the explicit tracking of this
via another dummy capability ID is really necessary though.  I think
the only way we can get a value in the pci_config_map greater than
PCI_EXT_CAP_ID_MAX is this scenario where it appears at the base of the
extended capability chain.  Therefore couldn't we just do something
like (compile tested only):

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 97422aafaa7b..beea05020888 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -313,12 +313,16 @@ static int vfio_virt_config_read(struct vfio_pci_core_device *vdev, int pos,
 	return count;
 }
 
+static const struct perm_bits direct_ro_perms = {
+	.readfn = vfio_direct_config_read
+};
+
 /* Default capability regions to read-only, no-virtualization */
 static struct perm_bits cap_perms[PCI_CAP_ID_MAX + 1] = {
-	[0 ... PCI_CAP_ID_MAX] = { .readfn = vfio_direct_config_read }
+	[0 ... PCI_CAP_ID_MAX] = direct_ro_perms
 };
 static struct perm_bits ecap_perms[PCI_EXT_CAP_ID_MAX + 1] = {
-	[0 ... PCI_EXT_CAP_ID_MAX] = { .readfn = vfio_direct_config_read }
+	[0 ... PCI_EXT_CAP_ID_MAX] = direct_ro_perms
 };
 /*
  * Default unassigned regions to raw read-write access.  Some devices
@@ -1897,9 +1901,17 @@ static ssize_t vfio_config_do_rw(struct vfio_pci_core_device *vdev, char __user
 		cap_start = *ppos;
 	} else {
 		if (*ppos >= PCI_CFG_SPACE_SIZE) {
-			WARN_ON(cap_id > PCI_EXT_CAP_ID_MAX);
+			/*
+			 * We can get a cap_id that exceeds PCI_EXT_CAP_ID_MAX
+			 * if we're hiding an unknown capability at the start
+			 * of the extended capability chain.  Use default, ro
+			 * access, which will virtualize the id and next values.
+			 */
+			if (cap_id > PCI_EXT_CAP_ID_MAX)
+				perm = (struct perm_bits *)&direct_ro_perms;
+			else
+				perm = &ecap_perms[cap_id];
 
-			perm = &ecap_perms[cap_id];
 			cap_start = vfio_find_cap_start(vdev, *ppos);
 		} else {
 			WARN_ON(cap_id > PCI_CAP_ID_MAX);


>  
>  /* Cap maximum number of ioeventfds per device (arbitrary) */
>  #define VFIO_PCI_IOEVENTFD_MAX		1000
> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> index 97422aafaa7b..95f8a6a10166 100644
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c
> @@ -320,6 +320,10 @@ static struct perm_bits cap_perms[PCI_CAP_ID_MAX + 1] = {
>  static struct perm_bits ecap_perms[PCI_EXT_CAP_ID_MAX + 1] = {
>  	[0 ... PCI_EXT_CAP_ID_MAX] = { .readfn = vfio_direct_config_read }
>  };
> +/* Perms for a first-in-list hidden extended capability */
> +static struct perm_bits hidden_ecap_perm = {
> +	.readfn = vfio_direct_config_read,
> +};
>  /*
>   * Default unassigned regions to raw read-write access.  Some devices
>   * require this to function as they hide registers between the gaps in
> @@ -1582,7 +1586,7 @@ static int vfio_cap_init(struct vfio_pci_core_device *vdev)
>  				 __func__, pos + i, map[pos + i], cap);
>  		}
>  
> -		BUILD_BUG_ON(PCI_CAP_ID_MAX >= PCI_CAP_ID_INVALID_VIRT);
> +		BUILD_BUG_ON(PCI_CAP_ID_MAX >= PCI_CAP_ID_FIRST_HIDDEN);
>  
>  		memset(map + pos, cap, len);
>  		ret = vfio_fill_vconfig_bytes(vdev, pos, len);
> @@ -1673,9 +1677,9 @@ static int vfio_ecap_init(struct vfio_pci_core_device *vdev)
>  		/*
>  		 * Even though ecap is 2 bytes, we're currently a long way
>  		 * from exceeding 1 byte capabilities.  If we ever make it
> -		 * up to 0xFE we'll need to up this to a two-byte, byte map.
> +		 * up to 0xFD we'll need to up this to a two-byte, byte map.
>  		 */
> -		BUILD_BUG_ON(PCI_EXT_CAP_ID_MAX >= PCI_CAP_ID_INVALID_VIRT);
> +		BUILD_BUG_ON(PCI_EXT_CAP_ID_MAX >= PCI_CAP_ID_FIRST_HIDDEN);
>  
>  		memset(map + epos, ecap, len);
>  		ret = vfio_fill_vconfig_bytes(vdev, epos, len);
> @@ -1688,10 +1692,11 @@ static int vfio_ecap_init(struct vfio_pci_core_device *vdev)
>  		 * indicates to use cap id = 0, version = 0, next = 0 if
>  		 * ecaps are absent, hope users check all the way to next.
>  		 */
> -		if (hidden)
> +		if (hidden) {
>  			*(__le32 *)&vdev->vconfig[epos] &=
>  				cpu_to_le32((0xffcU << 20));
> -		else
> +			memset(map + epos, PCI_CAP_ID_FIRST_HIDDEN, len);
> +		} else
>  			ecaps++;

We need to add braces on the else branch as well, per our coding style
standard.  Alternatively we might overwrite the ecap value where we
previously set hidden to true.  Thanks,

Alex

>  
>  		prev = (__le32 *)&vdev->vconfig[epos];
> @@ -1895,6 +1900,9 @@ static ssize_t vfio_config_do_rw(struct vfio_pci_core_device *vdev, char __user
>  	} else if (cap_id == PCI_CAP_ID_INVALID_VIRT) {
>  		perm = &virt_perms;
>  		cap_start = *ppos;
> +	} else if (cap_id == PCI_CAP_ID_FIRST_HIDDEN) {
> +		perm = &hidden_ecap_perm;
> +		cap_start = PCI_CFG_SPACE_SIZE;
>  	} else {
>  		if (*ppos >= PCI_CFG_SPACE_SIZE) {
>  			WARN_ON(cap_id > PCI_EXT_CAP_ID_MAX);
Avihai Horon Nov. 21, 2024, 11:18 a.m. UTC | #2
On 21/11/2024 1:42, Alex Williamson wrote:
> External email: Use caution opening links or attachments
>
>
> On Wed, 20 Nov 2024 16:38:26 +0200
> Avihai Horon <avihaih@nvidia.com> wrote:
>
>> There are cases where a PCIe extended capability should be hidden from
>> the user. For example, an unknown capability (i.e., capability with ID
>> greater than PCI_EXT_CAP_ID_MAX) or a capability that is intentionally
>> chosen to be hidden from the user.
>>
>> Hiding a capability is done by virtualizing and modifying the 'Next
>> Capability Offset' field of the previous capability so it points to the
>> capability after the one that should be hidden.
>>
>> The special case where the first capability in the list should be hidden
>> is handled differently because there is no previous capability that can
>> be modified. In this case, the capability ID and version are zeroed
>> while leaving the next pointer intact. This hides the capability and
>> leaves an anchor for the rest of the capability list.
>>
>> However, today, hiding the first capability in the list is not done
>> properly, as struct vfio_pci_core_device->pci_config_map is still set to
>> the capability ID. If the first capability in the list is unknown, the
>> following warning [1] is triggered and an out-of-bounds access to
>> ecap_perms array occurs when vfio_config_do_rw() later uses
>> pci_config_map to pick the right permissions.
>>
>> Fix it by defining a new special capability PCI_CAP_ID_FIRST_HIDDEN,
>> that represents a hidden extended capability that is located first in
>> the extended capability list, and set pci_config_map to it in the above
>> case.
>>
>> [1]
>>
>> WARNING: CPU: 118 PID: 5329 at drivers/vfio/pci/vfio_pci_config.c:1900 vfio_pci_config_rw+0x395/0x430 [vfio_pci_core]
>> CPU: 118 UID: 0 PID: 5329 Comm: simx-qemu-syste Not tainted 6.12.0+ #1
>> (snip)
>> Call Trace:
>>   <TASK>
>>   ? show_regs+0x69/0x80
>>   ? __warn+0x8d/0x140
>>   ? vfio_pci_config_rw+0x395/0x430 [vfio_pci_core]
>>   ? report_bug+0x18f/0x1a0
>>   ? handle_bug+0x63/0xa0
>>   ? exc_invalid_op+0x19/0x70
>>   ? asm_exc_invalid_op+0x1b/0x20
>>   ? vfio_pci_config_rw+0x395/0x430 [vfio_pci_core]
>>   ? vfio_pci_config_rw+0x244/0x430 [vfio_pci_core]
>>   vfio_pci_rw+0x101/0x1b0 [vfio_pci_core]
>>   vfio_pci_core_read+0x1d/0x30 [vfio_pci_core]
>>   vfio_device_fops_read+0x27/0x40 [vfio]
>>   vfs_read+0xbd/0x340
>>   ? vfio_device_fops_unl_ioctl+0xbb/0x740 [vfio]
>>   ? __rseq_handle_notify_resume+0xa4/0x4b0
>>   __x64_sys_pread64+0x96/0xc0
>>   x64_sys_call+0x1c3d/0x20d0
>>   do_syscall_64+0x4d/0x120
>>   entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>
>> Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver")
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> ---
>>   drivers/vfio/pci/vfio_pci_priv.h   |  1 +
>>   drivers/vfio/pci/vfio_pci_config.c | 18 +++++++++++++-----
>>   2 files changed, 14 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
>> index 5e4fa69aee16..4728b8069c52 100644
>> --- a/drivers/vfio/pci/vfio_pci_priv.h
>> +++ b/drivers/vfio/pci/vfio_pci_priv.h
>> @@ -7,6 +7,7 @@
>>   /* Special capability IDs predefined access */
>>   #define PCI_CAP_ID_INVALID           0xFF    /* default raw access */
>>   #define PCI_CAP_ID_INVALID_VIRT              0xFE    /* default virt access */
>> +#define PCI_CAP_ID_FIRST_HIDDEN              0xFD    /* default direct access */
> Thanks for catching this!  I wonder if the explicit tracking of this
> via another dummy capability ID is really necessary though.  I think
> the only way we can get a value in the pci_config_map greater than
> PCI_EXT_CAP_ID_MAX is this scenario where it appears at the base of the
> extended capability chain.  Therefore couldn't we just do something
> like (compile tested only):
>
> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> index 97422aafaa7b..beea05020888 100644
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c
> @@ -313,12 +313,16 @@ static int vfio_virt_config_read(struct vfio_pci_core_device *vdev, int pos,
>          return count;
>   }
>
> +static const struct perm_bits direct_ro_perms = {
> +       .readfn = vfio_direct_config_read
> +};
> +
>   /* Default capability regions to read-only, no-virtualization */
>   static struct perm_bits cap_perms[PCI_CAP_ID_MAX + 1] = {
> -       [0 ... PCI_CAP_ID_MAX] = { .readfn = vfio_direct_config_read }
> +       [0 ... PCI_CAP_ID_MAX] = direct_ro_perms
>   };
>   static struct perm_bits ecap_perms[PCI_EXT_CAP_ID_MAX + 1] = {
> -       [0 ... PCI_EXT_CAP_ID_MAX] = { .readfn = vfio_direct_config_read }
> +       [0 ... PCI_EXT_CAP_ID_MAX] = direct_ro_perms
>   };
>   /*
>    * Default unassigned regions to raw read-write access.  Some devices
> @@ -1897,9 +1901,17 @@ static ssize_t vfio_config_do_rw(struct vfio_pci_core_device *vdev, char __user
>                  cap_start = *ppos;
>          } else {
>                  if (*ppos >= PCI_CFG_SPACE_SIZE) {
> -                       WARN_ON(cap_id > PCI_EXT_CAP_ID_MAX);
> +                       /*
> +                        * We can get a cap_id that exceeds PCI_EXT_CAP_ID_MAX
> +                        * if we're hiding an unknown capability at the start
> +                        * of the extended capability chain.  Use default, ro
> +                        * access, which will virtualize the id and next values.
> +                        */
> +                       if (cap_id > PCI_EXT_CAP_ID_MAX)
> +                               perm = (struct perm_bits *)&direct_ro_perms;
> +                       else
> +                               perm = &ecap_perms[cap_id];
>
> -                       perm = &ecap_perms[cap_id];
>                          cap_start = vfio_find_cap_start(vdev, *ppos);
>                  } else {
>                          WARN_ON(cap_id > PCI_CAP_ID_MAX);

Yes, you are right. This works and appears to be simpler than my fix.
I will send a v2 with your suggestion shortly.

>
>>   /* Cap maximum number of ioeventfds per device (arbitrary) */
>>   #define VFIO_PCI_IOEVENTFD_MAX               1000
>> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
>> index 97422aafaa7b..95f8a6a10166 100644
>> --- a/drivers/vfio/pci/vfio_pci_config.c
>> +++ b/drivers/vfio/pci/vfio_pci_config.c
>> @@ -320,6 +320,10 @@ static struct perm_bits cap_perms[PCI_CAP_ID_MAX + 1] = {
>>   static struct perm_bits ecap_perms[PCI_EXT_CAP_ID_MAX + 1] = {
>>        [0 ... PCI_EXT_CAP_ID_MAX] = { .readfn = vfio_direct_config_read }
>>   };
>> +/* Perms for a first-in-list hidden extended capability */
>> +static struct perm_bits hidden_ecap_perm = {
>> +     .readfn = vfio_direct_config_read,
>> +};
>>   /*
>>    * Default unassigned regions to raw read-write access.  Some devices
>>    * require this to function as they hide registers between the gaps in
>> @@ -1582,7 +1586,7 @@ static int vfio_cap_init(struct vfio_pci_core_device *vdev)
>>                                 __func__, pos + i, map[pos + i], cap);
>>                }
>>
>> -             BUILD_BUG_ON(PCI_CAP_ID_MAX >= PCI_CAP_ID_INVALID_VIRT);
>> +             BUILD_BUG_ON(PCI_CAP_ID_MAX >= PCI_CAP_ID_FIRST_HIDDEN);
>>
>>                memset(map + pos, cap, len);
>>                ret = vfio_fill_vconfig_bytes(vdev, pos, len);
>> @@ -1673,9 +1677,9 @@ static int vfio_ecap_init(struct vfio_pci_core_device *vdev)
>>                /*
>>                 * Even though ecap is 2 bytes, we're currently a long way
>>                 * from exceeding 1 byte capabilities.  If we ever make it
>> -              * up to 0xFE we'll need to up this to a two-byte, byte map.
>> +              * up to 0xFD we'll need to up this to a two-byte, byte map.
>>                 */
>> -             BUILD_BUG_ON(PCI_EXT_CAP_ID_MAX >= PCI_CAP_ID_INVALID_VIRT);
>> +             BUILD_BUG_ON(PCI_EXT_CAP_ID_MAX >= PCI_CAP_ID_FIRST_HIDDEN);
>>
>>                memset(map + epos, ecap, len);
>>                ret = vfio_fill_vconfig_bytes(vdev, epos, len);
>> @@ -1688,10 +1692,11 @@ static int vfio_ecap_init(struct vfio_pci_core_device *vdev)
>>                 * indicates to use cap id = 0, version = 0, next = 0 if
>>                 * ecaps are absent, hope users check all the way to next.
>>                 */
>> -             if (hidden)
>> +             if (hidden) {
>>                        *(__le32 *)&vdev->vconfig[epos] &=
>>                                cpu_to_le32((0xffcU << 20));
>> -             else
>> +                     memset(map + epos, PCI_CAP_ID_FIRST_HIDDEN, len);
>> +             } else
>>                        ecaps++;
> We need to add braces on the else branch as well, per our coding style
> standard.

Ack, I will remember this for future work.

Thanks!

> Alternatively we might overwrite the ecap value where we
> previously set hidden to true.  Thanks,
>
> Alex
>
>>                prev = (__le32 *)&vdev->vconfig[epos];
>> @@ -1895,6 +1900,9 @@ static ssize_t vfio_config_do_rw(struct vfio_pci_core_device *vdev, char __user
>>        } else if (cap_id == PCI_CAP_ID_INVALID_VIRT) {
>>                perm = &virt_perms;
>>                cap_start = *ppos;
>> +     } else if (cap_id == PCI_CAP_ID_FIRST_HIDDEN) {
>> +             perm = &hidden_ecap_perm;
>> +             cap_start = PCI_CFG_SPACE_SIZE;
>>        } else {
>>                if (*ppos >= PCI_CFG_SPACE_SIZE) {
>>                        WARN_ON(cap_id > PCI_EXT_CAP_ID_MAX);
diff mbox series

Patch

diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index 5e4fa69aee16..4728b8069c52 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -7,6 +7,7 @@ 
 /* Special capability IDs predefined access */
 #define PCI_CAP_ID_INVALID		0xFF	/* default raw access */
 #define PCI_CAP_ID_INVALID_VIRT		0xFE	/* default virt access */
+#define PCI_CAP_ID_FIRST_HIDDEN		0xFD	/* default direct access */
 
 /* Cap maximum number of ioeventfds per device (arbitrary) */
 #define VFIO_PCI_IOEVENTFD_MAX		1000
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 97422aafaa7b..95f8a6a10166 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -320,6 +320,10 @@  static struct perm_bits cap_perms[PCI_CAP_ID_MAX + 1] = {
 static struct perm_bits ecap_perms[PCI_EXT_CAP_ID_MAX + 1] = {
 	[0 ... PCI_EXT_CAP_ID_MAX] = { .readfn = vfio_direct_config_read }
 };
+/* Perms for a first-in-list hidden extended capability */
+static struct perm_bits hidden_ecap_perm = {
+	.readfn = vfio_direct_config_read,
+};
 /*
  * Default unassigned regions to raw read-write access.  Some devices
  * require this to function as they hide registers between the gaps in
@@ -1582,7 +1586,7 @@  static int vfio_cap_init(struct vfio_pci_core_device *vdev)
 				 __func__, pos + i, map[pos + i], cap);
 		}
 
-		BUILD_BUG_ON(PCI_CAP_ID_MAX >= PCI_CAP_ID_INVALID_VIRT);
+		BUILD_BUG_ON(PCI_CAP_ID_MAX >= PCI_CAP_ID_FIRST_HIDDEN);
 
 		memset(map + pos, cap, len);
 		ret = vfio_fill_vconfig_bytes(vdev, pos, len);
@@ -1673,9 +1677,9 @@  static int vfio_ecap_init(struct vfio_pci_core_device *vdev)
 		/*
 		 * Even though ecap is 2 bytes, we're currently a long way
 		 * from exceeding 1 byte capabilities.  If we ever make it
-		 * up to 0xFE we'll need to up this to a two-byte, byte map.
+		 * up to 0xFD we'll need to up this to a two-byte, byte map.
 		 */
-		BUILD_BUG_ON(PCI_EXT_CAP_ID_MAX >= PCI_CAP_ID_INVALID_VIRT);
+		BUILD_BUG_ON(PCI_EXT_CAP_ID_MAX >= PCI_CAP_ID_FIRST_HIDDEN);
 
 		memset(map + epos, ecap, len);
 		ret = vfio_fill_vconfig_bytes(vdev, epos, len);
@@ -1688,10 +1692,11 @@  static int vfio_ecap_init(struct vfio_pci_core_device *vdev)
 		 * indicates to use cap id = 0, version = 0, next = 0 if
 		 * ecaps are absent, hope users check all the way to next.
 		 */
-		if (hidden)
+		if (hidden) {
 			*(__le32 *)&vdev->vconfig[epos] &=
 				cpu_to_le32((0xffcU << 20));
-		else
+			memset(map + epos, PCI_CAP_ID_FIRST_HIDDEN, len);
+		} else
 			ecaps++;
 
 		prev = (__le32 *)&vdev->vconfig[epos];
@@ -1895,6 +1900,9 @@  static ssize_t vfio_config_do_rw(struct vfio_pci_core_device *vdev, char __user
 	} else if (cap_id == PCI_CAP_ID_INVALID_VIRT) {
 		perm = &virt_perms;
 		cap_start = *ppos;
+	} else if (cap_id == PCI_CAP_ID_FIRST_HIDDEN) {
+		perm = &hidden_ecap_perm;
+		cap_start = PCI_CFG_SPACE_SIZE;
 	} else {
 		if (*ppos >= PCI_CFG_SPACE_SIZE) {
 			WARN_ON(cap_id > PCI_EXT_CAP_ID_MAX);