diff mbox series

xen: add privcmd ioctl to get p2pdma_distance

Message ID 20241207105946.542491-1-julia.zhang@amd.com (mailing list archive)
State New
Headers show
Series xen: add privcmd ioctl to get p2pdma_distance | expand

Commit Message

Zhang, Julia Dec. 7, 2024, 10:59 a.m. UTC
To implement dGPU prime feature, virtgpu driver need to get
p2pdma_distance of two GPU from host side.

This adds a new privcmd ioctl to get the real p2pdma_distance of two pci
devices in the host with pci notations sent from guest side.

Signed-off-by: Julia Zhang <julia.zhang@amd.com>
---
 tools/include/xen-sys/Linux/privcmd.h | 12 ++++++++++++
 tools/include/xenctrl.h               | 17 +++++++++++++++++
 tools/libs/ctrl/xc_freebsd.c          | 11 +++++++++++
 tools/libs/ctrl/xc_linux.c            | 24 ++++++++++++++++++++++++
 tools/libs/ctrl/xc_minios.c           | 11 +++++++++++
 tools/libs/ctrl/xc_netbsd.c           | 11 +++++++++++
 tools/libs/ctrl/xc_physdev.c          | 12 ++++++++++++
 tools/libs/ctrl/xc_solaris.c          | 11 +++++++++++
 8 files changed, 109 insertions(+)

Comments

Jan Beulich Dec. 9, 2024, 7:47 a.m. UTC | #1
On 07.12.2024 11:59, Julia Zhang wrote:
> To implement dGPU prime feature, virtgpu driver need to get
> p2pdma_distance of two GPU from host side.
> 
> This adds a new privcmd ioctl to get the real p2pdma_distance of two pci
> devices in the host with pci notations sent from guest side.
> 
> Signed-off-by: Julia Zhang <julia.zhang@amd.com>

First - correcting Anthony's email address. He's no longer at Citrix/Cloud.

Second, please send patches To: the list, with maintainers on Cc:. (Stefano,
as this isn't the first such issue, can you please try to spread the
knowledge of this across people starting to contribute?) Personally I'd
question the length of the Cc: list of this submission, though. Plus - along
with Roger I was on the To: list here despite not even being maintainer of
any of the files touched.

> --- a/tools/include/xen-sys/Linux/privcmd.h
> +++ b/tools/include/xen-sys/Linux/privcmd.h
> @@ -110,6 +110,16 @@ typedef struct privcmd_map_hva_to_gpfns {
>  	int add_mapping;
>  } privcmd_map_hva_to_gpfns_t;
>  
> +typedef struct privcmd_p2pdma_distance {
> +	__u32 provider_bus;
> +	__u32 provider_slot;
> +	__u32 provider_func;
> +	__u32 client_bus;
> +	__u32 client_slot;
> +	__u32 client_func;
> +	__u32 distance;
> +} privcmd_p2pdma_distance_t;

"Distance" typically is a symmetric thing. Why the asymmetry here? And
why __u32 when __u8 will be fine for most fields? And where's the segment
part of the device coordinates? Finally, with it being merely stub
implementations that you add here, all details on where the needed info
is to come from are missing.

Jan
Zhang, Julia Dec. 16, 2024, 8:18 a.m. UTC | #2
On 2024/12/9 15:47, Jan Beulich wrote:

On 07.12.2024 11:59, Julia Zhang wrote:

To implement dGPU prime feature, virtgpu driver need to get

p2pdma_distance of two GPU from host side.



This adds a new privcmd ioctl to get the real p2pdma_distance of two pci

devices in the host with pci notations sent from guest side.



Signed-off-by: Julia Zhang <julia.zhang@amd.com><mailto:julia.zhang@amd.com>



First - correcting Anthony's email address. He's no longer at Citrix/Cloud.
Thank you for reminding me and correcting it.





Second, please send patches To: the list, with maintainers on Cc:. (Stefano,

as this isn't the first such issue, can you please try to spread the

knowledge of this across people starting to contribute?) Personally I'd

question the length of the Cc: list of this submission, though. Plus - along

with Roger I was on the To: list here despite not even being maintainer of

any of the files touched.
Sorry for bothering you and I will modify the lists next time. Thanks for reviewing this patch.





--- a/tools/include/xen-sys/Linux/privcmd.h

+++ b/tools/include/xen-sys/Linux/privcmd.h

@@ -110,6 +110,16 @@ typedef struct privcmd_map_hva_to_gpfns {

   int add_mapping;

 } privcmd_map_hva_to_gpfns_t;



+typedef struct privcmd_p2pdma_distance {

+  __u32 provider_bus;

+  __u32 provider_slot;

+  __u32 provider_func;

+  __u32 client_bus;

+  __u32 client_slot;

+  __u32 client_func;

+  __u32 distance;

+} privcmd_p2pdma_distance_t;



"Distance" typically is a symmetric thing. Why the asymmetry here? And

why __u32 when __u8 will be fine for most fields? And where's the segment

part of the device coordinates? Finally, with it being merely stub

implementations that you add here, all details on where the needed info

is to come from are missing.

"Distance" is p2pdma-distance between two PCI devices, it's calculated in kernel driver.I don't get why it's symmetric?

I will use change __u32 to __u8.

By the segment part of the device coordinates, do you mean the domain number of the device?

All the needed info(virtual bus/slot/function numbers) are from guest VM, QEMU convert them to real physical info of two devices.

Do you mean that I should add more details in commit message or comment?

Julia





Jan
Jan Beulich Dec. 16, 2024, 9:19 a.m. UTC | #3
On 16.12.2024 09:18, Zhang, Julia wrote:
> On 2024/12/9 15:47, Jan Beulich wrote:
> On 07.12.2024 11:59, Julia Zhang wrote:

Yet another formality, sorry: Please send plain text emails. You'll note that what
I said and why you said is indistinguishably intermixed below.

> --- a/tools/include/xen-sys/Linux/privcmd.h
> 
> +++ b/tools/include/xen-sys/Linux/privcmd.h
> 
> @@ -110,6 +110,16 @@ typedef struct privcmd_map_hva_to_gpfns {
> 
>    int add_mapping;
> 
>  } privcmd_map_hva_to_gpfns_t;
> 
> 
> 
> +typedef struct privcmd_p2pdma_distance {
> 
> +  __u32 provider_bus;
> 
> +  __u32 provider_slot;
> 
> +  __u32 provider_func;
> 
> +  __u32 client_bus;
> 
> +  __u32 client_slot;
> 
> +  __u32 client_func;
> 
> +  __u32 distance;
> 
> +} privcmd_p2pdma_distance_t;
> 
> 
> 
> "Distance" typically is a symmetric thing. Why the asymmetry here? And
> 
> why __u32 when __u8 will be fine for most fields? And where's the segment
> 
> part of the device coordinates? Finally, with it being merely stub
> 
> implementations that you add here, all details on where the needed info
> 
> is to come from are missing.
> 
> "Distance" is p2pdma-distance between two PCI devices, it's calculated in kernel driver.I don't get why it's symmetric?

Distance from A to B is usually the same as that from B to A. But yes,
not necessarily always (thinking of e.g. rings). Yet still I'm unclear
about the distinction between provide and client.

> I will use change __u32 to __u8.
> 
> By the segment part of the device coordinates, do you mean the domain number of the device?

Some call it domain, yes. Since domain has an important-ish different
meaning in Xen, we prefer segment though.

> All the needed info(virtual bus/slot/function numbers) are from guest VM, QEMU convert them to real physical info of two devices.
> 
> Do you mean that I should add more details in commit message or comment?

Perhaps. Or have the patch be more complete, in the sense of not only
putting in place stubs, thus actually making visible how the data is
produced / used.

Jan
Zhang, Julia Dec. 17, 2024, 5:53 a.m. UTC | #4
On 2024/12/16 17:19, Jan Beulich wrote:
> On 16.12.2024 09:18, Zhang, Julia wrote:
>> On 2024/12/9 15:47, Jan Beulich wrote:
>> On 07.12.2024 11:59, Julia Zhang wrote:
> 
> Yet another formality, sorry: Please send plain text emails. You'll note that what
> I said and why you said is indistinguishably intermixed below.

Thanks for reminding.

> 
>> --- a/tools/include/xen-sys/Linux/privcmd.h
>>
>> +++ b/tools/include/xen-sys/Linux/privcmd.h
>>
>> @@ -110,6 +110,16 @@ typedef struct privcmd_map_hva_to_gpfns {
>>
>>     int add_mapping;
>>
>>   } privcmd_map_hva_to_gpfns_t;
>>
>>
>>
>> +typedef struct privcmd_p2pdma_distance {
>>
>> +  __u32 provider_bus;
>>
>> +  __u32 provider_slot;
>>
>> +  __u32 provider_func;
>>
>> +  __u32 client_bus;
>>
>> +  __u32 client_slot;
>>
>> +  __u32 client_func;
>>
>> +  __u32 distance;
>>
>> +} privcmd_p2pdma_distance_t;
>>
>>
>>
>> "Distance" typically is a symmetric thing. Why the asymmetry here? And
>>
>> why __u32 when __u8 will be fine for most fields? And where's the segment
>>
>> part of the device coordinates? Finally, with it being merely stub
>>
>> implementations that you add here, all details on where the needed info
>>
>> is to come from are missing.
>>
>> "Distance" is p2pdma-distance between two PCI devices, it's calculated in kernel driver.I don't get why it's symmetric?
> 
> Distance from A to B is usually the same as that from B to A. But yes,
> not necessarily always (thinking of e.g. rings). Yet still I'm unclear
> about the distinction between provide and client.

Provider - A driver which provides or publishes P2P resources.
Client - A driver which makes use of a resource.

In our case, we want to use passthrough dGPU render data, and virtio 
iGPU display data. So dGPU need to import display buffer of iGPU. iGPU 
is provider and dGPU is client.

> 
>> I will use change __u32 to __u8.
>>
>> By the segment part of the device coordinates, do you mean the domain number of the device?
> 
> Some call it domain, yes. Since domain has an important-ish different
> meaning in Xen, we prefer segment though.
> 

I see, I will add this info in the struct.

>> All the needed info(virtual bus/slot/function numbers) are from guest VM, QEMU convert them to real physical info of two devices.
>>
>> Do you mean that I should add more details in commit message or comment?
> 
> Perhaps. Or have the patch be more complete, in the sense of not only
> putting in place stubs, thus actually making visible how the data is
> produced / used.
> 

Sure, I will try to add more details.

Julia

> Jan
Jan Beulich Dec. 17, 2024, 8:53 a.m. UTC | #5
On 17.12.2024 06:53, Zhang, Julia wrote:
> 
> 
> On 2024/12/16 17:19, Jan Beulich wrote:
>> On 16.12.2024 09:18, Zhang, Julia wrote:
>>> On 2024/12/9 15:47, Jan Beulich wrote:
>>> On 07.12.2024 11:59, Julia Zhang wrote:
>>
>> Yet another formality, sorry: Please send plain text emails. You'll note that what
>> I said and why you said is indistinguishably intermixed below.
> 
> Thanks for reminding.
> 
>>
>>> --- a/tools/include/xen-sys/Linux/privcmd.h
>>>
>>> +++ b/tools/include/xen-sys/Linux/privcmd.h
>>>
>>> @@ -110,6 +110,16 @@ typedef struct privcmd_map_hva_to_gpfns {
>>>
>>>     int add_mapping;
>>>
>>>   } privcmd_map_hva_to_gpfns_t;
>>>
>>>
>>>
>>> +typedef struct privcmd_p2pdma_distance {
>>>
>>> +  __u32 provider_bus;
>>>
>>> +  __u32 provider_slot;
>>>
>>> +  __u32 provider_func;
>>>
>>> +  __u32 client_bus;
>>>
>>> +  __u32 client_slot;
>>>
>>> +  __u32 client_func;
>>>
>>> +  __u32 distance;
>>>
>>> +} privcmd_p2pdma_distance_t;
>>>
>>>
>>>
>>> "Distance" typically is a symmetric thing. Why the asymmetry here? And
>>>
>>> why __u32 when __u8 will be fine for most fields? And where's the segment
>>>
>>> part of the device coordinates? Finally, with it being merely stub
>>>
>>> implementations that you add here, all details on where the needed info
>>>
>>> is to come from are missing.
>>>
>>> "Distance" is p2pdma-distance between two PCI devices, it's calculated in kernel driver.I don't get why it's symmetric?
>>
>> Distance from A to B is usually the same as that from B to A. But yes,
>> not necessarily always (thinking of e.g. rings). Yet still I'm unclear
>> about the distinction between provide and client.
> 
> Provider - A driver which provides or publishes P2P resources.
> Client - A driver which makes use of a resource.
> 
> In our case, we want to use passthrough dGPU render data, and virtio 
> iGPU display data. So dGPU need to import display buffer of iGPU. iGPU 
> is provider and dGPU is client.

Right, but: Is this arrangement relevant for the new ioctl? Aren't
you simply after the distance between two devices, of which your
provider/client model is merely a special case?

Jan
diff mbox series

Patch

diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h
index 5071ebcc8b..fbdc9aa927 100644
--- a/tools/include/xen-sys/Linux/privcmd.h
+++ b/tools/include/xen-sys/Linux/privcmd.h
@@ -110,6 +110,16 @@  typedef struct privcmd_map_hva_to_gpfns {
 	int add_mapping;
 } privcmd_map_hva_to_gpfns_t;
 
+typedef struct privcmd_p2pdma_distance {
+	__u32 provider_bus;
+	__u32 provider_slot;
+	__u32 provider_func;
+	__u32 client_bus;
+	__u32 client_slot;
+	__u32 client_func;
+	__u32 distance;
+} privcmd_p2pdma_distance_t;
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: &privcmd_hypercall_t
@@ -131,6 +141,8 @@  typedef struct privcmd_map_hva_to_gpfns {
 	_IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
 #define IOCTL_PRIVCMD_PCIDEV_GET_GSI			\
 	_IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_pcidev_get_gsi_t))
+#define IOCTL_PRIVCMD_P2PDMA_DISTANCE                          \
+	_IOC(_IOC_NONE, 'P', 11, sizeof(privcmd_p2pdma_distance_t))
 #define IOCTL_PRIVCMD_MAP_HVA_TO_GPFNS                         \
 	_IOC(_IOC_NONE, 'P', 13, sizeof(privcmd_map_hva_to_gpfns_t))
 #define IOCTL_PRIVCMD_UNIMPLEMENTED				\
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 29617585c5..42d15a22b8 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1654,6 +1654,23 @@  int xc_physdev_unmap_pirq(xc_interface *xch,
 
 int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf);
 
+int xc_physdev_p2pdma_distance(xc_interface *xch,
+                               uint32_t bus,
+                               uint32_t slot,
+                               uint32_t func,
+                               uint32_t c_bus,
+                               uint32_t c_slot,
+                               uint32_t c_func);
+
+int xc_pcidev_p2pdma_distance(xc_interface *xch,
+                              uint32_t bus,
+                              uint32_t slot,
+                              uint32_t func,
+                              uint32_t c_bus,
+                              uint32_t c_slot,
+                              uint32_t c_func);
+
+
 /*
  *  LOGGING AND ERROR REPORTING
  */
diff --git a/tools/libs/ctrl/xc_freebsd.c b/tools/libs/ctrl/xc_freebsd.c
index 9019fc6633..4e0df06b7c 100644
--- a/tools/libs/ctrl/xc_freebsd.c
+++ b/tools/libs/ctrl/xc_freebsd.c
@@ -60,6 +60,17 @@  void *xc_memalign(xc_interface *xch, size_t alignment, size_t size)
     return ptr;
 }
 
+int xc_pcidev_p2pdma_distance(xc_interface *xch,
+                              uint32_t bus,
+                              uint32_t slot,
+                              uint32_t func,
+                              uint32_t c_bus,
+                              uint32_t c_slot,
+                              uint32_t c_func)
+{
+    return -1;
+}
+
 int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf)
 {
     errno = ENOSYS;
diff --git a/tools/libs/ctrl/xc_linux.c b/tools/libs/ctrl/xc_linux.c
index 92591e49a1..9aeff2328f 100644
--- a/tools/libs/ctrl/xc_linux.c
+++ b/tools/libs/ctrl/xc_linux.c
@@ -66,6 +66,30 @@  void *xc_memalign(xc_interface *xch, size_t alignment, size_t size)
     return ptr;
 }
 
+int xc_pcidev_p2pdma_distance(xc_interface *xch,
+                              uint32_t bus,
+                              uint32_t slot,
+                              uint32_t func,
+                              uint32_t c_bus,
+                              uint32_t c_slot,
+                              uint32_t c_func)
+{
+    privcmd_p2pdma_distance_t p2pdma_distance = {
+        .provider_bus = bus,
+        .provider_slot = slot,
+        .provider_func = func,
+        .client_bus = c_bus,
+        .client_slot = c_slot,
+        .client_func = c_func,
+        .distance = -1,
+    };
+    if (!ioctl(xencall_fd(xch->xcall), IOCTL_PRIVCMD_P2PDMA_DISTANCE, &p2pdma_distance)) {
+	return p2pdma_distance.distance;
+    }
+
+    return -1;
+}
+
 int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf)
 {
     int ret;
diff --git a/tools/libs/ctrl/xc_minios.c b/tools/libs/ctrl/xc_minios.c
index 462af827b3..4698cca5ea 100644
--- a/tools/libs/ctrl/xc_minios.c
+++ b/tools/libs/ctrl/xc_minios.c
@@ -47,6 +47,17 @@  void *xc_memalign(xc_interface *xch, size_t alignment, size_t size)
     return memalign(alignment, size);
 }
 
+int  xc_pcidev_p2pdma_distance(xc_interface *xch,
+                               uint32_t bus,
+                               uint32_t slot,
+                               uint32_t func,
+                               uint32_t c_bus,
+                               uint32_t c_slot,
+                               uint32_t c_func)
+{
+    return -1;
+}
+
 int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf)
 {
     errno = ENOSYS;
diff --git a/tools/libs/ctrl/xc_netbsd.c b/tools/libs/ctrl/xc_netbsd.c
index 1318d4d906..7e0ee8417e 100644
--- a/tools/libs/ctrl/xc_netbsd.c
+++ b/tools/libs/ctrl/xc_netbsd.c
@@ -63,6 +63,17 @@  void *xc_memalign(xc_interface *xch, size_t alignment, size_t size)
     return valloc(size);
 }
 
+int xc_pcidev_p2pdma_distance(xc_interface *xch,
+                              uint32_t bus,
+                              uint32_t slot,
+                              uint32_t func,
+                              uint32_t c_bus,
+                              uint32_t c_slot,
+                              uint32_t c_func)
+{
+    return -1;
+}
+
 int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf)
 {
     errno = ENOSYS;
diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
index 25e686d7b3..4ee8f39e09 100644
--- a/tools/libs/ctrl/xc_physdev.c
+++ b/tools/libs/ctrl/xc_physdev.c
@@ -138,3 +138,15 @@  int xc_physdev_unmap_pirq(xc_interface *xch,
     return rc;
 }
 
+int xc_physdev_p2pdma_distance(xc_interface *xch,
+                               uint32_t bus,
+                               uint32_t slot,
+                               uint32_t func,
+                               uint32_t c_bus,
+                               uint32_t c_slot,
+                               uint32_t c_func)
+{
+    return xc_pcidev_p2pdma_distance(xch, bus, slot, func,
+                                     c_bus, c_slot, c_func);
+}
+
diff --git a/tools/libs/ctrl/xc_solaris.c b/tools/libs/ctrl/xc_solaris.c
index 049e28d55c..a054b40b31 100644
--- a/tools/libs/ctrl/xc_solaris.c
+++ b/tools/libs/ctrl/xc_solaris.c
@@ -32,6 +32,17 @@  void *xc_memalign(xc_interface *xch, size_t alignment, size_t size)
     return memalign(alignment, size);
 }
 
+int xc_pcidev_p2pdma_distance(xc_interface *xch,
+                              uint32_t bus,
+                              uint32_t slot,
+                              uint32_t func,
+                              uint32_t c_bus,
+                              uint32_t c_slot,
+                              uint32_t c_func)
+{
+    return -1;
+}
+
 int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf)
 {
     errno = ENOSYS;